bioinfomaticsCSU / deepsignal

Detecting methylation using signal-level features from Nanopore sequencing reads
GNU General Public License v3.0

RNA methylation calling #67

Open SilvanHaelg opened 3 years ago

SilvanHaelg commented 3 years ago

Dear Peng,

I wonder whether Deepsignal can be used to detect methylation in RNA from samples sequenced with the direct RNA sequencing kit from ONT. I am thinking about training a model using completely methylated RNA and unmethylated cDNA as training data. Do you think the signal differences between RNA and cDNA would interfere with the methylation signal and therefore lead to an inaccurate model?

Thanks for your time, Silvan

PengNi commented 3 years ago

Hi @SilvanHaelg, thanks for your interest in deepsignal. We are also working on RNA modification detection with deepsignal, but we have not yet trained a satisfactory model. What you describe may work, but I cannot guarantee the performance at this point.

For RNA modification detection, you can also check other existing tools, such as nanom6A, EpiNano, or nanoDoc.

Best, Peng

SilvanHaelg commented 3 years ago

Dear Peng, thank you for the quick reply. I will try to train a model with my approach and test the tools you suggested as well. How long do you think it will take until you have implemented RNA modification detection? Best, Silvan

PengNi commented 3 years ago

We have no concrete timeline now. It may be months.

Best, Peng

pterzian commented 2 years ago

Hi @PengNi! I am trying an approach similar to the OP's, but more oriented toward model training, so I only need to extract features. However, I have not managed to run the resquiggle command on our RNA dataset.

I get this message:

[14:53:41] Loading minimap2 reference.
[14:53:41] Getting file list.
******************** ERROR ********************
    Reads do not to contain basecalls. Check --basecall-group option if basecalls are stored in non-standard location or use `tombo annotate_raw_with_fastqs` to add basecalls from FASTQ files to raw FAST5 files.

So I looked into the reads, and I can see the basecalls in the dedicated Basecall_1D_000 field. But when I run tombo preprocess annotate_raw_with_fastqs, it just tells me it added sequences to 0 reads.

I thought you might have an idea of where this issue could come from. Unfortunately, tombo's GitHub looks quite dead at the moment.

Best,

Paul

PengNi commented 2 years ago

Hi @pterzian, did you run tombo annotate with the option (--summary, I think) that adds the sequencing summary file for the reads in the FASTQs? If you didn't, please try.

PengNi commented 2 years ago

@pterzian, I checked: the option is --sequencing-summary-filenames. Previously I did not need to specify this parameter, but recently it seems it has to be set for annotate_raw_with_fastqs.
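As a rough illustration of the call described above, here is a minimal sketch that only assembles the command line and prints it; nothing is executed. The directory and file names (`fast5_single/`, `all_reads.fastq`, `sequencing_summary.txt`) and the helper name `build_annotate_command` are placeholders, not anything from this thread.

```python
import shlex


def build_annotate_command(fast5_dir, fastq_files, summary_files):
    """Assemble the tombo call as an argument list (nothing is run here)."""
    return [
        "tombo", "preprocess", "annotate_raw_with_fastqs",
        "--fast5-basedir", fast5_dir,
        "--fastq-filenames", *fastq_files,
        # The option discussed above; it accepts one or more summary files.
        "--sequencing-summary-filenames", *summary_files,
    ]


# Placeholder paths, assumed for illustration only.
cmd = build_annotate_command(
    "fast5_single/", ["all_reads.fastq"], ["sequencing_summary.txt"]
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be handed to `subprocess.run(cmd, check=True)` once the paths point at real data.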

pterzian commented 2 years ago

Thanks for the help @PengNi. I tried annotating the FAST5 files with this option, but this is the output:

[11:16:47] Getting read filenames.
[11:16:47] Parsing sequencing summary files.
******************** WARNING ********************
    Some FASTQ records from sequencing summaries do not appear to have a matching file.
[11:17:08] Annotating FAST5s with sequence from FASTQs.
****** WARNING ****** Some FASTQ records contain read identifiers not found in any FAST5 files or sequencing summary files.
0it [09:14, ?it/s]
[11:26:22] Added sequences to a total of 0 reads

I know some FASTQ records can't be found in the FAST5 files, because I extracted only single FAST5 reads mapping to a specific contig and I am using the full concatenated FASTQs in the tombo annotate command. I checked whether some FAST5 read IDs could be found in the sequencing summary and the FASTQs, and they can, so I am not sure where the issue comes from.
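The spot check on a few read IDs could be extended to all reads with something like the sketch below. It assumes four-line FASTQ records and a tab-separated sequencing summary with a `read_id` column. The FAST5 read IDs have to be collected separately and passed in. All function names here are hypothetical helpers, not part of tombo or deepsignal.

```python
import csv


def fastq_read_ids(path):
    """Collect read IDs from FASTQ headers (first token after '@').

    Assumes plain four-line records, so headers sit on lines 0, 4, 8, ...
    """
    ids = set()
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 0 and line.startswith("@"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def summary_read_ids(path, column="read_id"):
    """Collect read IDs from a tab-separated sequencing summary file."""
    with open(path, newline="") as handle:
        return {row[column] for row in csv.DictReader(handle, delimiter="\t")}


def report_overlap(fast5_ids, fastq_ids, summary_ids):
    """Count how many FAST5 read IDs also occur in the other two sources."""
    fast5_ids = set(fast5_ids)
    return {
        "fast5_total": len(fast5_ids),
        "in_fastq": len(fast5_ids & fastq_ids),
        "in_summary": len(fast5_ids & summary_ids),
        "in_both": len(fast5_ids & fastq_ids & summary_ids),
    }
```

If `in_both` equals `fast5_total`, the read IDs themselves are probably not the cause, and the file layout or compression of the FAST5 files becomes the more likely suspect.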

Looks like I am going to dig more and do more testing!

Best,

Paul

PengNi commented 2 years ago

For now I can think of two other possible reasons: (1) multi-read vs. single-read FAST5 format; (2) a VBZ compression issue.
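For the first point, a quick way to tell the two layouts apart is to look at the top-level HDF5 groups. Multi-read files hold one `read_<id>` group per read, while single-read files have a top-level `Raw` group. The sketch below is only an illustration under that assumption. The function names are hypothetical, `h5py` is a third-party package that must be installed separately, and the sketch does not check for VBZ compression.

```python
def classify_fast5_layout(top_level_keys):
    """Guess the FAST5 layout from the names of the top-level HDF5 groups."""
    keys = list(top_level_keys)
    if any(key.startswith("read_") for key in keys):
        return "multi-read"
    if "Raw" in keys:
        return "single-read"
    return "unknown"


def layout_of_file(path):
    """Open a FAST5 file and classify it (requires h5py)."""
    import h5py  # third-party; imported lazily so the classifier works without it

    with h5py.File(path, "r") as handle:
        return classify_fast5_layout(handle.keys())
```

Tombo works on single-read FAST5 files, so a "multi-read" result would mean the files need converting first, for example with `multi_to_single_fast5` from ont_fast5_api.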

Hope you find the reason soon!

Best, Peng