WGLab / DeepMod

DeepMod: a deep-learning tool for genomic-scale, strand-sensitive and single-nucleotide based detection of DNA modifications

Basecalled fast5 files #24

Closed: vahidAK closed this issue 4 years ago

vahidAK commented 4 years ago

Hi,
Does DeepMod only accept fast5 files basecalled by Albacore?
I basecalled my fast5 files using Guppy with the --fast5_out option, which writes basecalled fast5 files.
But when I run DeepMod with the following command, it gives me an error:
"python ../anaconda3/DeepMod/bin/DeepMod.py detect --wrkBase ../fast5_files/ --Ref ../reference.fasta --outFolder ../ --Base C --modfile ../anaconda3/DeepMod/train_mod/rnn_f7_wd21_chr1to10_4/mod_train_f7_wd21_chr1to10 --FileID User_Uniq_name --threads 16"
The error is:
Cannot open fast5 or other errors: ../DeepSignal/fast5_files/PLSP61583_20161015_FNFAB42316_MN17048_sequencing_run_Hum94_ss_jt_86199_ch172_read45_strand1.fast5

So it cannot open the fast5 files.
What is the problem, and what should I do?
Thanks,
Vahid.

liuqianhn commented 4 years ago

Hi @vahidAK, right now DeepMod can only read event information from fast5 files in which each file contains a single long read. We have been working on improving DeepMod so that it can read fast5 files with the move data generated by the latest Guppy, as well as the multi-fast5 format, where one fast5 file contains multiple long reads. I cannot tell what caused your error until you show me the output of h5ls -r ../DeepSignal/fast5_files/PLSP61583_20161015_FNFAB42316_MN17048_sequencing_run_Hum94_ss_jt_86199_ch172_read45_strand1.fast5 | head -n 50.
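Whether a file is single-read or multi-read can usually be read off the top-level groups in the h5ls -r listing. The sketch below is a hypothetical helper, not part of DeepMod. It assumes, as in the listings posted in this thread, that a single-read file keeps its signal under /Raw/Reads, and that a multi-read file puts each read in its own top-level group named read_<id>. It parses h5ls text to stay dependency-free; the same check could be run on the files directly with a library such as h5py.

```python
from typing import Dict


def parse_h5ls(text: str) -> Dict[str, str]:
    """Map each HDF5 path in 'h5ls -r' output to its kind ("Group" or "Dataset")."""
    objects: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            objects[parts[0]] = parts[1]
    return objects


def fast5_layout(objects: Dict[str, str]) -> str:
    """Guess whether a fast5 file holds one read or many (hypothetical helper)."""
    # Multi-read fast5 (assumed layout): each read is a top-level group such as /read_<uuid>.
    top_level = [p for p in objects if p.count("/") == 1 and p != "/"]
    if any(p.startswith("/read_") for p in top_level):
        return "multi-read"
    # Single-read fast5: the raw signal sits under /Raw/Reads/Read_<n>.
    if "/Raw/Reads" in objects:
        return "single-read"
    return "unknown"


# Abridged listing from the single-read file posted in this thread.
listing = """\
/                        Group
/Analyses                Group
/Raw                     Group
/Raw/Reads               Group
/Raw/Reads/Read_15       Group
/Raw/Reads/Read_15/Signal Dataset {253507/Inf}
"""
print(fast5_layout(parse_h5ls(listing)))  # single-read
```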

vahidAK commented 4 years ago

My fast5 files are single reads.
Here is the output from h5ls:
/                        Group
/Analyses                Group
/Analyses/Basecall_1D_000 Group
/Analyses/Basecall_1D_000/BaseCalled_template Group
/Analyses/Basecall_1D_000/BaseCalled_template/Fastq Dataset {SCALAR}
/Analyses/Basecall_1D_000/BaseCalled_template/ModBaseProbs Dataset {28368, 6}
/Analyses/Basecall_1D_000/BaseCalled_template/Move Dataset {126752}
/Analyses/Basecall_1D_000/BaseCalled_template/Trace Dataset {126752, 8}
/Analyses/Basecall_1D_000/Summary Group
/Analyses/Basecall_1D_000/Summary/basecall_1d_template Group
/Analyses/Segment_Linear_000 Group
/Analyses/Segment_Linear_000/Summary Group
/Analyses/Segment_Linear_000/Summary/split_hairpin Group
/Analyses/Segmentation_000 Group
/Analyses/Segmentation_000/Summary Group
/Analyses/Segmentation_000/Summary/segmentation Group
/Raw                     Group
/Raw/Reads               Group
/Raw/Reads/Read_15       Group
/Raw/Reads/Read_15/Signal Dataset {253507/Inf}
/UniqueGlobalKey         Group
/UniqueGlobalKey/channel_id Group
/UniqueGlobalKey/context_tags Group
/UniqueGlobalKey/tracking_id Group

liuqianhn commented 4 years ago

Hi @vahidAK, your fast5 file contains a move table rather than an event table, which causes the error. We are working on making DeepMod support the move table.
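The difference is visible in the listing itself: the file above has a Move dataset under BaseCalled_template but no Events dataset. The following hypothetical helper (not part of DeepMod) reports which of the two a file carries. It assumes the Albacore-style location /Analyses/Basecall_1D_000/BaseCalled_template/Events for the event table, and the Guppy-style Move dataset in the same group, as seen in the listing posted here.

```python
from typing import Dict, Optional


def parse_h5ls(text: str) -> Dict[str, str]:
    """Map each HDF5 path in 'h5ls -r' output to its kind ("Group" or "Dataset")."""
    objects: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            objects[parts[0]] = parts[1]
    return objects


def template_table(objects: Dict[str, str],
                   basecall_group: str = "Basecall_1D_000",
                   template: str = "BaseCalled_template") -> Optional[str]:
    """Return "event", "move", or None for the basecall template (hypothetical helper)."""
    base = f"/Analyses/{basecall_group}/{template}"
    if objects.get(base + "/Events") == "Dataset":
        return "event"  # Albacore-style event table
    if objects.get(base + "/Move") == "Dataset":
        return "move"   # Guppy-style move table
    return None


# Abridged listing from the Guppy --fast5_out file posted above.
listing = """\
/Analyses/Basecall_1D_000/BaseCalled_template Group
/Analyses/Basecall_1D_000/BaseCalled_template/Fastq Dataset {SCALAR}
/Analyses/Basecall_1D_000/BaseCalled_template/Move Dataset {126752}
/Analyses/Basecall_1D_000/BaseCalled_template/Trace Dataset {126752, 8}
"""
print(template_table(parse_h5ls(listing)))  # move
```

The group names are parameters because, as later comments in this thread show, the event table is not always under Basecall_1D_000.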

vahidAK commented 4 years ago

Great, thank you so much! In the meantime, do I need to use fast5 files basecalled by Albacore?

liuqianhn commented 4 years ago

Hi @vahidAK, with this version of DeepMod, I am afraid the answer is yes.

vahidAK commented 4 years ago

Thanks, @liuqianhn.
I am looking forward to the new version.

liuqianhn commented 4 years ago

@vahidAK, DeepMod now supports the move table, with the help of @yaoluxun. Please note that the multi-fast5 format is not supported yet, but your fast5 files appear to contain single reads rather than multiple reads. To use the move table, add the "--move" option.
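As an illustration of where the option goes, the hypothetical helper below (not part of DeepMod) takes the detect command from the original post and appends "--move" when the files carry a move table. It assumes DeepMod accepts "--move" at the end of the detect arguments, and that the table type has already been determined, for example by inspecting the h5ls listing. Files with neither table are rejected, since the advice in this thread for that case is to re-basecall.

```python
import shlex
from typing import Optional


def detect_command(base_command: str, table_type: Optional[str]) -> str:
    """Return a DeepMod 'detect' command suited to the basecall table found (hypothetical helper)."""
    args = shlex.split(base_command)
    if table_type == "move":
        # Move table (e.g. Guppy --fast5_out): tell DeepMod to read it via --move.
        if "--move" not in args:
            args.append("--move")
    elif table_type != "event":
        # Neither an event table nor a move table: re-basecall before running DeepMod.
        raise ValueError("fast5 has neither an event table nor a move table")
    return shlex.join(args)  # shlex.join requires Python 3.8+


cmd = ("python ../anaconda3/DeepMod/bin/DeepMod.py detect --wrkBase ../fast5_files/ "
       "--Ref ../reference.fasta --outFolder ../ --Base C "
       "--modfile ../anaconda3/DeepMod/train_mod/rnn_f7_wd21_chr1to10_4/mod_train_f7_wd21_chr1to10 "
       "--FileID User_Uniq_name --threads 16")
print(detect_command(cmd, "move"))
```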

lixuwen1997 commented 4 years ago

@liuqianhn Hi there,

I got a very similar error.

Cannot open fast5 or other errors: single_fast5/0/ff680310-f1b1-4677-97b2-e17a9e060a19.fast5
...
Cur Prediction consuming time 0 for 0 0
Error information for different fast5 files:
        Cannot open fast5 or other errors 431
...

But many (311 out of 431) of my reads do contain an event dataset. The h5ls -r output looks like this:

/                        Group
/Analyses                Group
/Analyses/Basecall_1D_000 Group
/Analyses/Basecall_1D_000/BaseCalled_template Group
/Analyses/Basecall_1D_000/BaseCalled_template/Fastq Dataset {SCALAR}
/Analyses/RawGenomeCorrected_000 Group
/Analyses/RawGenomeCorrected_000/BaseCalled_template Group
/Analyses/RawGenomeCorrected_000/BaseCalled_template/Alignment Group
/Analyses/RawGenomeCorrected_000/BaseCalled_template/Events Dataset {10468}
/Analyses/Segmentation_000 Group
/Analyses/Segmentation_000/Summary Group
/Analyses/Segmentation_000/Summary/segmentation Group
/Raw                     Group
/Raw/Reads               Group
/Raw/Reads/Read_9299     Group
/Raw/Reads/Read_9299/Signal Dataset {186066/Inf}
/UniqueGlobalKey         Group
/UniqueGlobalKey/channel_id Group
/UniqueGlobalKey/context_tags Group
/UniqueGlobalKey/tracking_id Group

Could you help me with this? Thanks a lot.

yaoluxun commented 4 years ago

@lixuwen1997 Hi, I suspect this is because your event table is under /Analyses/RawGenomeCorrected_000, while DeepMod looks under /Analyses/Basecall_1D_000 by default. You could change the "basecall_1d" argument. It would help if you could share your command. Thank you.

lixuwen1997 commented 4 years ago

@yaoluxun Hi, thank you for your reply. The command I used is:

python ~/.soft/DeepMod/bin/DeepMod.py detect --wrkBase single_fast5/ --Ref reference.fasta --outFolder DeepMod/6mA --Base A --modfile ~/.soft/DeepMod/train_mod/rnn_conmodA_P100wd21_f7ne1u0_4/mod_train_conmodA_P100wd21_f3ne1u0 --FileID 6mA

You are correct. But after I changed the basecall_1d argument, I got:

Error!!! No Fastq data

I think the problem is that RawGenomeCorrected_000 was generated by tombo resquiggle. If I don't want to use RawGenomeCorrected_000, what should I do to run DeepMod? Thanks

yaoluxun commented 4 years ago

Hi @lixuwen1997, unfortunately DeepMod requires the event table and the Fastq to be under the same group. You may need to re-basecall the data using Albacore or Guppy. It looks like your data did not contain a move table before tombo resquiggling. You probably used MinKNOW live basecalling, which does not add an event table to the fast5 file, in order to reduce file size.

Sorry for the inconvenience.
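The same-group requirement can be checked against the listing posted above. In that file, Fastq sits under Basecall_1D_000 while Events sits under RawGenomeCorrected_000 (written by tombo), so no single analysis group has both, which fits the "No Fastq data" error seen after pointing basecall_1d at RawGenomeCorrected_000. The helper below is a hypothetical sketch, not part of DeepMod; it only looks for an Events dataset and assumes the template group is named BaseCalled_template.

```python
from typing import Dict, List


def parse_h5ls(text: str) -> Dict[str, str]:
    """Map each HDF5 path in 'h5ls -r' output to its kind ("Group" or "Dataset")."""
    objects: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            objects[parts[0]] = parts[1]
    return objects


def usable_groups(objects: Dict[str, str], template: str = "BaseCalled_template") -> List[str]:
    """List /Analyses groups whose template holds both Events and Fastq (hypothetical helper)."""
    # Direct children of /Analyses, e.g. "/Analyses/Basecall_1D_000" -> "Basecall_1D_000".
    groups = {p.split("/")[2] for p in objects if p.startswith("/Analyses/") and p.count("/") == 2}
    usable = []
    for group in sorted(groups):
        base = f"/Analyses/{group}/{template}"
        if objects.get(base + "/Events") == "Dataset" and objects.get(base + "/Fastq") == "Dataset":
            usable.append(group)
    return usable


# Abridged listing from the tombo-resquiggled file posted above.
listing = """\
/Analyses/Basecall_1D_000 Group
/Analyses/Basecall_1D_000/BaseCalled_template Group
/Analyses/Basecall_1D_000/BaseCalled_template/Fastq Dataset {SCALAR}
/Analyses/RawGenomeCorrected_000 Group
/Analyses/RawGenomeCorrected_000/BaseCalled_template Group
/Analyses/RawGenomeCorrected_000/BaseCalled_template/Alignment Group
/Analyses/RawGenomeCorrected_000/BaseCalled_template/Events Dataset {10468}
"""
print(usable_groups(parse_h5ls(listing)))  # []
```

An empty result suggests the file needs re-basecalling before DeepMod can use it.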

lixuwen1997 commented 4 years ago

Hi @yaoluxun, thanks a lot for your explanation. I re-basecalled the data with Guppy, and it works fine now.

liuqianhn commented 4 years ago

Closed due to no recent activities.