Open gottaMe opened 3 years ago
@gottaMe Thanks for your interest in DeepMod. Could you please show what you get from h5ls -r YOUR-fast5 | head -n 50?
Meanwhile, data/meth10_lib3/ecoli_er2925.pcr_MSssI.timp.021216.fast5_small/fail/kelvin_021116_methecoli_4101_1_ch67_file7_strand.fast5 is from a fail folder, which might not contain useful fast5 files. If you have a pass folder alongside the fail folder, please use the fast5 files from the pass folder.
Thanks for your reply!
I tried using files from the pass folder to test DeepMod, but it still reports the error:
Error!!! No Raw_reads/Signal data /Raw/Reads in data/Control_lib1/ecoli_er2925.pcr.timp.021216.fast5_small/pass/imperial_021116_unmethecoli_3923_1_ch67_file60_strand.fast5
The output of h5ls -r imperial_021116_unmethecoli_3923_1_ch67_file60_strand.fast5 | head -n 50 is as follows (this fast5 file is one of the files in the pass folder):
/ Group
/Analyses Group
/Analyses/Basecall_1D_000 Group
/Analyses/Basecall_1D_000/BaseCalled_complement Group
/Analyses/Basecall_1D_000/BaseCalled_complement/Events Dataset {3499}
/Analyses/Basecall_1D_000/BaseCalled_complement/Fastq Dataset {SCALAR}
/Analyses/Basecall_1D_000/BaseCalled_complement/Model Dataset {4096}
/Analyses/Basecall_1D_000/BaseCalled_template Group
/Analyses/Basecall_1D_000/BaseCalled_template/Events Dataset {3819}
/Analyses/Basecall_1D_000/BaseCalled_template/Fastq Dataset {SCALAR}
/Analyses/Basecall_1D_000/BaseCalled_template/Model Dataset {4096}
/Analyses/Basecall_1D_000/Configuration Group
/Analyses/Basecall_1D_000/Configuration/aggregator Group
/Analyses/Basecall_1D_000/Configuration/basecall_1d Group
/Analyses/Basecall_1D_000/Configuration/basecall_2d Group
/Analyses/Basecall_1D_000/Configuration/calibration_strand Group
/Analyses/Basecall_1D_000/Configuration/components Group
/Analyses/Basecall_1D_000/Configuration/general Group
/Analyses/Basecall_1D_000/Configuration/hairpin_align Group
/Analyses/Basecall_1D_000/Configuration/post_processing Group
/Analyses/Basecall_1D_000/Configuration/post_processing.3000Hz Group
/Analyses/Basecall_1D_000/Configuration/split_hairpin Group
/Analyses/Basecall_1D_000/Log Dataset {SCALAR}
/Analyses/Basecall_1D_000/Summary Group
/Analyses/Basecall_1D_000/Summary/basecall_1d_complement Group
/Analyses/Basecall_1D_000/Summary/basecall_1d_template Group
/Analyses/Basecall_2D_000 Group
/Analyses/Basecall_2D_000/BaseCalled_2D Group
/Analyses/Basecall_2D_000/BaseCalled_2D/Alignment Dataset {4468}
/Analyses/Basecall_2D_000/BaseCalled_2D/Fastq Dataset {SCALAR}
/Analyses/Basecall_2D_000/Configuration Group
/Analyses/Basecall_2D_000/Configuration/aggregator Group
/Analyses/Basecall_2D_000/Configuration/basecall_1d Group
/Analyses/Basecall_2D_000/Configuration/basecall_2d Group
/Analyses/Basecall_2D_000/Configuration/calibration_strand Group
/Analyses/Basecall_2D_000/Configuration/components Group
/Analyses/Basecall_2D_000/Configuration/general Group
/Analyses/Basecall_2D_000/Configuration/hairpin_align Group
/Analyses/Basecall_2D_000/Configuration/post_processing Group
/Analyses/Basecall_2D_000/Configuration/post_processing.3000Hz Group
/Analyses/Basecall_2D_000/Configuration/split_hairpin Group
/Analyses/Basecall_2D_000/HairpinAlign Group
/Analyses/Basecall_2D_000/HairpinAlign/Alignment Dataset {3217}
/Analyses/Basecall_2D_000/Log Dataset {SCALAR}
/Analyses/Basecall_2D_000/Summary Group
/Analyses/Basecall_2D_000/Summary/basecall_2d Group
/Analyses/Basecall_2D_000/Summary/hairpin_align Group
/Analyses/Basecall_2D_000/Summary/post_process_complement Group
/Analyses/Basecall_2D_000/Summary/post_process_template Group
/Analyses/Calibration_Strand_000 Group
@gottaMe It seems that the fast5 file contains a lot of basecalling info. Could you post the full output of h5ls -r imperial_021116_unmethecoli_3923_1_ch67_file60_strand.fast5? Thanks.
Thanks for your reply!
The full output of the command h5ls -r imperial_021116_unmethecoli_3923_1_ch67_file60_strand.fast5 is as follows:
/ Group
/Analyses Group
/Analyses/Basecall_1D_000 Group
/Analyses/Basecall_1D_000/BaseCalled_complement Group
/Analyses/Basecall_1D_000/BaseCalled_complement/Events Dataset {3499}
/Analyses/Basecall_1D_000/BaseCalled_complement/Fastq Dataset {SCALAR}
/Analyses/Basecall_1D_000/BaseCalled_complement/Model Dataset {4096}
/Analyses/Basecall_1D_000/BaseCalled_template Group
/Analyses/Basecall_1D_000/BaseCalled_template/Events Dataset {3819}
/Analyses/Basecall_1D_000/BaseCalled_template/Fastq Dataset {SCALAR}
/Analyses/Basecall_1D_000/BaseCalled_template/Model Dataset {4096}
/Analyses/Basecall_1D_000/Configuration Group
/Analyses/Basecall_1D_000/Configuration/aggregator Group
/Analyses/Basecall_1D_000/Configuration/basecall_1d Group
/Analyses/Basecall_1D_000/Configuration/basecall_2d Group
/Analyses/Basecall_1D_000/Configuration/calibration_strand Group
/Analyses/Basecall_1D_000/Configuration/components Group
/Analyses/Basecall_1D_000/Configuration/general Group
/Analyses/Basecall_1D_000/Configuration/hairpin_align Group
/Analyses/Basecall_1D_000/Configuration/post_processing Group
/Analyses/Basecall_1D_000/Configuration/post_processing.3000Hz Group
/Analyses/Basecall_1D_000/Configuration/split_hairpin Group
/Analyses/Basecall_1D_000/Log Dataset {SCALAR}
/Analyses/Basecall_1D_000/Summary Group
/Analyses/Basecall_1D_000/Summary/basecall_1d_complement Group
/Analyses/Basecall_1D_000/Summary/basecall_1d_template Group
/Analyses/Basecall_2D_000 Group
/Analyses/Basecall_2D_000/BaseCalled_2D Group
/Analyses/Basecall_2D_000/BaseCalled_2D/Alignment Dataset {4468}
/Analyses/Basecall_2D_000/BaseCalled_2D/Fastq Dataset {SCALAR}
/Analyses/Basecall_2D_000/Configuration Group
/Analyses/Basecall_2D_000/Configuration/aggregator Group
/Analyses/Basecall_2D_000/Configuration/basecall_1d Group
/Analyses/Basecall_2D_000/Configuration/basecall_2d Group
/Analyses/Basecall_2D_000/Configuration/calibration_strand Group
/Analyses/Basecall_2D_000/Configuration/components Group
/Analyses/Basecall_2D_000/Configuration/general Group
/Analyses/Basecall_2D_000/Configuration/hairpin_align Group
/Analyses/Basecall_2D_000/Configuration/post_processing Group
/Analyses/Basecall_2D_000/Configuration/post_processing.3000Hz Group
/Analyses/Basecall_2D_000/Configuration/split_hairpin Group
/Analyses/Basecall_2D_000/HairpinAlign Group
/Analyses/Basecall_2D_000/HairpinAlign/Alignment Dataset {3217}
/Analyses/Basecall_2D_000/Log Dataset {SCALAR}
/Analyses/Basecall_2D_000/Summary Group
/Analyses/Basecall_2D_000/Summary/basecall_2d Group
/Analyses/Basecall_2D_000/Summary/hairpin_align Group
/Analyses/Basecall_2D_000/Summary/post_process_complement Group
/Analyses/Basecall_2D_000/Summary/post_process_template Group
/Analyses/Calibration_Strand_000 Group
/Analyses/Calibration_Strand_000/Configuration Group
/Analyses/Calibration_Strand_000/Configuration/aggregator Group
/Analyses/Calibration_Strand_000/Configuration/basecall_1d Group
/Analyses/Calibration_Strand_000/Configuration/basecall_2d Group
/Analyses/Calibration_Strand_000/Configuration/calibration_strand Group
/Analyses/Calibration_Strand_000/Configuration/components Group
/Analyses/Calibration_Strand_000/Configuration/general Group
/Analyses/Calibration_Strand_000/Configuration/hairpin_align Group
/Analyses/Calibration_Strand_000/Configuration/post_processing Group
/Analyses/Calibration_Strand_000/Configuration/post_processing.3000Hz Group
/Analyses/Calibration_Strand_000/Configuration/split_hairpin Group
/Analyses/Calibration_Strand_000/Log Dataset {SCALAR}
/Analyses/Calibration_Strand_000/Summary Group
/Analyses/EventDetection_000 Group
/Analyses/EventDetection_000/Configuration Group
/Analyses/EventDetection_000/Configuration/abasic_detection Group
/Analyses/EventDetection_000/Configuration/event_detection Group
/Analyses/EventDetection_000/Configuration/hairpin_detection Group
/Analyses/EventDetection_000/Reads Group
/Analyses/EventDetection_000/Reads/Read_58 Group
/Analyses/EventDetection_000/Reads/Read_58/Events Dataset {7371}
/Analyses/Hairpin_Split_000 Group
/Analyses/Hairpin_Split_000/Configuration Group
/Analyses/Hairpin_Split_000/Configuration/aggregator Group
/Analyses/Hairpin_Split_000/Configuration/basecall_1d Group
/Analyses/Hairpin_Split_000/Configuration/basecall_2d Group
/Analyses/Hairpin_Split_000/Configuration/calibration_strand Group
/Analyses/Hairpin_Split_000/Configuration/components Group
/Analyses/Hairpin_Split_000/Configuration/general Group
/Analyses/Hairpin_Split_000/Configuration/hairpin_align Group
/Analyses/Hairpin_Split_000/Configuration/post_processing Group
/Analyses/Hairpin_Split_000/Configuration/post_processing.3000Hz Group
/Analyses/Hairpin_Split_000/Configuration/split_hairpin Group
/Analyses/Hairpin_Split_000/Log Dataset {SCALAR}
/Analyses/Hairpin_Split_000/Summary Group
/Analyses/Hairpin_Split_000/Summary/split_hairpin Group
/Sequences Group
/Sequences/Meta Group
/UniqueGlobalKey Group
/UniqueGlobalKey/channel_id Group
/UniqueGlobalKey/context_tags Group
/UniqueGlobalKey/tracking_id Group
Hi @gottaMe, from the output of h5ls, it seems that there is no "Raw_reads/Signals" group for signals. I suspect that "/Sequences/Meta" might hold the signals, but I cannot be sure until I read the fast5 file. I have been trying to download the data (unsuccessfully so far, due to a potential firewall issue that I will fix later), but it would be great if you could share a single fast5 file for me to check.
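For readers who want to run the same check on many files, here is a minimal sketch that scans the text of an h5ls -r listing for a raw signal dataset. It assumes the common single-read fast5 layout /Raw/Reads/Read_<N>/Signal, which matches the /Raw/Reads path named in the error message; the function name raw_signal_paths and the sample line containing a Signal dataset are made up for illustration and are not part of DeepMod or of the file discussed here.

```python
import re

# Assumption: raw signal lives at /Raw/Reads/Read_<N>/Signal (single-read fast5).
RAW_SIGNAL = re.compile(r"^/Raw/Reads/Read_\d+/Signal$")


def raw_signal_paths(h5ls_listing):
    """Return the dataset paths in an `h5ls -r` listing that look like raw signal."""
    found = []
    for line in h5ls_listing.splitlines():
        fields = line.split()
        # Each listing line is "<path> <Group|Dataset> [{shape}]".
        if len(fields) >= 2 and fields[1] == "Dataset" and RAW_SIGNAL.match(fields[0]):
            found.append(fields[0])
    return found


# A few lines taken from the listing in this thread: no /Raw group at all.
listing_without_raw = """\
/ Group
/Analyses Group
/Analyses/EventDetection_000/Reads/Read_58/Events Dataset {7371}
/Sequences/Meta Group
"""

# Hypothetical extra line, only to show what a match would look like.
listing_with_raw = listing_without_raw + "/Raw/Reads/Read_58/Signal Dataset {51234/Inf}\n"

print(raw_signal_paths(listing_without_raw))
print(raw_signal_paths(listing_with_raw))
```

An empty result for a file means the listing contains no dataset at the assumed raw signal path, which is the situation reported for the file in this thread.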
Thanks for your reply!
Here is the test fast5 file, the one used in the command h5ls -r imperial_021116_unmethecoli_3923_1_ch67_file60_strand.fast5. Its path within the dataset is \ecoli_er2925.pcr.timp.021216.fast5\pass\imperial_021116_unmethecoli_3923_1_ch67_file60_strand.fast5
@gottaMe Thanks for sharing this file. I downloaded it and checked it carefully; unfortunately, I did NOT find any raw signal data in the file. I have no clue why, since fast5 files generated by a Nanopore sequencer usually contain raw signal data.
OK, thank you for your help!
Hi Liu, I'm very interested in DeepMod and want to use it to call 5mC methylation in the datasets provided by Simpson et al., which I downloaded from https://www.ebi.ac.uk/ena/browser/view/PRJEB13021.
Since I don't have my own GPU, I am testing these data on a GPU server. I downloaded the datasets 'ecoli_er2925.pcr.timp.021216.tar.gz' (75.3 GB) and 'ecoli_er2925.pcr_MSssI.timp.021216.tar.gz' (59.5 GB). Unfortunately, these datasets are too large to upload to the GPU server, so I extracted the part of the data corresponding to ch67 and zipped it for upload.
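The subsetting step described above could be sketched roughly as follows. This is only an illustration, not the procedure actually used: the function name extract_channel and the "_ch67_" substring match are assumptions based on the file names quoted in this thread, and the internal folder layout of the archives is not assumed.

```python
import tarfile


def extract_channel(archive_path, out_dir, channel_tag="_ch67_"):
    """Extract only the .fast5 members whose name contains channel_tag.

    Returns the list of extracted member names. Intended for trusted
    archives only, since member paths are used as-is.
    """
    extracted = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            name = member.name
            # Skip directories, non-fast5 files, and other channels.
            if member.isfile() and name.endswith(".fast5") and channel_tag in name:
                tar.extract(member, path=out_dir)
                extracted.append(name)
    return extracted


# Example call (the output folder name is hypothetical):
# extract_channel("ecoli_er2925.pcr.timp.021216.tar.gz", "ch67_subset")
```

The pass and fail subfolders are preserved in the output because each member is extracted under its original relative path.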
When I use the model 'mod_train_sinmodC_P100wd21_f3ne1u0' to call 5mC methylation on the ch67 data from 'ecoli_er2925.pcr_MSssI.timp.021216.tar.gz', I get the following error:
Error!!! No Raw_reads/Signal data /Raw/Reads in data/meth10_lib3/ecoli_er2925.pcr_MSssI.timp.021216.fast5_small/fail/kelvin_021116_methecoli_4101_1_ch67_file7_strand.fast5
Then I use the following command to examine this file:
Indeed, this fast5 file doesn't contain Raw_reads/Signal information. I then checked the other ch67 fast5 files in 'ecoli_er2925.pcr_MSssI.timp.021216.tar.gz', and they don't contain Raw_reads/Signal information either.
So I'm wondering how you trained or tested DeepMod if you used the datasets provided by Simpson et al. If those datasets are only partially usable, could you tell me which part of the data you used to train and test the model? Additionally, are there any other public datasets I can use to run DeepMod?
I'm new to using nanopore sequencing to detect methylation, so some of my questions may seem strange, but I would really appreciate your help with these problems.
Yours, Chen.