qiuyixmm closed this issue 4 years ago.
Hi @qiuyixmm, could you please show the output of the following command?

h5ls -r ~/deepmod_test/fast5_files/nanopore2_20161128_FNFAB49712_MN17633_sequencing_run_20161128_Human_Qiagen_1D_R9_4_64849_ch388_read4650_strand.fast5
This is the content of the test data file:

h5ls -r nanopore2_20161128_FNFAB49712_MN17633_sequencing_run_20161128_Human_Qiagen_1D_R9_4_64849_ch388_read4650_strand.fast5
/ Group
/Analyses Group
/Analyses/Segment_Linear_000 Group
/Analyses/Segment_Linear_000/Summary Group
/Analyses/Segment_Linear_000/Summary/split_hairpin Group
/Raw Group
/Raw/Reads Group
/Raw/Reads/Read_4650 Group
/Raw/Reads/Read_4650/Signal Dataset {157207/Inf}
/UniqueGlobalKey Group
/UniqueGlobalKey/channel_id Group
/UniqueGlobalKey/context_tags Group
/UniqueGlobalKey/tracking_id Group
This is the content of a file from my own data:

h5ls -r GXB01143_20180313_FAH59244_GA10000_sequencing_run_20180313_NPL0039_E1_81227_read_18378_ch_446_strand.fast5
/ Group
/Raw Group
/Raw/Reads Group
/Raw/Reads/Read_18378 Group
/Raw/Reads/Read_18378/Signal Dataset {136801/Inf}
/UniqueGlobalKey Group
/UniqueGlobalKey/channel_id Group
/UniqueGlobalKey/context_tags Group
/UniqueGlobalKey/tracking_id Group
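To compare listings like these without reading them entry by entry, here is a minimal sketch that parses `h5ls -r` output (either flattened onto one line or one entry per line) and reports which basecall datasets are absent. It assumes, based on the `basecall_1d: Basecall_1D_000` and `basecall_2strand: BaseCalled_template` settings DeepMod prints at startup, that DeepMod looks for `Fastq` and `Events` datasets under `/Analyses/<basecall_1d>/<basecall_2strand>`; the function names are hypothetical.

```python
import re

# Each `h5ls -r` entry is "<path> Group" or "<path> Dataset {<shape>}".
# Listings pasted into the thread were flattened onto one line, so entries
# are found by pattern rather than by splitting on newlines.
ENTRY = re.compile(r"(/\S*)\s+(Group|Dataset\s+\{[^}]*\})")


def parse_h5ls(text):
    """Map each HDF5 path in `h5ls -r` output to 'Group' or 'Dataset'."""
    return {path: kind.split()[0] for path, kind in ENTRY.findall(text)}


def missing_basecall(text, basecall_1d="Basecall_1D_000",
                     basecall_2strand="BaseCalled_template"):
    """Return the basecall datasets that are absent from the listing.

    Assumption: DeepMod reads Fastq and Events from
    /Analyses/<basecall_1d>/<basecall_2strand>.
    """
    entries = parse_h5ls(text)
    base = f"/Analyses/{basecall_1d}/{basecall_2strand}"
    return [p for p in (f"{base}/Fastq", f"{base}/Events")
            if entries.get(p) != "Dataset"]
```

The listing text can be captured with `subprocess.run(["h5ls", "-r", path], capture_output=True, text=True).stdout`. On the own-data listing, both `.../BaseCalled_template/Fastq` and `.../BaseCalled_template/Events` come back as missing, which is consistent with the "No Fastq data" error.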
Additionally, I downloaded a subset of the NA12878 Nanopore sequencing data (http://s3.amazonaws.com/nanopore-human-wgs/rel3-fast5-chr20.part05.tar) used in Example 3: Detect 5mC on Na12878. DeepMod ran successfully on it, and I got results in BED format.
This is the content of one of those FAST5 (signal-level) files:

h5ls -r PLSP61583_20161129_FNFAB49914_MN17048_sequencing_run_Hu_Nott_Bi_FC4_tune_92763_ch488_read194_strand.fast5
/ Group
/Analyses Group
/Analyses/Basecall_1D_000 Group
/Analyses/Basecall_1D_000/BaseCalled_template Group
/Analyses/Basecall_1D_000/BaseCalled_template/Events Dataset {28908}
/Analyses/Basecall_1D_000/BaseCalled_template/Fastq Dataset {SCALAR}
/Analyses/Basecall_1D_000/Configuration Group
/Analyses/Basecall_1D_000/Configuration/aggregator Group
/Analyses/Basecall_1D_000/Configuration/basecall_1d Group
/Analyses/Basecall_1D_000/Configuration/calibration_strand Group
/Analyses/Basecall_1D_000/Configuration/components Group
/Analyses/Basecall_1D_000/Configuration/event_detection Group
/Analyses/Basecall_1D_000/Configuration/general Group
/Analyses/Basecall_1D_000/Configuration/genome_mapping Group
/Analyses/Basecall_1D_000/Configuration/split_hairpin Group
/Analyses/Basecall_1D_000/Log Dataset {SCALAR}
/Analyses/Basecall_1D_000/Summary Group
/Analyses/Basecall_1D_000/Summary/basecall_1d_template Group
/Analyses/Calibration_Strand_000 Group
/Analyses/Calibration_Strand_000/Configuration Group
/Analyses/Calibration_Strand_000/Configuration/aggregator Group
/Analyses/Calibration_Strand_000/Configuration/basecall_1d Group
/Analyses/Calibration_Strand_000/Configuration/basecall_2d Group
/Analyses/Calibration_Strand_000/Configuration/calibration_strand Group
/Analyses/Calibration_Strand_000/Configuration/components Group
/Analyses/Calibration_Strand_000/Configuration/general Group
/Analyses/Calibration_Strand_000/Configuration/genome_mapping Group
/Analyses/Calibration_Strand_000/Configuration/hairpin_align Group
/Analyses/Calibration_Strand_000/Configuration/post_processing.3000Hz Group
/Analyses/Calibration_Strand_000/Configuration/split_hairpin Group
/Analyses/Calibration_Strand_000/Log Dataset {SCALAR}
/Analyses/Calibration_Strand_000/Summary Group
/Analyses/EventDetection_000 Group
/Analyses/EventDetection_000/Configuration Group
/Analyses/EventDetection_000/Configuration/aggregator Group
/Analyses/EventDetection_000/Configuration/basecall_1d Group
/Analyses/EventDetection_000/Configuration/calibration_strand Group
/Analyses/EventDetection_000/Configuration/components Group
/Analyses/EventDetection_000/Configuration/event_detection Group
/Analyses/EventDetection_000/Configuration/general Group
/Analyses/EventDetection_000/Configuration/split_hairpin Group
/Analyses/EventDetection_000/Log Dataset {SCALAR}
/Analyses/EventDetection_000/Reads Group
/Analyses/EventDetection_000/Reads/Read_194 Group
/Analyses/EventDetection_000/Reads/Read_194/Events Dataset {29484}
/Analyses/EventDetection_000/Summary Group
/Analyses/EventDetection_000/Summary/event_detection Group
/Analyses/Segment_Linear_000 Group
/Analyses/Segment_Linear_000/Configuration Group
/Analyses/Segment_Linear_000/Configuration/aggregator Group
/Analyses/Segment_Linear_000/Configuration/basecall_1d Group
/Analyses/Segment_Linear_000/Configuration/calibration_strand Group
/Analyses/Segment_Linear_000/Configuration/components Group
/Analyses/Segment_Linear_000/Configuration/general Group
/Analyses/Segment_Linear_000/Configuration/split_hairpin Group
/Analyses/Segment_Linear_000/Log Dataset {SCALAR}
/Analyses/Segment_Linear_000/Summary Group
/Analyses/Segment_Linear_000/Summary/split_hairpin Group
/Raw Group
/Raw/Reads Group
/Raw/Reads/Read_194 Group
/Raw/Reads/Read_194/Signal Dataset {328148/Inf}
/UniqueGlobalKey Group
/UniqueGlobalKey/channel_id Group
/UniqueGlobalKey/context_tags Group
/UniqueGlobalKey/tracking_id Group
These fast5 files seem to contain more content than my own data and the first test data, but I don't know whether the missing content is the cause. Although this run completed successfully, some of the same errors as before still appear:
Nanopore sequencing data analysis is resourece-intensive and time consuming. Some potential strong recommendations are below: If your reference genome is large as human genome and your Nanopore data is huge, It would be faster to run this program parallelly to speed up. You might run different input folders of your fast5 files and give different output names (--FileID) or folders (--outFolder) A good way for this is to run different chromosome individually.
Current directory: ~/deepmod_test
outLevel: 2
wrkBase: ~/deepmod/fast5_file
FileID: test
outFolder: myoutput/
recursive: 1
files_per_thread: 1000
threads: 5
windowsize: 21
alignStr: minimap2
basecall_1d: Basecall_1D_000
basecall_2strand: BaseCalled_template
ConUnk: True
outputlayer:
Base: C
mod_cluster: 0
predDet: 1
Ref: ~/deepmod/reference/human_refernce_genome.fa
fnum: 7
hidden: 100
modfile: ~/DeepMod/train_mod/rnn_conmodC_P100wd21_f7ne1u0_4/mod_train_conmodC_P100wd21_f3ne1u0
region: [[None, None, None]]
Total files=2772
Error!!! No Fastq data in ~/fast5_file/MinION2_20161027_FNFAB42476_MN20093_sequencing_run_Chip102_Genomic_R9_4_450bps_40738_ch178_read503_strand.fast5
Error!!! No Fastq data in ~/fast5_file/MinION2_20161020_FNFAB42473_MN20093_sequencing_run_Chip101_Genomic_R9_4_450bps_tune_74642_ch375_read753_strand1.fast5
Error!!! No events data in ~/fast5_file/PLSP61583_20161021_FNFAB42561_MN17048_sequencing_run_94_II_Hum_2_24_tune_75076_ch124_read559_strand.fast5
... ...
Besides the same errors, there are some other messages like these:
Cur Prediction consuming time 1102 for 0 2
Cur Prediction consuming time 2031 for 0 0
Cur Prediction consuming time 2140 for 0 1
Error information for different fast5 files:
No events data 8
No Fastq data 21
Not in alignment sam 685
Per-read Prediction consuming time 2149
Find: myoutput//test 25 rnn.pred.ind ['myoutput//test/rnn.pred.ind.chr9', 'myoutput//test/rnn.pred.ind.chr14', 'myoutput//test/rnn.pred.ind.chr17', 'myoutput//test/rnn.pred.ind.chr20', 'myoutput//test/rnn.pred.ind.chr3', 'myoutput//test/rnn.pred.ind.chr2', 'myoutput//test/rnn.pred.ind.chr15', 'myoutput//test/rnn.pred.ind.chr10', 'myoutput//test/rnn.pred.ind.chr7', 'myoutput//test/rnn.pred.ind.chr5', 'myoutput//test/rnn.pred.ind.chrY', 'myoutput//test/rnn.pred.ind.chr16', 'myoutput//test/rnn.pred.ind.chr6', 'myoutput//test/rnn.pred.ind.chr13', 'myoutput//test/rnn.pred.ind.chr22', 'myoutput//test/rnn.pred.ind.chr8', 'myoutput//test/rnn.pred.ind.chrX', 'myoutput//test/rnn.pred.ind.chr19', 'myoutput//test/rnn.pred.ind.chr21', 'myoutput//test/rnn.pred.ind.chr1', 'myoutput//test/rnn.pred.ind.chr18', 'myoutput//test/rnn.pred.ind.chr11', 'myoutput//test/rnn.pred.ind.chr4', 'myoutput//test/rnn.pred.ind.chrM', 'myoutput//test/rnn.pred.ind.chr12']
====sum done! To save
Save myoutput/test/mod_pos.chr9-.C.bed
====sum done! To save
Save myoutput/test/mod_pos.chr2-.C.bed
====sum done! To save
Save myoutput/test/mod_pos.chr10-.C.bed
====sum done! To save
Save myoutput/test/mod_pos.chr5-.C.bed
====sum done! To save
Save myoutput/test/mod_pos.chr6-.C.bed
====sum done! To save
Save myoutput/test/mod_pos.chrX+.C.bed
====sum done! To save
Save myoutput/test/mod_pos.chrX-.C.bed
====sum done! To save
Save myoutput/test/mod_pos.chr19-.C.bed
====sum done! To save
Save myoutput/test/mod_pos.chr1-.C.bed
... ...
This is all the information I can provide. I hope it is useful.
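DeepMod's end-of-run summary only reports counts per error type (for example `No events data 8` and `No Fastq data 21`), so the skipped files have to be recovered from the per-file messages. A minimal sketch, assuming the console output was saved to a text file; the regular expression is built only from the `Error!!! ... in <path>.fast5` messages visible in this thread, so other message formats (such as the `Not in alignment sam` cases, which are not shown per file here) are not captured. The function names are hypothetical.

```python
import re

# Built from the messages visible in this thread, e.g.
#   Error!!! No Fastq data in ~/fast5_file/..._strand.fast5
#   Error!!! No events data in ~/fast5_file/..._strand.fast5
ERROR = re.compile(r"Error!!! (No \w+ data) in (\S+\.fast5)")


def failed_reads(log_text):
    """Group skipped fast5 paths by error message, in the order they were logged."""
    failures = {}
    for reason, path in ERROR.findall(log_text):
        failures.setdefault(reason, []).append(path)
    return failures


def error_counts(failures):
    """Per-reason totals, to cross-check against DeepMod's end-of-run summary."""
    return {reason: len(paths) for reason, paths in failures.items()}
```

For example, `failed_reads(open("deepmod.log").read())` (the log file name is hypothetical) returns the paths that can then be inspected with `h5ls -r`; if `error_counts` disagrees with DeepMod's own summary, some messages probably did not match the pattern.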
The error occurs because the fast5 files were not basecalled, so they contain no FASTQ or event information. Re-basecalling them with Albacore should solve the error.
If so, why were the same errors reported for a few fast5 files in the Na12878 Nanopore sequencing data used in Example 3: Detect 5mC on Na12878? This is the content of one failing fast5 file:

h5ls -r /GS01/project/pengms_group/pengms20t1/dir.xumm/sv_test/fast5_file/MinION2_20161027_FNFAB42476_MN20093_sequencing_run_Chip102_Genomic_R9_4_450bps_40738_ch178_read503_strand.fast5
/ Group
/Analyses Group
/Analyses/Basecall_1D_000 Group
/Analyses/Basecall_1D_000/Summary Group
/Analyses/Basecall_1D_000/Summary/basecall_1d_template Group
/Analyses/Basecall_1D_001 Group
/Analyses/Basecall_1D_001/BaseCalled_template Group
/Analyses/Basecall_1D_001/BaseCalled_template/Events Dataset {32861}
/Analyses/Basecall_1D_001/BaseCalled_template/Fastq Dataset {SCALAR}
/Analyses/Basecall_1D_001/Configuration Group
/Analyses/Basecall_1D_001/Configuration/aggregator Group
/Analyses/Basecall_1D_001/Configuration/basecall_1d Group
/Analyses/Basecall_1D_001/Configuration/calibration_strand Group
/Analyses/Basecall_1D_001/Configuration/components Group
/Analyses/Basecall_1D_001/Configuration/event_detection Group
/Analyses/Basecall_1D_001/Configuration/general Group
/Analyses/Basecall_1D_001/Configuration/genome_mapping Group
/Analyses/Basecall_1D_001/Configuration/split_hairpin Group
/Analyses/Basecall_1D_001/Log Dataset {SCALAR}
/Analyses/Basecall_1D_001/Summary Group
/Analyses/Basecall_1D_001/Summary/basecall_1d_template Group
/Analyses/Calibration_Strand_000 Group
/Analyses/Calibration_Strand_000/Configuration Group
/Analyses/Calibration_Strand_000/Configuration/aggregator Group
/Analyses/Calibration_Strand_000/Configuration/basecall_1d Group
/Analyses/Calibration_Strand_000/Configuration/basecall_2d Group
/Analyses/Calibration_Strand_000/Configuration/calibration_strand Group
/Analyses/Calibration_Strand_000/Configuration/components Group
/Analyses/Calibration_Strand_000/Configuration/general Group
/Analyses/Calibration_Strand_000/Configuration/genome_mapping Group
/Analyses/Calibration_Strand_000/Configuration/hairpin_align Group
/Analyses/Calibration_Strand_000/Configuration/post_processing.3000Hz Group
/Analyses/Calibration_Strand_000/Configuration/split_hairpin Group
/Analyses/Calibration_Strand_000/Log Dataset {SCALAR}
/Analyses/Calibration_Strand_000/Summary Group
/Analyses/EventDetection_000 Group
/Analyses/EventDetection_000/Configuration Group
/Analyses/EventDetection_000/Configuration/aggregator Group
/Analyses/EventDetection_000/Configuration/basecall_1d Group
/Analyses/EventDetection_000/Configuration/calibration_strand Group
/Analyses/EventDetection_000/Configuration/components Group
/Analyses/EventDetection_000/Configuration/event_detection Group
/Analyses/EventDetection_000/Configuration/general Group
/Analyses/EventDetection_000/Configuration/split_hairpin Group
/Analyses/EventDetection_000/Log Dataset {SCALAR}
/Analyses/EventDetection_000/Reads Group
/Analyses/EventDetection_000/Reads/Read_503 Group
/Analyses/EventDetection_000/Reads/Read_503/Events Dataset {33632}
/Analyses/EventDetection_000/Summary Group
/Analyses/EventDetection_000/Summary/event_detection Group
/Analyses/Segment_Linear_000 Group
/Analyses/Segment_Linear_000/Configuration Group
/Analyses/Segment_Linear_000/Configuration/aggregator Group
/Analyses/Segment_Linear_000/Configuration/basecall_1d Group
/Analyses/Segment_Linear_000/Configuration/calibration_strand Group
/Analyses/Segment_Linear_000/Configuration/components Group
/Analyses/Segment_Linear_000/Configuration/general Group
/Analyses/Segment_Linear_000/Configuration/split_hairpin Group
/Analyses/Segment_Linear_000/Log Dataset {SCALAR}
/Analyses/Segment_Linear_000/Summary Group
/Analyses/Segment_Linear_000/Summary/split_hairpin Group
/Raw Group
/Raw/Reads Group
/Raw/Reads/Read_503 Group
/Raw/Reads/Read_503/Signal Dataset {167297/Inf}
/UniqueGlobalKey Group
/UniqueGlobalKey/channel_id Group
/UniqueGlobalKey/context_tags Group
/UniqueGlobalKey/tracking_id Group
Hi @qiuyixmm, the errors in the first two datasets occur because those fast5 files were not basecalled and therefore contain no FASTQ information. The errors in the NA12878 dataset occur because the default basecall, under Basecall_1D_000, is incorrect for those files; the correct basecall is under Basecall_1D_001. One solution is to remove the basecalls from those failing fast5 files and re-basecall them; another is to set --basecall_1d Basecall_1D_001 ONLY for those failing fast5 files.
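To apply the second option without inspecting files one at a time, here is a minimal sketch that decides, per file, which basecall group to pass to --basecall_1d and which files need re-basecalling instead. It works on each file's HDF5 paths, as printed by `h5ls -r` or collected with h5py's `File.visit`. The helper names are hypothetical, and choosing the highest-numbered complete group is an assumption that happens to match the Read_503 file above, not a rule stated by DeepMod.

```python
import re

# Matches /Analyses/Basecall_1D_NNN/BaseCalled_template/{Fastq,Events}
BASECALL = re.compile(
    r"^/Analyses/(Basecall_1D_\d{3})/BaseCalled_template/(Fastq|Events)$")


def usable_basecall_groups(hdf5_paths):
    """Sorted Basecall_1D_NNN groups that contain both Fastq and Events."""
    found = {}
    for path in hdf5_paths:
        m = BASECALL.match(path)
        if m:
            found.setdefault(m.group(1), set()).add(m.group(2))
    return sorted(g for g, kinds in found.items() if kinds == {"Fastq", "Events"})


def split_by_basecall_group(files):
    """Map each usable group to the fast5 files that should be run with it.

    `files` maps a fast5 name to its HDF5 paths. Files with no usable group
    are collected under None: they need (re-)basecalling first.
    """
    batches = {}
    for name, paths in files.items():
        groups = usable_basecall_groups(paths)
        # Assumption: the highest-numbered complete group is the one to use.
        key = groups[-1] if groups else None
        batches.setdefault(key, []).append(name)
    return batches
```

Each non-None batch could then be moved into its own folder and run with `--wrkBase` pointing at that folder, the matching `--basecall_1d`, and a distinct `--FileID`; the None batch goes back to the basecaller first.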
@liuqianhn Hello, I downloaded test data from http://s3.climb.ac.uk/nanopolish_tutorial/methylation_example.tar.gz, a subset of the NA12878 WGS Consortium data used in the nanopolish methylation-calling tutorial. The command line is as follows:
python DeepMod.py detect \
  --wrkBase ~/deepmod_test/fast5_files \
  --Ref ~/deepmod_test/reference/reference.fasta \
  --FileID test \
  --modfile ~/DeepMod/train_mod/rnn_conmodC_P100wd21_f7ne1u0_4/mod_train_conmodC_P100wd21_f3ne1u0 \
  --threads 5 --outFolder myoutput/
Note: the directory ~/deepmod_test/fast5_files contains the signal-level FAST5 files unpacked from the downloaded data package.
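Because these errors only surface after DeepMod has started, it can help to scan the input folder first. A minimal sketch, assuming h5py is installed and that DeepMod needs `Fastq` and `Events` under `/Analyses/Basecall_1D_000/BaseCalled_template` (its `basecall_1d` and `basecall_2strand` settings as printed at startup); the function names are hypothetical, and the check is injectable so the folder walk can be reused with a different test.

```python
import os


def fast5_files(root, recursive=True):
    """Yield .fast5 paths under root in a stable (sorted) order."""
    for dirpath, dirnames, filenames in os.walk(os.path.expanduser(root)):
        dirnames.sort()  # os.walk descends into subfolders in this sorted order
        for name in sorted(filenames):
            if name.endswith(".fast5"):
                yield os.path.join(dirpath, name)
        if not recursive:
            break


def has_basecall(path, basecall_1d="Basecall_1D_000",
                 basecall_2strand="BaseCalled_template"):
    """True if the file has both Fastq and Events under the basecall group."""
    import h5py  # assumption: h5py is installed; imported lazily
    base = f"Analyses/{basecall_1d}/{basecall_2strand}"
    with h5py.File(path, "r") as f:
        # h5py's `in` accepts nested paths and is False if any level is missing
        return f"{base}/Fastq" in f and f"{base}/Events" in f


def preflight(root, check=has_basecall):
    """Return the fast5 files under root that fail `check`."""
    return [p for p in fast5_files(root) if not check(p)]
```

For example, `preflight("~/deepmod_test/fast5_files")` lists files that would hit "No Fastq data" or "No events data"; files like the Read_503 one can be re-checked with `check=lambda p: has_basecall(p, "Basecall_1D_001")`.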
The error information:

Nanopore sequencing data analysis is resourece-intensive and time consuming. Some potential strong recommendations are below: If your reference genome is large as human genome and your Nanopore data is huge, It would be faster to run this program parallelly to speed up. You might run different input folders of your fast5 files and give different output names (--FileID) or folders (--outFolder) A good way for this is to run different chromosome individually.
Total files=19275
Error!!! No Fastq data in ~/deepmod_test/fast5_files/nanopore2_20161128_FNFAB49712_MN17633_sequencing_run_20161128_Human_Qiagen_1D_R9_4_64849_ch388_read4650_strand.fast5
... ...
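The recommendation DeepMod prints at startup (separate input folders, each with its own --FileID or --outFolder) can be scripted. A minimal sketch that only builds the command strings, reusing the flags from the command above; how the fast5 files are split into folders (for example by chromosome, as the banner suggests) is left open, and the helper name is hypothetical. Giving each run its own --FileID follows from the log in this thread, where results land under `myoutput//test`, that is, `<outFolder>/<FileID>`.

```python
import os
import shlex


def detect_commands(folders, ref, modfile, out_folder="myoutput/", threads=5):
    """Build one `DeepMod.py detect` command per input folder.

    Hypothetical helper: each run gets its own --FileID (the folder's base
    name), so parallel runs do not write into the same output subfolder.
    Paths are expanded with expanduser first, because shlex.quote would
    otherwise quote a leading '~' and stop the shell from expanding it.
    """
    ref, modfile = os.path.expanduser(ref), os.path.expanduser(modfile)
    commands = []
    for folder in folders:
        folder = os.path.expanduser(folder)
        file_id = os.path.basename(os.path.normpath(folder))
        args = ["python", "DeepMod.py", "detect",
                "--wrkBase", folder, "--Ref", ref, "--FileID", file_id,
                "--modfile", modfile, "--threads", str(threads),
                "--outFolder", out_folder]
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands
```

The returned strings can be written to a shell script or submitted as separate cluster jobs; keeping `--threads` per run modest avoids oversubscribing the machine when several runs execute at once.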
Besides this, I also ran DeepMod on my own data, and the same "No Fastq data in *.fast5" errors were produced.
Could you please help me find a solution? Thanks!