WGLab / DeepMod

DeepMod: a deep-learning tool for genomic-scale, strand-sensitive and single-nucleotide based detection of DNA modifications

Cannot open fast5 or other errors #52

Open zmy24 opened 2 years ago

zmy24 commented 2 years ago

I'm trying to call DNA modifications with DeepMod. I first basecalled with Guppy and then ran DeepMod, but it fails with the error 'Cannot open fast5 or other errors: '. This is the output of h5ls -r for my fast5 file:

/ Group
/Analyses Group
/Analyses/Barcoding_000 Group
/Analyses/Barcoding_000/Barcoding Group
/Analyses/Barcoding_000/Barcoding/Aligns Dataset {756}
/Analyses/Barcoding_000/Barcoding/Fastq Dataset {SCALAR}
/Analyses/Barcoding_000/Configuration Group
/Analyses/Barcoding_000/Configuration/aggregator Group
/Analyses/Barcoding_000/Configuration/barcoding Group
/Analyses/Barcoding_000/Configuration/basecall_1d Group
/Analyses/Barcoding_000/Configuration/basecall_2d Group
/Analyses/Barcoding_000/Configuration/calibration_strand Group
/Analyses/Barcoding_000/Configuration/components Group
/Analyses/Barcoding_000/Configuration/general Group
/Analyses/Barcoding_000/Configuration/hairpin_align Group
/Analyses/Barcoding_000/Configuration/post_processing.3000Hz Group
/Analyses/Barcoding_000/Configuration/split_hairpin Group
/Analyses/Barcoding_000/Log Dataset {SCALAR}
/Analyses/Barcoding_000/Summary Group
/Analyses/Barcoding_000/Summary/barcoding Group
/Analyses/Basecall_1D_000 Group
/Analyses/Basecall_1D_000/BaseCalled_template Group
/Analyses/Basecall_1D_001 Group
/Analyses/Basecall_1D_001/BaseCalled_template Group
/Analyses/Basecall_1D_001/BaseCalled_template/Fastq Dataset {SCALAR}
/Analyses/Basecall_1D_001/BaseCalled_template/Move Dataset {75012}
/Analyses/Basecall_1D_001/BaseCalled_template/StateData Dataset {75012, 40}
/Analyses/Basecall_1D_001/BaseCalled_template/Trace Dataset {75012, 8}
/Analyses/Basecall_1D_001/Summary Group
/Analyses/Basecall_1D_001/Summary/basecall_1d_template Group
/Analyses/Basecall_2D_000 Group
/Analyses/Basecall_2D_000/BaseCalled_2D Group
/Analyses/Basecall_2D_000/BaseCalled_2D/Alignment Dataset {5654}
/Analyses/Basecall_2D_000/BaseCalled_2D/Fastq Dataset {SCALAR}
/Analyses/Basecall_2D_000/Configuration Group
/Analyses/Basecall_2D_000/Configuration/aggregator Group
/Analyses/Basecall_2D_000/Configuration/basecall_1d Group
/Analyses/Basecall_2D_000/Configuration/basecall_2d Group
/Analyses/Basecall_2D_000/Configuration/calibration_strand Group
/Analyses/Basecall_2D_000/Configuration/components Group
/Analyses/Basecall_2D_000/Configuration/event_detection Group
/Analyses/Basecall_2D_000/Configuration/general Group
/Analyses/Basecall_2D_000/Configuration/hairpin_align Group
/Analyses/Basecall_2D_000/Configuration/post_processing Group
/Analyses/Basecall_2D_000/Configuration/post_processing.4000Hz Group
/Analyses/Basecall_2D_000/Configuration/split_hairpin Group
/Analyses/Basecall_2D_000/HairpinAlign Group
/Analyses/Basecall_2D_000/HairpinAlign/Alignment Dataset {4053}
/Analyses/Basecall_2D_000/Log Dataset {SCALAR}
/Analyses/Basecall_2D_000/Summary Group
/Analyses/Basecall_2D_000/Summary/basecall_2d Group
/Analyses/Basecall_2D_000/Summary/hairpin_align Group
/Analyses/Basecall_2D_000/Summary/post_process_complement Group
/Analyses/Basecall_2D_000/Summary/post_process_template Group
/Analyses/Calibration_Strand_000 Group
/Analyses/Calibration_Strand_000/Configuration Group
/Analyses/Calibration_Strand_000/Configuration/aggregator Group
/Analyses/Calibration_Strand_000/Configuration/basecall_1d Group
/Analyses/Calibration_Strand_000/Configuration/basecall_2d Group
/Analyses/Calibration_Strand_000/Configuration/calibration_strand Group
/Analyses/Calibration_Strand_000/Configuration/components Group
/Analyses/Calibration_Strand_000/Configuration/general Group
/Analyses/Calibration_Strand_000/Configuration/genome_mapping Group
/Analyses/Calibration_Strand_000/Configuration/hairpin_align Group
/Analyses/Calibration_Strand_000/Configuration/post_processing.3000Hz Group
/Analyses/Calibration_Strand_000/Configuration/split_hairpin Group
/Analyses/Calibration_Strand_000/Log Dataset {SCALAR}
/Analyses/Calibration_Strand_000/Summary Group
/Analyses/EventDetection_000 Group
/Analyses/EventDetection_000/Configuration Group
/Analyses/EventDetection_000/Configuration/aggregator Group
/Analyses/EventDetection_000/Configuration/basecall_1d Group
/Analyses/EventDetection_000/Configuration/basecall_2d Group
/Analyses/EventDetection_000/Configuration/calibration_strand Group
/Analyses/EventDetection_000/Configuration/components Group
/Analyses/EventDetection_000/Configuration/event_detection Group
/Analyses/EventDetection_000/Configuration/general Group
/Analyses/EventDetection_000/Configuration/hairpin_align Group
/Analyses/EventDetection_000/Configuration/post_processing Group
/Analyses/EventDetection_000/Configuration/post_processing.4000Hz Group
/Analyses/EventDetection_000/Configuration/split_hairpin Group
/Analyses/EventDetection_000/Log Dataset {SCALAR}
/Analyses/EventDetection_000/Reads Group
/Analyses/EventDetection_000/Reads/Read_10536 Group
/Analyses/EventDetection_000/Reads/Read_10536/Events Dataset {9334}
/Analyses/EventDetection_000/Summary Group
/Analyses/EventDetection_000/Summary/event_detection Group
/Analyses/Hairpin_Split_000 Group
/Analyses/Hairpin_Split_000/Configuration Group
/Analyses/Hairpin_Split_000/Configuration/aggregator Group
/Analyses/Hairpin_Split_000/Configuration/basecall_1d Group
/Analyses/Hairpin_Split_000/Configuration/basecall_2d Group
/Analyses/Hairpin_Split_000/Configuration/calibration_strand Group
/Analyses/Hairpin_Split_000/Configuration/components Group
/Analyses/Hairpin_Split_000/Configuration/event_detection Group
/Analyses/Hairpin_Split_000/Configuration/general Group
/Analyses/Hairpin_Split_000/Configuration/hairpin_align Group
/Analyses/Hairpin_Split_000/Configuration/post_processing Group
/Analyses/Hairpin_Split_000/Configuration/post_processing.4000Hz Group
/Analyses/Hairpin_Split_000/Configuration/split_hairpin Group
/Analyses/Hairpin_Split_000/Log Dataset {SCALAR}
/Analyses/Hairpin_Split_000/Summary Group
/Analyses/Hairpin_Split_000/Summary/split_hairpin Group
/Analyses/RawGenomeCorrected_000 Group
/Analyses/RawGenomeCorrected_000/BaseCalled_template Group
/Analyses/Segmentation_000 Group
/Analyses/Segmentation_000/Summary Group
/Analyses/Segmentation_000/Summary/segmentation Group
/Raw Group
/Raw/Reads Group
/Raw/Reads/Read_10536 Group
/Raw/Reads/Read_10536/Signal Dataset {390818/Inf}
/UniqueGlobalKey Group
/UniqueGlobalKey/channel_id Group
/UniqueGlobalKey/context_tags Group
/UniqueGlobalKey/tracking_id Group

And this is my DeepMod command:

python ./bin/DeepMod.py detect --wrkBase path_for_fast5 --Ref unidentified_spots_contigs.fasta --outFolder NB08 --Base C --modfile ./DeepMod/train_deepmod/rnn_conmodC_P100wd21_f7ne1u0_4/mod_train_conmodC_P100wd21_f3ne1u0 --FileID NB08 --basecall_1d Basecall_1D_001 --threads 1 --move

Can you help me? Thanks a lot!

liuqianhn commented 2 years ago

@zmy24 Thank you for using DeepMod.

  1. I'm not sure what causes 'Cannot open fast5 or other errors: ' unless you can share the log file.
  2. DeepMod's trained models are based on datasets basecalled with very old basecallers, so they do not work with data from Guppy or Albacore v2. Other team members are now re-training models for datasets basecalled with Guppy and Albacore v2.
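
While waiting for a log, one quick check is which Basecall_1D group in the file actually holds the per-read tables. The following is only a minimal sketch, not part of DeepMod: `basecall_datasets` is a hypothetical helper that parses pasted `h5ls -r` output, and it assumes (from the error messages and the --move flag in this thread, not from the DeepMod source) that DeepMod needs an Events or Move dataset under the group named by --basecall_1d.

```python
import re

# Hypothetical helper for illustration; not part of DeepMod.
# Matches lines such as "/Analyses/Basecall_1D_001 Group".
GROUP = re.compile(r"^/Analyses/(Basecall_1D_\d+)\s+Group\s*$")
# Matches lines such as
# "/Analyses/Basecall_1D_001/BaseCalled_template/Move Dataset {75012}".
DATASET = re.compile(
    r"^/Analyses/(Basecall_1D_\d+)/BaseCalled_template/(\w+)\s+Dataset\b"
)


def basecall_datasets(h5ls_output):
    """Map each Basecall_1D_NNN group to the dataset names under BaseCalled_template."""
    found = {}
    for raw_line in h5ls_output.splitlines():
        line = raw_line.strip()
        group = GROUP.match(line)
        if group:
            # Record the group even if it turns out to hold no datasets.
            found.setdefault(group.group(1), set())
            continue
        dataset = DATASET.match(line)
        if dataset:
            found.setdefault(dataset.group(1), set()).add(dataset.group(2))
    return found


# A few lines taken from the listing posted in this issue.
SAMPLE = """\
/Analyses/Basecall_1D_000 Group
/Analyses/Basecall_1D_000/BaseCalled_template Group
/Analyses/Basecall_1D_001 Group
/Analyses/Basecall_1D_001/BaseCalled_template Group
/Analyses/Basecall_1D_001/BaseCalled_template/Fastq Dataset {SCALAR}
/Analyses/Basecall_1D_001/BaseCalled_template/Move Dataset {75012}
"""

for name, datasets in sorted(basecall_datasets(SAMPLE).items()):
    has_table = bool(datasets & {"Events", "Move"})
    print(name, sorted(datasets), "Events/Move present" if has_table else "no Events/Move")
```

In the first listing above, Basecall_1D_000 has no datasets at all and Basecall_1D_001 has Move but no Events, which is why the group passed to --basecall_1d matters.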

Bin-Ma commented 2 years ago

Hi @liuqianhn, I got a similar error in DeepMod after re-basecalling a single fast5 file. Its structure is:

/ Group
/Analyses Group
/Analyses/Basecall_1D_000 Group
/Analyses/Basecall_1D_000/BaseCalled_template Group
/Analyses/Basecall_1D_000/BaseCalled_template/Fastq Dataset {SCALAR}
/Analyses/Basecall_1D_001 Group
/Analyses/Basecall_1D_001/BaseCalled_template Group
/Analyses/Basecall_1D_001/BaseCalled_template/Events Dataset {1754}
/Analyses/Basecall_1D_001/BaseCalled_template/Fastq Dataset {SCALAR}
/Analyses/Basecall_1D_001/Configuration Group
/Analyses/Basecall_1D_001/Configuration/basecall_1d Group
/Analyses/Basecall_1D_001/Summary Group
/Analyses/Basecall_1D_001/Summary/basecall_1d_template Group
/Analyses/Calibration_Strand_Detection_000 Group
/Analyses/Calibration_Strand_Detection_000/Configuration Group
/Analyses/Calibration_Strand_Detection_000/Configuration/calib_detector Group
/Analyses/Calibration_Strand_Detection_000/Summary Group
/Analyses/Calibration_Strand_Detection_000/Summary/calibration_strand_template Group
/Analyses/RawGenomeCorrected_000 Group
/Analyses/RawGenomeCorrected_000/BaseCalled_template Group
/Analyses/RawGenomeCorrected_000/BaseCalled_template/Alignment Group
/Analyses/RawGenomeCorrected_000/BaseCalled_template/Events Dataset {1200}
/Analyses/Segmentation_000 Group
/Analyses/Segmentation_000/Configuration Group
/Analyses/Segmentation_000/Configuration/stall_removal Group
/Analyses/Segmentation_000/Summary Group
/Analyses/Segmentation_000/Summary/segmentation Group
/Raw Group
/Raw/Reads Group
/Raw/Reads/Read_8036 Group
/Raw/Reads/Read_8036/Signal Dataset {8911/Inf}
/UniqueGlobalKey Group
/UniqueGlobalKey/channel_id Group
/UniqueGlobalKey/context_tags Group
/UniqueGlobalKey/tracking_id Group

I set --basecall_1d to Basecall_1D_001, but DeepMod still reports "No events data 3596". Can you help me? Thanks a lot!

liuqianhn commented 2 years ago

@Bin-Ma Could you please share this fast5 file and the command you used? Please also share the log output if you have it. Thanks.

Bin-Ma commented 2 years ago

@liuqianhn Thanks for your reply! I have uploaded the test data and log file. The command I ran is as follows:

python ~/software/DeepMod/bin/DeepMod.py detect --wrkBase ../barcode_fast5/single/1/output/workspace/pass/test/ --Ref ../fna/barcode01.fna --outFolder ./barcode_test --Base A --modfile ~/software/DeepMod/train_deepmod/rnn_sinmodC_P100wd21_f7ne1u0_4/mod_train_sinmodC_P100wd21_f3ne1u0 --FileID test1 --threads 80 --basecall_1d Basecall_1D_001 > deepmod_test.log 2>&1

deepmod_test.zip

liuqianhn commented 2 years ago

@Bin-Ma I checked the issue and your command.

  1. You set the base of interest to A with --Base, but the trained model you passed is for C.
  2. This version of DeepMod was not trained on data from the latest Albacore. Please check with @umahsn about DeepMod2, the updated version.
  3. The error comes from https://github.com/WGLab/DeepMod/blob/98452950c5baef7cb572360067ab2a7b8bc68b37/bin/DeepMod_scripts/myDetect.py#L157. It is caused by the h5py version. Please also check with @umahsn about working with newer versions of h5py.
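
For context on point 3, here is a minimal sketch of reading a scalar string dataset in a way that behaves the same under h5py 2.x and 3.x. It is not DeepMod's code and I am not asserting this is exactly what the linked line does; it only illustrates two h5py changes that commonly break older scripts: the `Dataset.value` attribute was removed in h5py 3.0, and string data is returned as `bytes` in 3.x. `read_text` and the stand-in class are hypothetical names used for illustration.

```python
def read_text(dataset):
    """Return a scalar string dataset as str under h5py 2.x or 3.x."""
    # `dataset[()]` works in both major versions; `dataset.value` does not
    # exist in h5py 3.x.
    value = dataset[()]
    # h5py 3.x returns bytes for string data, so decode when needed.
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return str(value)


class FakeDataset:
    """Stand-in for an h5py Dataset so the sketch runs without a fast5 file."""

    def __init__(self, value):
        self._value = value

    def __getitem__(self, key):
        return self._value


print(read_text(FakeDataset(b"@read_id\nACGT")))

# With a real file (the file name here is a placeholder):
# import h5py
# with h5py.File("read.fast5", "r") as f5:
#     fastq = read_text(f5["/Analyses/Basecall_1D_001/BaseCalled_template/Fastq"])
```

If changing the script is not an option, installing an h5py release older than 3.0 in a separate environment is the other common workaround.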