WGLab / DeepMod

DeepMod: a deep-learning tool for genomic-scale, strand-sensitive and single-nucleotide based detection of DNA modifications

Using DeepMod on basecalled FAST5 files from the latest Guppy #42

Open hasindu2008 opened 3 years ago

hasindu2008 commented 3 years ago

The usage page states that FAST5 files must be basecalled and must contain event data. However, the latest Guppy basecaller does not seem to store any event data, as Albacore used to. As mentioned in the README, multi-read FAST5 files can be converted to single-read FAST5 files with ont-fast5-api. But I am not sure how to make Guppy save event data in the FAST5 files. Could you shed some light on this?
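The two FAST5 layouts discussed in this thread can be told apart from an h5ls -r listing: the multi-read file posted later has top-level /read_<id> groups, while the single-read file keeps its signal under /Raw/Reads/Read_<n>. The following is a minimal Python sketch of that heuristic, assuming the listing has already been captured as text; fast5_layout is a hypothetical helper, not part of DeepMod or ont-fast5-api.

```python
import re


def fast5_layout(h5ls_text: str) -> str:
    """Classify a FAST5 file from its `h5ls -r` listing.

    Heuristic (an assumption based on the listings in this thread):
    multi-read files have top-level groups named /read_<id>, while
    single-read files store the signal under /Raw/Reads/Read_<n>.
    """
    # The first whitespace-separated token of each non-empty line is the HDF5 path.
    paths = [line.split()[0] for line in h5ls_text.splitlines() if line.strip()]
    if any(re.fullmatch(r"/read_[0-9a-f-]+", p) for p in paths):
        return "multi-read"
    if any(re.match(r"/Raw/Reads/Read_\d+", p) for p in paths):
        return "single-read"
    return "unknown"
```

A file reported as multi-read would need converting with ont-fast5-api before DeepMod can use it, per the README note above.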


liuqianhn commented 3 years ago

@hasindu2008 You are right: the latest Guppy writes a move table rather than an event table. DeepMod now supports move tables with --move True. Please note that we have not retrained models or tested the existing models on move tables (though we have been improving this). If you have any performance results with it, please feel free to share them. Thanks.
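To illustrate what a move table encodes: as we understand Guppy's output, each entry covers a fixed block ("stride") of raw samples and is 1 when the basecaller emits a new base in that block, 0 otherwise. The sketch below turns a move table into the raw-signal span of each called base. The stride value and the start offset (raw samples skipped before the first block) are assumptions you would need to read from your own files' metadata, and moves_to_base_starts / base_signal_spans are hypothetical helpers, not DeepMod's implementation.

```python
from typing import List, Sequence, Tuple


def moves_to_base_starts(moves: Sequence[int], stride: int, start: int = 0) -> List[int]:
    """Raw-signal index at which each called base begins.

    Assumed model (not DeepMod code): moves[i] == 1 means a new base was
    emitted in block i, each block spans `stride` raw samples, and block 0
    begins at raw sample `start`.
    """
    if stride <= 0:
        raise ValueError("stride must be positive")
    return [start + i * stride for i, m in enumerate(moves) if m == 1]


def base_signal_spans(moves: Sequence[int], stride: int, start: int = 0) -> List[Tuple[int, int]]:
    """Half-open [begin, end) raw-sample span per base; the last base runs to the table's end."""
    starts = moves_to_base_starts(moves, stride, start)
    table_end = start + len(moves) * stride
    return list(zip(starts, starts[1:] + [table_end]))
```

For example, base_signal_spans([1, 0, 1, 1, 0, 0, 1], stride=5, start=100) returns [(100, 110), (110, 115), (115, 130), (130, 135)]: four bases, the second of which occupies a single block.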

hasindu2008 commented 3 years ago

Do you have an example command for Guppy 4 that makes it generate this move table? I have some FAST5 files from Guppy 4.0.3 live basecalling that contain the FASTQ read but no move table. Is it supposed to be inside the Analyses/segmentation group? In my files, that group has a few attributes but no data tables.

liuqianhn commented 3 years ago

@hasindu2008 Could you please share the output of h5ls -r your-fast5 | head -n 50? In some cases, move/event tables are not available, and you need to re-basecall with the appropriate options so that move tables are written to the FAST5 files before using DeepMod.
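A quick way to scan such a listing for the tables DeepMod needs is sketched below. It assumes, based on the listings in this thread and on Albacore-era files, that move and event tables are stored as Move or Events datasets under .../BaseCalled_template. tables_per_read is a hypothetical helper that works on captured h5ls -r text rather than opening the file.

```python
import re
from collections import defaultdict
from typing import Dict


def tables_per_read(h5ls_text: str) -> Dict[str, Dict[str, bool]]:
    """Report, per read, whether a Move or Events dataset sits under BaseCalled_template.

    Assumptions: multi-read files group each read under a top-level /read_<id>;
    in single-read files every path belongs to one read, keyed here as "/".
    """
    found: Dict[str, Dict[str, bool]] = defaultdict(lambda: {"Move": False, "Events": False})
    for line in h5ls_text.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0] == "/":
            continue  # skip blank lines and the root group itself
        path, kind = parts[0], parts[1]
        m = re.match(r"/read_[^/]+", path)
        flags = found[m.group(0) if m else "/"]  # registers the read even if it has no tables
        leaf = path.rsplit("/", 1)[-1]
        if kind == "Dataset" and leaf in flags and "/BaseCalled_template/" in path:
            flags[leaf] = True
    return dict(found)
```

Any read reporting False for both keys would need re-basecalling before DeepMod can use it.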

hasindu2008 commented 3 years ago
/                        Group
/read_0013515e-5b4e-4588-843e-b5af4a4b87da Group
/read_0013515e-5b4e-4588-843e-b5af4a4b87da/Analyses Group
/read_0013515e-5b4e-4588-843e-b5af4a4b87da/Analyses/Basecall_1D_000 Group
/read_0013515e-5b4e-4588-843e-b5af4a4b87da/Analyses/Basecall_1D_000/BaseCalled_template Group
/read_0013515e-5b4e-4588-843e-b5af4a4b87da/Analyses/Basecall_1D_000/BaseCalled_template/Fastq Dataset {SCALAR}
/read_0013515e-5b4e-4588-843e-b5af4a4b87da/Analyses/Basecall_1D_000/Summary Group
/read_0013515e-5b4e-4588-843e-b5af4a4b87da/Analyses/Basecall_1D_000/Summary/basecall_1d_template Group
/read_0013515e-5b4e-4588-843e-b5af4a4b87da/Analyses/Segmentation_000 Group
/read_0013515e-5b4e-4588-843e-b5af4a4b87da/Analyses/Segmentation_000/Summary Group
/read_0013515e-5b4e-4588-843e-b5af4a4b87da/Analyses/Segmentation_000/Summary/segmentation Group
/read_0013515e-5b4e-4588-843e-b5af4a4b87da/Raw Group
/read_0013515e-5b4e-4588-843e-b5af4a4b87da/Raw/Signal Dataset {8409/Inf}
/read_0013515e-5b4e-4588-843e-b5af4a4b87da/channel_id Group
/read_0013515e-5b4e-4588-843e-b5af4a4b87da/context_tags Group
/read_0013515e-5b4e-4588-843e-b5af4a4b87da/tracking_id Group
/read_002f7800-db08-4ff5-b2b5-c78d9e72ac3a Group
/read_002f7800-db08-4ff5-b2b5-c78d9e72ac3a/Analyses Group
/read_002f7800-db08-4ff5-b2b5-c78d9e72ac3a/Analyses/Basecall_1D_000 Group
/read_002f7800-db08-4ff5-b2b5-c78d9e72ac3a/Analyses/Basecall_1D_000/BaseCalled_template Group
/read_002f7800-db08-4ff5-b2b5-c78d9e72ac3a/Analyses/Basecall_1D_000/BaseCalled_template/Fastq Dataset {SCALAR}
/read_002f7800-db08-4ff5-b2b5-c78d9e72ac3a/Analyses/Basecall_1D_000/Summary Group
/read_002f7800-db08-4ff5-b2b5-c78d9e72ac3a/Analyses/Basecall_1D_000/Summary/basecall_1d_template Group
/read_002f7800-db08-4ff5-b2b5-c78d9e72ac3a/Analyses/Segmentation_000 Group
/read_002f7800-db08-4ff5-b2b5-c78d9e72ac3a/Analyses/Segmentation_000/Summary Group
/read_002f7800-db08-4ff5-b2b5-c78d9e72ac3a/Analyses/Segmentation_000/Summary/segmentation Group
/read_002f7800-db08-4ff5-b2b5-c78d9e72ac3a/Raw Group
/read_002f7800-db08-4ff5-b2b5-c78d9e72ac3a/Raw/Signal Dataset {24867/Inf}
/read_002f7800-db08-4ff5-b2b5-c78d9e72ac3a/channel_id Group
/read_002f7800-db08-4ff5-b2b5-c78d9e72ac3a/context_tags Group, same as /read_0013515e-5b4e-4588-843e-b5af4a4b87da/context_tags
/read_002f7800-db08-4ff5-b2b5-c78d9e72ac3a/tracking_id Group, same as /read_0013515e-5b4e-4588-843e-b5af4a4b87da/tracking_id
/read_00457254-a6e4-429e-b8f3-3dc6337b1554 Group
/read_00457254-a6e4-429e-b8f3-3dc6337b1554/Analyses Group
/read_00457254-a6e4-429e-b8f3-3dc6337b1554/Analyses/Basecall_1D_000 Group
/read_00457254-a6e4-429e-b8f3-3dc6337b1554/Analyses/Basecall_1D_000/BaseCalled_template Group
/read_00457254-a6e4-429e-b8f3-3dc6337b1554/Analyses/Basecall_1D_000/BaseCalled_template/Fastq Dataset {SCALAR}
/read_00457254-a6e4-429e-b8f3-3dc6337b1554/Analyses/Basecall_1D_000/Summary Group
/read_00457254-a6e4-429e-b8f3-3dc6337b1554/Analyses/Basecall_1D_000/Summary/basecall_1d_template Group
/read_00457254-a6e4-429e-b8f3-3dc6337b1554/Analyses/Segmentation_000 Group
/read_00457254-a6e4-429e-b8f3-3dc6337b1554/Analyses/Segmentation_000/Summary Group
/read_00457254-a6e4-429e-b8f3-3dc6337b1554/Analyses/Segmentation_000/Summary/segmentation Group
/read_00457254-a6e4-429e-b8f3-3dc6337b1554/Raw Group
/read_00457254-a6e4-429e-b8f3-3dc6337b1554/Raw/Signal Dataset {57910/Inf}
/read_00457254-a6e4-429e-b8f3-3dc6337b1554/channel_id Group
/read_00457254-a6e4-429e-b8f3-3dc6337b1554/context_tags Group, same as /read_0013515e-5b4e-4588-843e-b5af4a4b87da/context_tags
/read_00457254-a6e4-429e-b8f3-3dc6337b1554/tracking_id Group, same as /read_0013515e-5b4e-4588-843e-b5af4a4b87da/tracking_id
/read_0073ec46-24e3-40c7-980d-5b8c0c6059bd Group
/read_0073ec46-24e3-40c7-980d-5b8c0c6059bd/Analyses Group
/read_0073ec46-24e3-40c7-980d-5b8c0c6059bd/Analyses/Basecall_1D_000 Group
/read_0073ec46-24e3-40c7-980d-5b8c0c6059bd/Analyses/Basecall_1D_000/BaseCalled_template Group

It seems the move table is not in this file. Do you know which options I should pass to a recent Guppy version?

liuqianhn commented 3 years ago

@hasindu2008 Yes, the move/event table is not in the file, and you need to re-basecall it. You can find the options for your basecaller with guppy_basecaller --help or in the Nanopore Community.

123chenshixin commented 3 years ago

@hasindu2008 I re-basecalled my own single-read FAST5 files, which had no move or event data, with guppy_basecaller and successfully generated move data. My command is below; I hope it helps.

guppy_basecaller -i /home/cxs3_z4/ds/single_fast5_J1-019A -r -s ./J1-019A --config dna_r9.4.1_450bps_hac_prom.cfg --fast5_out
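If many directories need re-basecalling, the same command can be assembled in a script. The minimal sketch below uses only the flags from the command above (-i, -r, -s, --config, --fast5_out); guppy_rebasecall_argv is a hypothetical helper that only builds the argument list. Running it, for example with subprocess.run(argv, check=True), requires Guppy on your PATH.

```python
import shlex
from typing import List


def guppy_rebasecall_argv(input_dir: str, save_dir: str, config: str,
                          recursive: bool = True) -> List[str]:
    """Build a guppy_basecaller argument list that also writes FAST5 output.

    Only the flags from the command posted in this thread are used:
    -i, -r, -s, --config and --fast5_out.
    """
    argv = ["guppy_basecaller", "-i", input_dir]
    if recursive:
        argv.append("-r")  # search input_dir recursively for FAST5 files
    argv += ["-s", save_dir, "--config", config, "--fast5_out"]
    return argv


if __name__ == "__main__":
    argv = guppy_rebasecall_argv("/home/cxs3_z4/ds/single_fast5_J1-019A", "./J1-019A",
                                 "dna_r9.4.1_450bps_hac_prom.cfg")
    print(shlex.join(argv))  # shell-quoted form, for logging or copy-paste
```

Building a list instead of a single string avoids shell quoting problems when paths contain spaces.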

The guppy_basecaller version (from guppy_basecaller --version):

: Guppy Basecalling Software, (C) Oxford Nanopore Technologies, Limited. Version 3.1.5+781ed57

It creates a "workspace" directory in my working path. Here is the h5ls listing of one of the output single-read FAST5 files:

h5ls -r fff36937-dff8-4f8b-a343-d1b680e7f99c.fast5

/ Group
/Analyses Group
/Analyses/Basecall_1D_000 Group
/Analyses/Basecall_1D_000/BaseCalled_template Group
/Analyses/Basecall_1D_000/BaseCalled_template/Fastq Dataset {SCALAR}
/Analyses/Basecall_1D_000/Summary Group
/Analyses/Basecall_1D_000/Summary/basecall_1d_template Group
/Analyses/Basecall_1D_001 Group
/Analyses/Basecall_1D_001/BaseCalled_template Group
/Analyses/Basecall_1D_001/BaseCalled_template/Fastq Dataset {SCALAR}
/Analyses/Basecall_1D_001/BaseCalled_template/Move Dataset {91653}
/Analyses/Basecall_1D_001/BaseCalled_template/Trace Dataset {91653, 8}
/Analyses/Basecall_1D_001/Summary Group
/Analyses/Basecall_1D_001/Summary/basecall_1d_template Group
/Analyses/RawGenomeCorrected_001 Group
/Analyses/Segmentation_000 Group
/Analyses/Segmentation_000/Summary Group
/Analyses/Segmentation_000/Summary/segmentation Group
/Analyses/Segmentation_001 Group
/Analyses/Segmentation_001/Summary Group
/Analyses/Segmentation_001/Summary/segmentation Group
/Raw Group
/Raw/Reads Group
/Raw/Reads/Read_27254 Group
/Raw/Reads/Read_27254/Signal Dataset {184511/Inf}
/UniqueGlobalKey Group
/UniqueGlobalKey/channel_id Group
/UniqueGlobalKey/context_tags Group
/UniqueGlobalKey/tracking_id Group

The move data is in the Basecall_1D_001/BaseCalled_template path.
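After re-basecalling, a file can hold several Basecall_1D_NNN groups, as in the listing above, where only Basecall_1D_001 has a Move dataset. A small sketch for picking that group name out of an h5ls -r listing follows. It assumes the highest-numbered group containing a Move dataset is the one you want; latest_move_group is a hypothetical helper.

```python
import re
from typing import Optional

# Matches e.g. "/Analyses/Basecall_1D_001/BaseCalled_template/Move Dataset {91653}",
# with or without a leading /read_<id> prefix (multi-read files).
_MOVE_LINE = re.compile(r"/Analyses/(Basecall_1D_(\d+))/BaseCalled_template/Move\s+Dataset")


def latest_move_group(h5ls_text: str) -> Optional[str]:
    """Return the highest-numbered Basecall_1D_NNN group name that holds a Move dataset."""
    best_index, best_name = -1, None
    for line in h5ls_text.splitlines():
        m = _MOVE_LINE.search(line)
        if m and int(m.group(2)) > best_index:
            best_index, best_name = int(m.group(2)), m.group(1)
    return best_name  # None when no Move dataset is listed
```

Returning None rather than a default group makes a file that still lacks a move table easy to flag before running DeepMod.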

shaodongyan commented 2 years ago

Re: Basecall_1D_001/BaseCalled_template. Hello, can I know your deepmod order?

liuqianhn commented 2 years ago

@shaodongyan I have no idea what "deepmod order" means. But as far as I know, DeepMod has never been tested on Guppy-basecalled data and has some issues with it. I would not recommend using it on Guppy-basecalled reads until we finish testing; otherwise, the results will not be correct.