WGLab / DeepMod

DeepMod: a deep-learning tool for genomic-scale, strand-sensitive and single-nucleotide based detection of DNA modifications

basecaller versions #31

Closed skerker closed 3 years ago

skerker commented 4 years ago

If I understand correctly, Albacore basecaller v2.3.1 was used for model building in your manuscript. If I run DeepMod on Nanopore data that was basecalled with the latest version of Guppy, are the models even valid?

Would you suggest re-basecalling my data using the exact same settings as in your paper?

My dataset: FLO-MIN106, LSK-109 kit, Guppy v3.4.5, dna_r9.4.1_450bps_hac

In your paper: FLO-MIN106, LSK-108, Albacore v2.3.1. Which basecalling model?

Thanks, Jeff

liuqianhn commented 4 years ago

Hi @skerker, the available DeepMod model for the human genome was trained on data basecalled with r94_450bps_linear.cfg (which is determined by the FLO-MIN and LSK versions) in Albacore v2.3.1.

If different basecalling is used (one for training and another for testing), the results would still be valid, but the performance might be affected (I do not have data to quantify the effect at the moment). Since the FLO-MIN and LSK versions in your data are different, I would suggest re-basecalling with the same basecaller (if Albacore has an available setting for your data) rather than with the same basecalling model: the basecalling model is determined by the FLO-MIN and LSK versions and should not be chosen arbitrarily.

skerker commented 4 years ago

Thanks for the information. I'll try it both ways: first with my existing data, and then with data re-basecalled in Albacore v2.3.1, with flow cell and kit set to FLO-MIN106 and LSK-109, assuming Albacore has a basecalling model for that combination. Is that what you mean?

liuqianhn commented 4 years ago

@skerker, you might need to check the Albacore help, which will tell you how to use the combination of flow cell and kit to find the correct basecalling model file (dna_r9.4.1_450bps_hac, for example). On my machine, I run read_fast5_basecaller.py -l to list them. If you cannot find the correct version of the flow cell or kit, you might not be able to use Albacore for re-basecalling.
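For anyone who wants to script the lookup described above, here is a minimal Python sketch. It assumes each row of the listing has the flow cell first, the kit second, and the config file name last, as in the row quoted later in this thread; the real output may contain extra columns or header lines. The function names (parse_workflows, find_config, list_workflows) and the header line in the sample text are hypothetical, not part of Albacore.

```python
import subprocess


def parse_workflows(listing):
    """Parse listing text into a {(flowcell, kit): config_file} mapping.

    Assumption: flow cell is the first field, kit the second, and the
    config file (ending in .cfg) the last field of each row.
    """
    table = {}
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines, headers, and anything not ending in a .cfg name.
        if len(fields) >= 3 and fields[-1].endswith(".cfg"):
            table[(fields[0], fields[1])] = fields[-1]
    return table


def find_config(listing, flowcell, kit):
    """Return the config file for a flow cell/kit pair, or None if absent."""
    return parse_workflows(listing).get((flowcell, kit))


def list_workflows():
    """Run the listing command mentioned in this thread and return its text.

    Requires Albacore to be installed and on PATH; not called below.
    """
    result = subprocess.run(
        ["read_fast5_basecaller.py", "-l"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


# Sample text: the header line is made up for illustration; the data row
# is the one reported in this thread for Albacore v2.3.4.
sample = """\
flowcell kit config
FLO-MIN106 SQK-LSK109 r94_450bps_linear.cfg
"""

print(find_config(sample, "FLO-MIN106", "SQK-LSK109"))  # prints r94_450bps_linear.cfg
```

With Albacore installed, find_config(list_workflows(), "FLO-MIN106", "SQK-LSK109") would run the same lookup on the real listing. A result of None means that no row matched the flow cell and kit pair.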

skerker commented 4 years ago

Great - and thank you for the help!

skerker commented 4 years ago

I see this option in Albacore v2.3.4 (read_fast5_basecaller.py -l):

FLO-MIN106 SQK-LSK109 r94_450bps_linear.cfg

Is it worth re-basecalling my Nanopore data using this model, for the best compatibility with your models? My flow cell is 9.4.1, not 9.4, so perhaps I can't use Albacore at all and need to stay with Guppy.

liuqianhn commented 4 years ago

@skerker, I am afraid I have no clear answer about this difference at the moment. Please check the Nanopore community for the compatibility of R9.4 and R9.4.1.

liuqianhn commented 3 years ago

Closed due to no recent response. Feel free to reopen it if you need more help.