fmfi-compbio / deepnano-blitz

Very fast ONT basecaller
MIT License

Support for R9.4 or 10.3 #9

Open JensUweUlrich opened 3 years ago

JensUweUlrich commented 3 years ago

Hi,

First of all, I want to say thanks for developing this super fast basecaller. It is even fast enough for live basecalling, and I intend to use it in a ReadUntil context. My question is whether it would be possible to provide support for R9.4 or R10.3 pores. As far as I understand your software, it would just need other weight files trained on data from experiments using these pores.

Looking forward to your answer!

Best regards Jens

usamec commented 3 years ago

Hi Jens, we currently only support R9.4.1 DNA basecalling (comparable to using Guppy config file dna_r9.4.1_450bps_fast.cfg). So what exactly do you mean by R9.4 (is it some older slower version of the pore)?

Your understanding is correct, however we do not have any data from R10.3 right now, so we are unable to train models for R10.3. If you have any pointers to publicly available R10.3 data, we could train some models for it.

Best, Vlado

JensUweUlrich commented 3 years ago

Hi Vlado,

Thanks for your answer. Sorry, I was a bit confused about the 9.4 pore versions. I thought that ONT also had 9.4.3 pores, but I probably just mixed things up here. So if I had some data from sequencing with R10.3 pores, how much would be needed to train the model, and would we need data from different organisms? And is it possible to do the training on our own using DeepNano?

Best regards Jens

usamec commented 3 years ago

For 9.4 training we used around 130 Mbp from three different organisms (human, E. coli, Saccharomyces). But I would say that at least two different organisms, one of which has a reasonably long genome (at least 100 Mbp), should be enough. It would also be great if at least some of the data came from native sequencing (not PCR), so that the training set also contains methylated bases.
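The data guidelines above (at least two organisms, at least one genome of 100 Mbp or more, and some native non-PCR data) can be expressed as a simple checklist over a planned training set. This is only a hypothetical sketch: the `Run` record and `check_training_set` helper are illustrative names, not part of DeepNano-blitz, and the per-organism yields in the usage example are made up so that they add up to the roughly 130 Mbp mentioned above. The genome sizes are approximate published values.

```python
from dataclasses import dataclass


# Hypothetical record describing one sequencing run considered for training.
@dataclass
class Run:
    organism: str
    genome_size_mbp: float  # approximate genome size of the organism
    yield_mbp: float        # bases from this run that go into the training set
    native: bool            # True for native (non-PCR) DNA, which keeps methylation


def check_training_set(runs, min_organisms=2, min_long_genome_mbp=100.0):
    """Return the list of unmet guidelines; an empty list means all are met."""
    problems = []
    organisms = {r.organism for r in runs}
    if len(organisms) < min_organisms:
        problems.append(f"need at least {min_organisms} organisms, got {len(organisms)}")
    if not any(r.genome_size_mbp >= min_long_genome_mbp for r in runs):
        problems.append(f"no organism with a genome of at least {min_long_genome_mbp:g} Mbp")
    if not any(r.native for r in runs):
        problems.append("no native (non-PCR) data, so no methylated bases")
    return problems


def total_yield_mbp(runs):
    """Sum of training bases over all runs, in Mbp."""
    return sum(r.yield_mbp for r in runs)


if __name__ == "__main__":
    # Illustrative yields only; the real per-organism split was not given.
    runs = [
        Run("human", 3100, 80, True),
        Run("ecoli", 4.6, 30, False),
        Run("saccharomyces", 12, 20, True),
    ]
    print(check_training_set(runs))  # []
    print(total_yield_mbp(runs))     # 130
```

The check deliberately reports the total yield separately instead of enforcing a minimum, since the thread only says what was used for 9.4, not a hard lower bound.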

As for doing the actual training: we have not published the training scripts, as they are a little bit messy, but we can do the training for you (it takes a couple of days, but we have learned a lot during the past couple of months, so we can maybe make it faster).

JensUweUlrich commented 3 years ago

Hi Vlado,

ONT released some R10.3 datasets last September: https://nanoporetech.github.io/ont-open-datasets/gm24385_2020.09/

Would that be sufficient? I will also ask in our lab if they already sequenced some samples with R10.3 pores.

Thanks Jens