HKU-BAL / Clair3

Clair3 - Symphonizing pileup and full-alignment for high-performance long-read variant calling

model for r10.4.1 cells called with Guppy and not Dorado #248

Closed ghost closed 11 months ago

ghost commented 1 year ago

Hello,

I have received ONT sequencing data.

I know that the basecalling was done with Guppy, but I cannot find a model for R10.4.1 cells with v14 chemistry.

The basecalling command was:

guppy_basecaller -i ./fast5 -s ./fastq_guppy6.3.8_8.2 -c dna_r10.4.1_e8.2_400bps_sup.cfg -x cuda:1

We used Guppy because we were not happy with the base quality given by Dorado, but Oxford Nanopore seems to provide models only for Dorado with this chemistry. What would you advise me to do?

Thanks a lot

EDIT: To give you an idea, I will share the quality profile from Guppy, then the one from Dorado.

ghost commented 1 year ago

[Image: Guppy quality profile]

Here is the Guppy profile.

ghost commented 1 year ago

[Image: Dorado quality profile]

Here is the Dorado profile, so we went with Guppy ...

ghost commented 12 months ago

If I am right, this model should work?

https://github.com/nanoporetech/rerio/blob/master/clair3_models/r1041_e82_400bps_sup_v420_model

Thanks

aquaskyline commented 12 months ago

> If I am right, this model should work?
>
> https://github.com/nanoporetech/rerio/blob/master/clair3_models/r1041_e82_400bps_sup_v420_model
>
> Thanks

This is the model trained on R10.4.1 E8.2 (5 kHz) data basecalled using Dorado v4.2.0 SUP.
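As a rough illustration of how the model name lines up with that description, here is a minimal sketch that splits a name such as `r1041_e82_400bps_sup_v420_model` into its underscore-separated fields. The field labels (`pore`, `chemistry`, `speed`, `mode`, `tag`) and the helper names are assumptions made for this example, not an official naming scheme. Note that nothing in the name encodes the sampling rate, so 4 kHz versus 5 kHz cannot be read off the name alone.

```python
import re
from typing import NamedTuple


class ModelName(NamedTuple):
    """Fields of a model directory name (labels are illustrative, not official)."""
    pore: str       # e.g. "r1041"
    chemistry: str  # e.g. "e82"
    speed: str      # e.g. "400bps"
    mode: str       # e.g. "sup"
    tag: str        # e.g. "v420": one letter followed by digits


# Assumed layout: <pore>_<chemistry>_<speed>_<mode>_<tag>[_model]
_PATTERN = re.compile(
    r"^(?P<pore>r\d+)_(?P<chemistry>e\d+)_(?P<speed>\d+bps)_"
    r"(?P<mode>[a-z]+)_(?P<tag>[a-z]\d+)(?:_model)?$"
)


def parse_model_name(name: str) -> ModelName:
    """Split a model name into fields, raising ValueError if it does not fit."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised model name: {name!r}")
    return ModelName(**match.groupdict())


def dotted_version(tag: str) -> str:
    """Turn 'v420' into '4.2.0' (assumes one digit per version field)."""
    return ".".join(tag[1:])


if __name__ == "__main__":
    parsed = parse_model_name("r1041_e82_400bps_sup_v420_model")
    print(parsed)
    print(dotted_version(parsed.tag))
```

The sketch deliberately leaves the leading letter of the tag uninterpreted; only the `v420` to Dorado v4.2.0 correspondence is stated in this thread.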

aquaskyline commented 12 months ago

Although the Dorado model works for Guppy data, I think it is best to use a model trained on data called with the same basecaller and mode.

ghost commented 12 months ago

Yes, it makes sense, but the thing is that I can't find a model for R10.4.1 cells with Guppy basecalling. I have the feeling they dropped support for Guppy with R10.4 cells, so I don't know what to do.

ghost commented 12 months ago

I am sorry for writing again, and I am aware your time is precious, but is this the model that corresponds to Guppy AND the R10.4.1 cells?

https://github.com/nanoporetech/rerio/blob/master/clair3_models/r1041_e82_400bps_sup_g615_model

Would the "g" stand for "Guppy"? Once again, I am highly embarrassed to ask so many questions, but as you mentioned, it's critical.

Your guidance has been very important, especially considering Clair3's unique accuracy with our organism of interest, as shown in our PacBio data. We're eager to extend its application to our ONT data. Thank you once again for your valuable insights and assistance.

aquaskyline commented 12 months ago

Are you using 4 kHz or 5 kHz data?

ghost commented 12 months ago

> Are you using 4 kHz or 5 kHz data?

It is 5 kHz.

Thanks so much for your time. So do you think this is the "best" model?

aquaskyline commented 12 months ago

The differences between 4 kHz and 5 kHz data are more significant than those between Dorado and Guppy, so I suggest you give the Dorado 5 kHz model a try.
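For anyone following this suggestion, a minimal sketch of assembling the Clair3 command line for a chosen model is shown here. The options used (`--bam_fn`, `--ref_fn`, `--threads`, `--platform`, `--model_path`, `--output`) are the ones from the Clair3 README; every path, the thread count, and the helper name `clair3_command` are hypothetical placeholders. The script only prints the command, it does not run Clair3.

```python
import shlex
from typing import List


def clair3_command(bam: str, ref: str, model_dir: str, out_dir: str,
                   threads: int = 8) -> List[str]:
    """Build the argument list for run_clair3.sh with an ONT model directory."""
    return [
        "run_clair3.sh",
        f"--bam_fn={bam}",
        f"--ref_fn={ref}",
        f"--threads={threads}",
        "--platform=ont",
        f"--model_path={model_dir}",
        f"--output={out_dir}",
    ]


if __name__ == "__main__":
    # All paths below are hypothetical; substitute your own files and
    # the local directory where the downloaded model was unpacked.
    cmd = clair3_command(
        bam="sample.sorted.bam",
        ref="reference.fa",
        model_dir="models/r1041_e82_400bps_sup_v420",
        out_dir="clair3_out_v420",
    )
    print(shlex.join(cmd))
```

Building the command as a list makes it easy to swap only `model_dir` and `out_dir` when trying a second model on the same alignment.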

ghost commented 12 months ago

> The differences between 4 kHz and 5 kHz data are more significant than those between Dorado and Guppy, so I suggest you give the Dorado 5 kHz model a try.

We will try the last Guppy model and the Dorado one and check which one is the "noisiest" in the pileup file. Does that sound like a good idea to you? Again, I am highly grateful for all your support. It means a lot to a PhD student.

aquaskyline commented 12 months ago

It makes sense.
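One way to put a number on such a comparison is sketched here, under a stated assumption: "noise" is approximated as the fraction of VCF records whose QUAL is below a threshold. That proxy, the threshold of 10, and the function names are choices made for this example and are not defined anywhere in the thread. The code reads plain VCF text, using only the fact that QUAL is the sixth tab-separated column.

```python
import gzip
from typing import Dict, Iterable


def summarize_vcf(lines: Iterable[str], min_qual: float = 10.0) -> Dict[str, float]:
    """Count records and those with QUAL below min_qual (missing QUAL counts as low)."""
    total = 0
    low = 0
    for line in lines:
        # Skip blank lines and header/meta lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]  # VCF columns: CHROM POS ID REF ALT QUAL ...
        total += 1
        if qual == "." or float(qual) < min_qual:
            low += 1
    fraction = low / total if total else 0.0
    return {"calls": total, "low_qual": low, "low_qual_fraction": fraction}


def summarize_vcf_file(path: str, min_qual: float = 10.0) -> Dict[str, float]:
    """Same summary for a gzip-compressed VCF on disk."""
    with gzip.open(path, "rt") as handle:
        return summarize_vcf(handle, min_qual)


if __name__ == "__main__":
    # Tiny made-up records, for illustration only.
    demo = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "chr1\t100\t.\tA\tG\t5.2\tLowQual\t.\n",
        "chr1\t200\t.\tC\tT\t22.8\tPASS\t.\n",
    ]
    print(summarize_vcf(demo))
```

Running `summarize_vcf_file` on the pileup VCF produced with each model (hypothetical paths, one per output directory) gives two summaries that can be compared side by side. A lower low-quality fraction is only suggestive; it does not replace checking calls against a truth set where one exists.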