epi2me-labs / wf-bacterial-genomes

Small variant calling for haploid samples
https://labs.epi2me.io/

Medaka model for dna_r10.4.1_e8.2_260bps_sup ?? #11

Closed lagphase closed 1 year ago

lagphase commented 1 year ago

What happened?

Hello,

There is no medaka model for R10.4.1, 260bps, super-accurate (sup) basecalling. What should I do?

Operating System

Windows 10

Workflow Execution

Command line

Workflow Execution - EPI2ME Labs Versions

No response

Workflow Execution - CLI Execution Profile

None

Workflow Version

0.2.12

Relevant log output

ERROR: Validation pipeline of parameters failed!

*--basecaller_cfg: dna_r10.4.1_e8.2_260bps_sup is not a valid enum value (dna_r10.4.1_e8.2_260bps_sup)

cjw85 commented 1 year ago

I don't believe that medaka models were ever created for the 260bps sequencing chemistries. You could try using the equivalent 400bps medaka model, but this is untested.

yygitont commented 1 year ago

In this table, https://github.com/epi2me-labs/wf-bacterial-genomes/blob/master/data/medaka_models.tsv, I saw 'dna_r10.4.1_e8.2_260bps_hac', and the only 400bps_sup model currently available is 'dna_r10.4.1_e8.2_400bps_sup@v3.5.2'. For 260bps_sup data, which do you think would give better results: using 'dna_r10.4.1_e8.2_400bps_sup@v3.5.2' directly, or rebasecalling with hac and then using 'dna_r10.4.1_e8.2_260bps_hac'?
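
To make the two options concrete, here is a minimal Python sketch that, given a requested basecaller config, lists the known configs that share its chemistry and either its speed or its accuracy mode. The list `KNOWN_CFGS` contains only the two names mentioned in this thread, not the full contents of medaka_models.tsv, and the helper names `parse_cfg` and `fallback_candidates` are hypothetical. The name pattern (chemistry, speed, mode, optional `@version`) is an assumption inferred from the names above, not a documented format.

```python
import re

# Only the two config names quoted in this thread; the real list lives in
# data/medaka_models.tsv and is likely longer (assumption).
KNOWN_CFGS = [
    "dna_r10.4.1_e8.2_260bps_hac",
    "dna_r10.4.1_e8.2_400bps_sup@v3.5.2",
]

# Assumed naming pattern: <chemistry>_<speed>bps_<mode>[@<version>]
CFG_PATTERN = re.compile(
    r"(?P<chem>.+)_(?P<speed>\d+bps)_(?P<mode>fast|hac|sup)(?:@(?P<ver>.+))?"
)


def parse_cfg(name):
    """Split a config name into chemistry, speed, mode and version (or None)."""
    match = CFG_PATTERN.fullmatch(name)
    return match.groupdict() if match else None


def fallback_candidates(requested, known=KNOWN_CFGS):
    """Group known configs that share the chemistry of `requested`.

    "same_speed": same speed, different mode (would need rebasecalling).
    "same_mode":  same mode, different speed (untested, per the comment above).
    """
    req = parse_cfg(requested)
    if req is None:
        raise ValueError(f"unrecognised config name: {requested}")
    result = {"same_speed": [], "same_mode": []}
    for name in known:
        cfg = parse_cfg(name)
        if cfg is None or cfg["chem"] != req["chem"]:
            continue
        if cfg["speed"] == req["speed"] and cfg["mode"] != req["mode"]:
            result["same_speed"].append(name)
        elif cfg["mode"] == req["mode"] and cfg["speed"] != req["speed"]:
            result["same_mode"].append(name)
    return result


if __name__ == "__main__":
    print(fallback_candidates("dna_r10.4.1_e8.2_260bps_sup"))
```

The sketch only enumerates the choices; it says nothing about which one yields better consensus accuracy, which is the open question here.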

lagphase commented 1 year ago

Does "better results" mean better accuracy? I'm doing de novo assembly, and my concern is how much using a non-matching medaka model would affect accuracy.

I'll sequence at 400bps from now on, but for the past sequencing data I guess I'll rebasecall with hac.

mattdmem commented 1 year ago

Closing this issue for now. @lagphase, please let us know if you have any further issues with model selection in this workflow. We will always add all the models we have access to and, as @cjw85 mentioned, there are no plans at the moment to add models for the 260bps chemistries.