epi2me-labs / wf-bacterial-genomes

Small variant calling for haploid samples
https://labs.epi2me.io/

r1041_e82_400bps_sup_v4.2.0 missing from basecaller options #15

Closed WolfgangSchmied closed 1 year ago

WolfgangSchmied commented 1 year ago

Is your feature related to a problem?

I use the bacterial genome workflow with data from Plasmidsaurus, a commercial plasmid/genome sequencing service based on Nanopore sequencing technology. They basecall with the r1041_e82_400bps_sup_v4.2.0 model, which seems to be absent from the current medaka models.

Describe the solution you'd like

Train the medaka model on r1041_e82_400bps_sup_v4.2.0 and include it as an option.

Describe alternatives you've considered

What would be the closest, currently available basecalling model?

Additional context

No response

cjw85 commented 1 year ago

Hi @WolfgangSchmied,

The workflow attempts to translate basecaller model names to medaka model names. We will update the list of basecaller models that the workflow allows to be selected.

It is also possible to specify the medaka model directly, but that requires knowing the model's name from the medaka package. In your case that's "r1041_e82_400bps_sup_v4.2.0", as you have already stated.

WolfgangSchmied commented 1 year ago

How / where would I directly specify the medaka model? Otherwise, thanks for the quick reply & looking forward to the update!

annabel-NZ commented 1 year ago

@cjw85 I have hit the same issue. I tried to supply the correct model as --medaka_consensus_model r1041_e82_400bps_sup_v4.2.0, and this works with my local medaka install, but Nextflow pulls a container (Singularity) with medaka 1.7.2, and this model doesn't exist in that version.

Update: fixed locally by editing the nextflow.config file to switch the medaka container version: container_sha_medaka = "shaa3a062a2ddd830a0def7d65bc7382d28563b7e3f"
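
For anyone following along, the invocation described above might be sketched as below. The --medaka_consensus_model flag and the model name come from this thread. The build_command helper, the reads/ path, the --fastq input flag and the -profile singularity setting are assumptions added for illustration, so check them against the workflow's own help output. As noted above, the flag only helps if the container's medaka version ships the model.

```python
import shlex


def build_command(fastq_dir, medaka_model, profile="singularity"):
    """Assemble a `nextflow run` argument list that names the medaka model explicitly.

    build_command is a hypothetical helper; only --medaka_consensus_model
    is taken from the thread, the other options are assumptions.
    """
    return [
        "nextflow", "run", "epi2me-labs/wf-bacterial-genomes",
        "-profile", profile,                        # assumed container profile
        "--fastq", fastq_dir,                       # assumed input flag and path
        "--medaka_consensus_model", medaka_model,   # flag quoted in the thread
    ]


cmd = build_command("reads/", "r1041_e82_400bps_sup_v4.2.0")
# Print a shell-safe command line that can be copied into a terminal.
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps each argument separate, so it can also be passed unchanged to subprocess.run if the workflow is launched from a script.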

mattdmem commented 1 year ago

This model has been added. Closing issue.