epi2me-labs / wf-human-variation

Disable automatic Clair3 basecaller lookup #186

Open: Brynjar-H opened this issue 1 month ago

Brynjar-H commented 1 month ago

Hello, I have a merged BAM file that I want to use. It was basecalled with dna_r10.4.1_e8.2_5khz_400bps_sup@v4.2.0. However, the 5khz tag no longer seems to be used, and Clair3 only recognises the model as dna_r10.4.1_e8.2_400bps_sup@v4.2.0. When I try to run:

    nextflow run epi2me-labs/wf-human-variation \
        --bam '/proj/hpcdata/Mimir/shared/brynjar/nanopore/dna_met/10_d492M2/merged.bam' \
        --basecaller_cfg 'dna_r10.4.1_e8.2_400bps_sup@v4.2.0' \
        --mod True \
        --ref '/hpcdata/Mimir/snaevar/Homo_sapiens.GRCh38.dna.primary_assembly.fa' \
        --sample_name '10_d492M-2' \
        --snp True \
        --sv True \
        --phased True \
        --sex XY \
        -profile singularity \
        --outdir '/proj/hpcdata/Mimir/shared/brynjar/nanopore/human_var/gogn/10_d492M-2'

Clair3 automatically looks up the model associated with the file, which carries the 5khz tag, and this results in an error because Clair3 does not recognise that name. Does anyone know how I could modify the script so that it uses --basecaller_cfg 'dna_r10.4.1_e8.2_400bps_sup@v4.2.0' instead of having Clair3 look the model up?

SamStudio8 commented 1 month ago

Thanks for your report, @Brynjar-H. As you have discovered, the workflow overrides any user-provided basecaller_cfg with the information found in the BAM headers. This prevents users from mistakenly running the workflow with the wrong basecaller model (or from overriding the workflow to run on unsupported models!).
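The override can be checked by hand before launching the workflow. As a minimal sketch, assuming the model name is recorded as a `basecall_model=` key in the `DS` field of the BAM's `@RG` header lines (as ONT basecallers such as Dorado typically write it), the hypothetical functions below extract that name and mimic the header-over-user precedence described above. They are illustrative only, not the workflow's own code, which may read different header fields.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print each distinct basecall_model found in the
# @RG DS fields of SAM header text read from stdin.
# Assumes DS holds space-separated key=value pairs, e.g.
#   DS:runid=abc basecall_model=dna_r10.4.1_e8.2_400bps_sup@v4.2.0
basecall_models() {
  grep '^@RG' \
    | tr '\t' '\n' \
    | grep '^DS:' \
    | tr ' ' '\n' \
    | sed -n 's/^\(DS:\)\{0,1\}basecall_model=//p' \
    | sort -u
}

# Hypothetical helper mirroring the precedence described above:
# a model found in the header wins over the user's --basecaller_cfg.
resolve_basecaller_cfg() {
  local user_cfg="$1"
  local header_cfg="$2"
  if [ -n "$header_cfg" ]; then
    printf '%s\n' "$header_cfg"
  else
    printf '%s\n' "$user_cfg"
  fi
}
```

For the file in this issue, `samtools view -H merged.bam | basecall_models` would be expected to print the 5khz-tagged name, provided the header follows the assumed layout.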

Unfortunately, this is problematic when you know the model better than the workflow does! We'll update the workflow to add the "dna_r10.4.1_e8.2_5khz_400bps_sup@v4.2.0" basecaller configuration so that this data is supported. This change should be included in next week's release.
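Conceptually, supporting the new name amounts to adding an alias to a table of known basecaller configurations. A minimal sketch of that idea, assuming a simple name-to-model lookup: the table, the function name and the Clair3 model name on the right-hand side are all hypothetical placeholders, not the workflow's real data, and the thread does not say which Clair3 model the 5khz name will map to.

```shell
#!/usr/bin/env bash

# Hypothetical lookup table: basecaller configuration -> Clair3 model.
# The model name is an assumed placeholder; the second entry is the
# added alias so the 5khz-tagged name resolves instead of failing.
declare -A CLAIR3_MODEL=(
  ["dna_r10.4.1_e8.2_400bps_sup@v4.2.0"]="r1041_e82_400bps_sup_v420"
  ["dna_r10.4.1_e8.2_5khz_400bps_sup@v4.2.0"]="r1041_e82_400bps_sup_v420"
)

# Print the model for a configuration, or fail for unknown names,
# which is the kind of error reported in this issue.
clair3_model_for() {
  local cfg="$1"
  if [ -n "${CLAIR3_MODEL[$cfg]+set}" ]; then
    printf '%s\n' "${CLAIR3_MODEL[$cfg]}"
  else
    printf 'Unsupported basecaller_cfg: %s\n' "$cfg" >&2
    return 1
  fi
}
```

Keeping unknown names as a hard error preserves the safety check described earlier in the thread, while the alias lets the known-good name through.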

Brynjar-H commented 1 month ago

Thanks a bunch <3