epi2me-labs / wf-amplicon

Other

Failed to interpret 'dna_r9.4.1_450bps_hac:variant' as a basecaller model #21

Closed: warthmann closed this issue 4 months ago

warthmann commented 4 months ago

Operating System

Other Linux (please specify below)

Other Linux

Ubuntu 20.04

Workflow Version

wf-amplicon v1.1.1

Workflow Execution

EPI2ME Desktop (Local)

Other workflow execution

No response

EPI2ME Version

v5.1.14

CLI command run

No response

Workflow Execution - CLI Execution Profile

None

What happened?

I am sequencing barcoded amplicons on a Flongle with R9.4.1 chemistry. I run MinKNOW without basecalling and then basecall and demultiplex with:

  guppy_basecaller \
      -c dna_r9.4.1_450bps_hac.cfg \
      ....

  guppy_barcoder \
      --input_path .xxx --save_path ./yyy --barcode_kits SQK-RBK004

guppy_basecaller --help:
  Guppy Basecalling Software, (C) Oxford Nanopore Technologies plc. Version 6.5.7+ca6d6af, minimap2 version 2.24-r1122

guppy_barcoder --help:
  guppy_barcoder, part of Guppy basecalling suite, (C) Oxford Nanopore Technologies plc. Version 6.5.7+ca6d6af


The EPI2ME analysis with wf-amplicon worked fine for me in the past (a few months ago) but no longer works.

First attempt:

Core Nextflow options
  runName                : loving_mcclintock
  containerEngine        : docker
  launchDir              : /home/pbgl/epi2melabs/instances/wf-amplicon_01J3MJW5B7D2JQ5HR6YNG73V7T
  workDir                : /home/pbgl/epi2melabs/instances/wf-amplicon_01J3MJW5B7D2JQ5HR6YNG73V7T/work
  projectDir             : /home/pbgl/epi2melabs/workflows/epi2me-labs/wf-amplicon
  userName               : pbgl
  profile                : standard
  configFiles            : /home/pbgl/epi2melabs/workflows/epi2me-labs/wf-amplicon/nextflow.config
Input Options
  fastq                  : xxx/fastq_bascalled_25_07_2024_demultiplexed
  reference              : xxx.fa
Sample Options
  sample_sheet           : /xxx/sample-sheet-July23-six-amplicons
Pre-processing Options
  min_n_reads            : 10
Variant Calling Options
  min_coverage           : 10
Output Options
  out_dir                : /home/pbgl/epi2melabs/instances/wf-amplicon_01J3MJW5B7D2JQ5HR6YNG73V7T/output
  combine_results        : true
!! Only displaying parameters that differ from the pipeline defaults !!
[..................]

ERROR ~ Error executing process > 'pipeline:variantCallingPipeline:medakaConsensus (1)'
Caused by:
  Found no basecall model information in the input data for sample 'xxx'. Please provide it with the --override_basecaller_cfg parameter.
 -- Check script '/home/pbgl/epi2melabs/workflows/epi2me-labs/wf-amplicon/./modules/local/./common.nf' at line: 104 Source block:


When run again with

Advanced Options
  override_basecaller_cfg: dna_r9.4.1_450bps_hac

the following error occurs:

ERROR ~ Error executing process > 'pipeline:variantCallingPipeline:medakaConsensus (1)'
Caused by:
  Process pipeline:variantCallingPipeline:medakaConsensus (1) terminated with an error exit status (1)
Command executed:
  medaka consensus input.bam consensus_probs.hdf --threads 2 --regions 'ref' --model dna_r9.4.1_450bps_hac:variant
Command exit status:
  1
Command output:
  (empty)
Command error:
  Cannot import pyabpoa, some features may not be available.
  Failed to interpret 'dna_r9.4.1_450bps_hac:variant' as a basecaller model.
  Traceback (most recent call last):
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/medaka/medaka.py", line 36, in __call__
      model_fp = medaka.models.resolve_model(val)


Please advise: which basecaller model should I use?

Relevant log output

N E X T F L O W  ~  version 23.04.2
Launching `/home/pbgl/epi2melabs/workflows/epi2me-labs/wf-amplicon/main.nf` [agitated_ptolemy] DSL2 - revision: 7b8dda5f6f
||||||||||   _____ ____ ___ ____  __  __ _____      _       _
||||||||||  | ____|  _ \_ _|___ \|  \/  | ____|    | | __ _| |__  ___
|||||       |  _| | |_) | |  __) | |\/| |  _| _____| |/ _` | '_ \/ __|
|||||       | |___|  __/| | / __/| |  | | |__|_____| | (_| | |_) \__ \
||||||||||  |_____|_|  |___|_____|_|  |_|_____|    |_|\__,_|_.__/|___/
||||||||||  wf-amplicon v1.1.1
--------------------------------------------------------------------------------
Core Nextflow options
  runName                : agitated_ptolemy
  containerEngine        : docker
  launchDir              : /home/pbgl/epi2melabs/instances/wf-amplicon_01J3MKHJ5SS8BK2J07J2KK8TBN
  workDir                : /home/pbgl/epi2melabs/instances/wf-amplicon_01J3MKHJ5SS8BK2J07J2KK8TBN/work
  projectDir             : /home/pbgl/epi2melabs/workflows/epi2me-labs/wf-amplicon
  userName               : pbgl
  profile                : standard
  configFiles            : /home/pbgl/epi2melabs/workflows/epi2me-labs/wf-amplicon/nextflow.config
Input Options
  fastq                  : /home/pbgl/MinION-basecalls/Barley-Amplicons/fastq_bascalled_25_07_2024_demultiplexed
  reference              : /home/pbgl/Documents/CAD2-Project-2024/NC_058523.1_166630501-166635805.fa
Sample Options
  sample_sheet           : /home/pbgl/Documents/CAD2-Project-2024/sample-sheet-July23-six-amplicons
Variant Calling Options
  min_coverage           : 10
Output Options
  out_dir                : /home/pbgl/epi2melabs/instances/wf-amplicon_01J3MKHJ5SS8BK2J07J2KK8TBN/output
Advanced Options
  override_basecaller_cfg: dna_r9.4.1_450bps_hac
!! Only displaying parameters that differ from the pipeline defaults !!
--------------------------------------------------------------------------------
If you use epi2me-labs/wf-amplicon for your analysis please cite:
* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x
--------------------------------------------------------------------------------
This is epi2me-labs/wf-amplicon v1.1.1.
--------------------------------------------------------------------------------
Searching input for [.fastq, .fastq.gz, .fq, .fq.gz] files.
WARN: Overriding basecall model with 'dna_r9.4.1_450bps_hac'.
[f6/55218b] Submitted process > validate_sample_sheet
[86/fdbbf8] Submitted process > pipeline:getVersions
[b0/1bc930] Submitted process > pipeline:variantCallingPipeline:sanitizeRefFile
[7c/d471fc] Submitted process > pipeline:getParams
[2d/f437f3] Submitted process > fastcat (5)
[15/e1c15b] Submitted process > fastcat (1)
[55/325edf] Submitted process > pipeline:subsetReads (1)
[c0/808f7b] Submitted process > fastcat (3)
[e2/1e4ccf] Submitted process > fastcat (4)
[7d/1ccd58] Submitted process > pipeline:subsetReads (2)
[67/0f6db8] Submitted process > pipeline:subsetReads (3)
[b7/3af16c] Submitted process > fastcat (2)
[74/0c258e] Submitted process > fastcat (6)
[62/eb393a] Submitted process > pipeline:subsetReads (4)
[44/64cc6d] Submitted process > pipeline:subsetReads (5)
[bc/0c0fb9] Submitted process > pipeline:subsetReads (6)
[14/0ecb77] Submitted process > pipeline:porechop (1)
[ca/e25bcc] Submitted process > pipeline:porechop (2)
[82/f48a14] Submitted process > pipeline:porechop (3)
[bf/f4ee59] Submitted process > pipeline:porechop (4)
[38/a17246] Submitted process > pipeline:porechop (5)
[8c/8b40ef] Submitted process > pipeline:porechop (6)
[3f/262d22] Submitted process > pipeline:addMedakaToVersionsFile
[bd/dd57da] Submitted process > pipeline:variantCallingPipeline:alignReads (3)
[ee/a8b8cb] Submitted process > pipeline:variantCallingPipeline:alignReads (1)
[a0/773a2f] Submitted process > pipeline:variantCallingPipeline:alignReads (2)
[3d/3c187d] Submitted process > pipeline:variantCallingPipeline:alignReads (4)
[ed/352825] Submitted process > pipeline:variantCallingPipeline:bamstats (1)
[32/38b0ee] Submitted process > pipeline:variantCallingPipeline:mosdepth (1)
[72/736aa1] Submitted process > pipeline:variantCallingPipeline:bamstats (2)
[d2/fadffb] Submitted process > pipeline:variantCallingPipeline:mosdepth (2)
[cb/78ed03] Submitted process > pipeline:variantCallingPipeline:bamstats (3)
[04/7d9fdd] Submitted process > pipeline:variantCallingPipeline:mosdepth (3)
[08/3f2802] Submitted process > pipeline:variantCallingPipeline:mosdepth (4)
[94/d17039] Submitted process > pipeline:variantCallingPipeline:bamstats (4)
[93/b65a9e] Submitted process > pipeline:variantCallingPipeline:downsampleBAMforMedaka (1)
[ba/265a5e] Submitted process > pipeline:variantCallingPipeline:downsampleBAMforMedaka (2)
[cd/850d19] Submitted process > pipeline:variantCallingPipeline:downsampleBAMforMedaka (3)
[21/c8bed2] Submitted process > pipeline:variantCallingPipeline:concatMosdepthResultFiles (1)
[39/2cbd88] Submitted process > pipeline:variantCallingPipeline:concatMosdepthResultFiles (4)
[d1/b02db3] Submitted process > pipeline:variantCallingPipeline:concatMosdepthResultFiles (3)
[1c/49b271] Submitted process > pipeline:variantCallingPipeline:concatMosdepthResultFiles (2)
[20/b8dafa] Submitted process > pipeline:variantCallingPipeline:downsampleBAMforMedaka (4)
[a7/1431f1] Submitted process > pipeline:variantCallingPipeline:medakaConsensus (2)
[fb/4a657c] Submitted process > pipeline:variantCallingPipeline:medakaConsensus (1)
[b5/55c114] Submitted process > pipeline:variantCallingPipeline:medakaConsensus (3)
ERROR ~ Error executing process > 'pipeline:variantCallingPipeline:medakaConsensus (1)'
Caused by:
  Process `pipeline:variantCallingPipeline:medakaConsensus (1)` terminated with an error exit status (1)
Command executed:
  medaka consensus input.bam consensus_probs.hdf         --threads 2 --regions 'ref' --model dna_r9.4.1_450bps_hac:variant
Command exit status:
  1
Command output:
  (empty)
Command error:
  Cannot import pyabpoa, some features may not be available.
  Failed to interpret 'dna_r9.4.1_450bps_hac:variant' as a basecaller model.
  Traceback (most recent call last):
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/medaka/medaka.py", line 36, in __call__
      model_fp = medaka.models.resolve_model(val)
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/medaka/models.py", line 46, in resolve_model
      raise ValueError(
  ValueError: Model dna_r9.4.1_450bps_hac:variant is not a known model or existant file.

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "/home/epi2melabs/conda/bin/medaka", line 8, in <module>
      sys.exit(main())
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/medaka/medaka.py", line 832, in main
      args = parser.parse_args()
    File "/home/epi2melabs/conda/lib/python3.8/argparse.py", line 1768, in parse_args
      args, argv = self.parse_known_args(args, namespace)
    File "/home/epi2melabs/conda/lib/python3.8/argparse.py", line 1800, in parse_known_args
      namespace, args = self._parse_known_args(args, namespace)
    File "/home/epi2melabs/conda/lib/python3.8/argparse.py", line 1988, in _parse_known_args
      positionals_end_index = consume_positionals(start_index)
    File "/home/epi2melabs/conda/lib/python3.8/argparse.py", line 1965, in consume_positionals
      take_action(action, args)
    File "/home/epi2melabs/conda/lib/python3.8/argparse.py", line 1874, in take_action
      action(self, namespace, argument_values, option_string)
    File "/home/epi2melabs/conda/lib/python3.8/argparse.py", line 1159, in __call__
      subnamespace, arg_strings = parser.parse_known_args(arg_strings, None)
    File "/home/epi2melabs/conda/lib/python3.8/argparse.py", line 1800, in parse_known_args
      namespace, args = self._parse_known_args(args, namespace)
    File "/home/epi2melabs/conda/lib/python3.8/argparse.py", line 2006, in _parse_known_args
      start_index = consume_optional(start_index)
    File "/home/epi2melabs/conda/lib/python3.8/argparse.py", line 1946, in consume_optional
      take_action(action, args, option_string)
    File "/home/epi2melabs/conda/lib/python3.8/argparse.py", line 1874, in take_action
      action(self, namespace, argument_values, option_string)
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/medaka/medaka.py", line 39, in __call__
      raise RuntimeError(msg.format(self.dest, str(e)))
  RuntimeError: Error validating model from '--model' argument: Model dna_r9.4.1_450bps_hac:variant is not a known model or existant file..
Work dir:
  /home/pbgl/epi2melabs/instances/wf-amplicon_01J3MKHJ5SS8BK2J07J2KK8TBN/work/fb/4a657c4ae3abce51c43eb925bfd29d
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
 -- Check '/home/pbgl/epi2melabs/instances/wf-amplicon_01J3MKHJ5SS8BK2J07J2KK8TBN/nextflow.log' file for details
WARN: Killing running tasks (3)

Application activity log entry

No response

Were you able to successfully run the latest version of the workflow with the demo data?

yes

Other demo data information

No response

julibeg commented 4 months ago

Hi @warthmann,

The basecaller configuration also needs the model version. For Guppy v6.5.7, the dna_r9.4.1_450bps_hac.cfg file lists the following:

dorado_model_path                   = dna_r9.4.1_e8_hac@v3.3

I.e., if you pass --override_basecaller_cfg dna_r9.4.1_e8_hac@v3.3 things should work.
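If you are unsure which versioned model name a particular Guppy config maps to, the dorado_model_path line can be read out of the .cfg file directly. The following is a minimal Python sketch, assuming the .cfg is plain text made of "key = value" lines with "#" comments, as in the line quoted above; the function names are hypothetical and not part of Guppy or wf-amplicon.

```python
"""Sketch: read the versioned model name from a Guppy .cfg file.

Assumption: the .cfg is plain text with `key = value` lines and `#` comments,
e.g. `dorado_model_path = dna_r9.4.1_e8_hac@v3.3`.
Function names here are hypothetical helpers, not a Guppy/EPI2ME API.
"""
from pathlib import Path
from typing import Optional


def find_dorado_model(cfg_text: str) -> Optional[str]:
    """Return the dorado_model_path value, or None if the key is absent."""
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        key, sep, value = line.partition("=")
        # Lines without '=' (blank lines, section headers) have an empty sep.
        if sep and key.strip() == "dorado_model_path":
            return value.strip()
    return None


def find_dorado_model_in_file(cfg_path: str) -> Optional[str]:
    """Convenience wrapper that reads the .cfg file from disk first."""
    return find_dorado_model(Path(cfg_path).read_text())
```

For the Guppy v6.5.7 dna_r9.4.1_450bps_hac.cfg described above, find_dorado_model_in_file should return dna_r9.4.1_e8_hac@v3.3, which is the value to pass to --override_basecaller_cfg. Splitting on "=" rather than on whitespace keeps the lookup robust to the column alignment used in the file.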

I know that this is not very intuitive. Hopefully, having to wrangle with the different basecall models is now a thing of the past, since for recently basecalled data our workflows can detect the relevant model automatically.
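To illustrate what such automatic detection depends on, the sketch below scans FASTQ header lines for a model tag. It assumes that recent basecaller output records the model as a basecall_model_version_id=... tag in each read header; that key name is an assumption here, and this is not the workflow's actual detection code. When no header carries the tag, presumably the situation for the Guppy-basecalled reads in this issue, it returns None and the model has to be supplied with --override_basecaller_cfg.

```python
"""Sketch: look for a basecall model tag in FASTQ headers.

Assumption (not taken from the wf-amplicon source): newer basecaller output
stores the model as a `basecall_model_version_id=<model>` tag in each header.
"""
from typing import Dict, Iterable, Optional

# Hypothetical constant; the tag name is an assumption.
MODEL_KEY = "basecall_model_version_id"


def parse_header_fields(header: str) -> Dict[str, str]:
    """Return the key=value tags of a FASTQ header, skipping the read ID."""
    fields: Dict[str, str] = {}
    for token in header.strip().lstrip("@").split()[1:]:
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            fields[key] = value
    return fields


def basecall_model_from_fastq(lines: Iterable[str]) -> Optional[str]:
    """Return the first model tag found, or None if no header has one.

    Assumes plain four-line FASTQ records, so only every fourth line is
    treated as a header (a quality line may also start with '@').
    """
    for i, line in enumerate(lines):
        if i % 4 == 0 and line.startswith("@"):
            model = parse_header_fields(line).get(MODEL_KEY)
            if model:
                return model
    return None
```

Because basecall_model_from_fastq accepts any iterable of lines, a gzipped file can be checked by passing the handle from gzip.open(path, "rt") directly; the file path in such a call would be your own.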

Please let us know if the above works or if you ran into any other issues.

warthmann commented 4 months ago

Works! Thanks.