epi2me-labs / wf-amplicon

mosdepth process error when reference sequence description line includes tab #9

Closed MattArran closed 4 months ago

MattArran commented 5 months ago

Operating System

Ubuntu 22.04

Other Linux

No response

Workflow Version

v1.0.2-g8314146

Workflow Execution

Command line

EPI2ME Version

No response

CLI command run

Successful command: nextflow run epi2me-labs/wf-amplicon --fastq fastq --reference reference.fasta --threads 4 -profile singularity

Unsuccessful command: nextflow run epi2me-labs/wf-amplicon --fastq fastq --reference reference_tab.fasta --threads 4 -profile singularity

Inputs fastq, reference.fasta and reference_tab.fasta included in zipped file inputs.zip.

Workflow Execution - CLI Execution Profile

singularity

What happened?

The workflow runs successfully with the input data in the wf-amplicon workflow's test_data folder, but fails when text is added after a tab character in the description line of one of the reference sequences, producing the log output below.

The tab and the text after it are included in the variable ref_id in modules/local/common.nf, which sets the value of REF_ID in the command.sh script of the relevant mosdepth process (REF_ID="katG__NC_000962.3_2154725-2155670 testing,_testing,_testing"). However, the tab and text are not included in the corresponding idxstats file /users/matarran/debug/work/7d/2f653fbd314ef0d897cdc068151da1/idxstats (contents copied below), in which $REF_ID is looked up. I imagine the easiest solution is to add tabs to the list of special characters that the process sanitizeRefFile in modules/local/variant-calling.nf replaces with underscores.

idxstats file contents:

katG__NC_000962.3_2154725-215567    945 64  0
rpoB__NC_000962.3_760285-761376 1091    59  0
*   0   0   23
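
The mismatch described above can be reproduced outside the workflow. The following is a minimal sketch, assuming made-up reference IDs (refA, refB), a made-up description and a temporary idxstats file; only the `grep -w ... | cut -f2` lookup is taken from the failing script.

```shell
# Hypothetical idxstats file: the ID column holds only the text before the tab.
tmpdir=$(mktemp -d)
printf 'refA\t945\t64\t0\nrefB\t1091\t59\t0\n*\t0\t0\t23\n' > "$tmpdir/idxstats"

# REF_ID as it would look when the description line contains a tab
# (the description text here is an assumption for illustration).
REF_ID=$(printf 'refA\tsome_description')

# Same lookup as in the failing script: no line matches, so the result is empty.
# `|| true` keeps this sketch running even under `set -e -o pipefail`.
REF_LENGTH=$(grep -w "$REF_ID" "$tmpdir/idxstats" | cut -f2 || true)
echo "with tab: '${REF_LENGTH}'"

# Keeping only the text before the first tab makes the lookup succeed.
SHORT_ID=$(printf '%s' "$REF_ID" | cut -f1)
SHORT_LENGTH=$(grep -w "$SHORT_ID" "$tmpdir/idxstats" | cut -f2 || true)
echo "without tab: '${SHORT_LENGTH}'"
```

With an empty REF_LENGTH, the later steps of the script have no valid length to work with, which is consistent with the process failing before mosdepth produces any output.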

Relevant log output

N E X T F L O W  ~  version 23.04.3
Launching `https://github.com/epi2me-labs/wf-amplicon` [festering_watson] DSL2 - revision: 83141469c4 [master]

||||||||||   _____ ____ ___ ____  __  __ _____      _       _
||||||||||  | ____|  _ \_ _|___ \|  \/  | ____|    | | __ _| |__  ___
|||||       |  _| | |_) | |  __) | |\/| |  _| _____| |/ _` | '_ \/ __|
|||||       | |___|  __/| | / __/| |  | | |__|_____| | (_| | |_) \__ \
||||||||||  |_____|_|  |___|_____|_|  |_|_____|    |_|\__,_|_.__/|___/
||||||||||  wf-amplicon v1.0.2-g8314146
--------------------------------------------------------------------------------
Core Nextflow options
  revision       : master
  runName        : festering_watson
  containerEngine: singularity
  launchDir      : /users/matarran/debug
  workDir        : /users/matarran/debug/work
  projectDir     : /users/matarran/.nextflow/assets/epi2me-labs/wf-amplicon
  userName       : matarran
  profile        : singularity
  configFiles    : /users/matarran/.nextflow/assets/epi2me-labs/wf-amplicon/nextflow.config

Input Options
  fastq          : fastq
  reference      : reference_tab.fasta

!! Only displaying parameters that differ from the pipeline defaults !!
--------------------------------------------------------------------------------
If you use epi2me-labs/wf-amplicon for your analysis please cite:

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

--------------------------------------------------------------------------------
This is epi2me-labs/wf-amplicon v1.0.2-g8314146.
--------------------------------------------------------------------------------
Searching input for [.fastq, .fastq.gz, .fq, .fq.gz] files.
executor >  local (18)
[06/3eebf3] process > fastcat (1)                                                [100%] 2 of 2 ✔
[8b/7979ff] process > pipeline:getVersions                                       [100%] 1 of 1 ✔
[cb/131037] process > pipeline:addMedakaToVersionsFile                           [100%] 1 of 1 ✔
[63/442577] process > pipeline:getParams                                         [100%] 1 of 1 ✔
[84/bf28d5] process > pipeline:subsetReads (2)                                   [100%] 2 of 2 ✔
[8b/f5ba98] process > pipeline:porechop (2)                                      [100%] 2 of 2 ✔
[4a/b6e46d] process > pipeline:variantCallingPipeline:lookupMedakaModel          [100%] 1 of 1 ✔
[db/c91fb1] process > pipeline:variantCallingPipeline:sanitizeRefFile            [100%] 1 of 1 ✔
[-        ] process > pipeline:variantCallingPipeline:subsetRefFile              -
[84/b9f3c0] process > pipeline:variantCallingPipeline:alignReads (1)             [100%] 2 of 2 ✔
[b4/74f5d4] process > pipeline:variantCallingPipeline:bamstats (2)               [100%] 2 of 2 ✔
[44/84d0a3] process > pipeline:variantCallingPipeline:downsampleBAMforMedaka (1) [  0%] 0 of 2
[-        ] process > pipeline:variantCallingPipeline:medakaConsensus            -
[-        ] process > pipeline:variantCallingPipeline:medakaVariant              -
[7d/2f653f] process > pipeline:variantCallingPipeline:mosdepth (1)               [ 25%] 1 of 4, failed: 1
[-        ] process > pipeline:variantCallingPipeline:concatMosdepthResultFiles  -
[-        ] process > pipeline:collectFilesInDir                                 -
[-        ] process > pipeline:makeReport                                        -
[-        ] process > output                                                     -
ERROR ~ Error executing process > 'pipeline:variantCallingPipeline:mosdepth (1)'

Caused by:
  Process `pipeline:variantCallingPipeline:mosdepth (1)` terminated with an error exit status (1)

Command executed:

  REF_ID="katG__NC_000962.3_2154725-2155670     testing,_testing,_testing"
  # get ref IDs and lengths with `samtools idxstats`
  samtools idxstats input.bam > idxstats

  # if `ref_id` was `null`, assume that there was only one reference and look up its
  # ID from the idxstats
  if [ -z "$REF_ID" ]; then
      REF_ID=$(head -n1 idxstats | cut -f1)
      if [[ $REF_ID = '*' ]]; then
          echo "QUITTING: Only unmapped reads in 'input.bam'."
          exit 0
      fi
  fi

  # get the length of the reference
  REF_LENGTH=$(grep -w "$REF_ID" idxstats | cut -f2)

  # calculate the corresponding window length (check `REF_LENGTH` first because
  # `expr a / b` returns non-zero exit code when `a < b`)
  window_length=1
  if [ "$REF_LENGTH" -gt "100" ]; then
      window_length=$(expr $REF_LENGTH / 100)
  fi

  # get the depths (we could add `-x`, but this loses a lot of detail from the depth
  # curves)
  mosdepth -t 2 -b $window_length -n -c "$REF_ID" depth input.bam

Command exit status:
  1

Command output:
  (empty)

Command error:
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  INFO:    gocryptfs not found, will not be able to use gocryptfs

Work dir:
  /users/matarran/debug/work/7d/2f653fbd314ef0d897cdc068151da1

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details
WARN: Killing running tasks (2)

Application activity log entry

No response

MattArran commented 5 months ago

A similar issue is encountered if a description line contains any of the special characters ", $, or \, so these should also be replaced. The character * is already replaced, and the workflow can handle the other special characters ^, &, and . without problems.
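
Until a fix is released, the reference file could be cleaned before it is passed to the workflow. The following is a rough sketch of the replacement proposed in this thread, not the workflow's actual sanitizeRefFile code: `sanitize_headers` is a hypothetical helper, and the example header is made up. It replaces tab, ", $ and \ with underscores in header lines only and leaves sequence lines untouched.

```shell
# Hypothetical helper: replace tab, ", $ and \ with "_" in FASTA header lines.
sanitize_headers() {
    awk '/^>/ { gsub(/[\t"$\\]/, "_") } { print }' "$1"
}

# Made-up example input with all four problematic characters in the header.
example=$(mktemp)
printf '>refA\tdesc "x" $y \\z\nACGT\n' > "$example"

# Print the cleaned FASTA; redirect to a new file to use it as --reference.
sanitize_headers "$example"
```

Spaces are deliberately left alone here, on the assumption that the workflow already handles them.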

julibeg commented 5 months ago

Hi @MattArran!

Thanks for raising this and providing the extra information. This will be fixed in the next release.

julibeg commented 4 months ago

This was fixed in the most recent version (v1.0.4). Please re-open this issue if the problem persists.