epi2me-labs / wf-somatic-variation


bamstats error with "implausible alignment information" #31

Closed lucy924 closed 2 weeks ago

lucy924 commented 1 month ago

Operating System

Ubuntu 22.04

Other Linux

No response

Workflow Version

v1.3.0

Workflow Execution

Command line (Cluster)

Other workflow execution

No response

EPI2ME Version

No response

CLI command run

nextflow run epi2me-labs/wf-somatic-variation \
    --mod \
    --sample_name 'testing_B3_CvT_chr1' \
    --ref '/projects/uow/GERL/dejlu879/refs/hg38_chromsonly_ebv_lambda_bcg/chr1.fa' \
    --bam_normal '/projects/uow/GERL/dejlu879/BEBIC_analysis/CellLineB/demuxed/chr1/B3_02_Control.bam' \
    --bam_tumor '/projects/uow/GERL/dejlu879/BEBIC_analysis/CellLineB/demuxed/chr1/B3_03_Test.bam' \
    --normal_min_coverage 15 \
    --tumor_min_coverage 15 \
    -profile singularity

Workflow Execution - CLI Execution Profile

singularity

What happened?

The workflow stopped at a bamstats command, reporting that a read "appears to contain implausible alignment information". My understanding was that the workflow performs the alignment itself? I have previously run wf-alignment on these files without any issues. I tried to run this bamstats command outside the workflow, but it didn't recognise the -s option, and I couldn't specify a different version of bamstats through bioconda. I am running this on a cluster using Slurm. Nextflow warned that its version doesn't match the version the workflow requires, but I didn't think that would cause this particular error message?

Relevant log output

--------------------------------------------------------------------------------
Core Nextflow options
  revision           : master
  runName            : nice_khorana
  containerEngine    : singularity
  container          : ontresearch/wf-somatic-variation:sha18cc2ea1fae27fc772e7b728957996119c1ec81a
  launchDir          : /projects/uow/GERL/dejlu879/run_wf-variation/testing_B3_CvT_chr1
  workDir            : /projects/uow/GERL/dejlu879/run_wf-variation/testing_B3_CvT_chr1/work
  projectDir         : /home/dejlu879/.nextflow/assets/epi2me-labs/wf-somatic-variation
  userName           : dejlu879
  profile            : singularity
  configFiles        : /home/dejlu879/.nextflow/config, /home/dejlu879/.nextflow/assets/epi2me-labs/wf-somatic-variation/nextflow.config

Workflow Options
  mod                : true

Main options
  sample_name        : testing_B3_CvT_chr1
  bam_normal         : /projects/uow/GERL/dejlu879/BEBIC_analysis/CellLineB/demuxed/chr1/B3_02_Control.bam
  bam_tumor          : /projects/uow/GERL/dejlu879/BEBIC_analysis/CellLineB/demuxed/chr1/B3_03_Test.bam
  ref                : /projects/uow/GERL/dejlu879/refs/hg38_chromsonly_ebv_lambda_bcg/chr1.fa

Quality Control Options
  tumor_min_coverage : 15
  normal_min_coverage: 15

!! Only displaying parameters that differ from the pipeline defaults !!
--------------------------------------------------------------------------------
...
--------------------------------------------------------------------------------
WARN: Nextflow version 22.10.4 does not match workflow required version: >=23.04.2 -- Execution will continue, but things may break!
--------------------------------------------------------------------------------
...
--------------------------------------------------------------------------------
Error executing process > 'alignment_stats:bamstats (1)'

Caused by:
  Process `alignment_stats:bamstats (1)` terminated with an error exit status (1)

Command executed:

  bamstats testing_B3_CvT_chr1.cram \
      -s testing_B3_CvT_chr1 \
      --threads 3 \
      -i "testing_B3_CvT_chr1.tumor.per-file-runids.txt" \
      -l "testing_B3_CvT_chr1.tumor.basecallers.tsv" \
      -u \
      --histograms hists_testing_B3_CvT_chr1_tumor \
      -f testing_B3_CvT_chr1_tumor.flagstat.tsv \
      | gzip > "testing_B3_CvT_chr1_tumor.readstats.tsv.gz"

  # get unique run IDs
  awk -F '\t' '
      NR==1 {for (i=1; i<=NF; i++) {ix[$i] = i}}
      # only print run_id if present
      NR>1 && $ix["run_id"] != "" {print $ix["run_id"]}
  ' testing_B3_CvT_chr1.tumor.per-file-runids.txt | sort | uniq > testing_B3_CvT_chr1.tumor.runids.txt
  # get unique basecall models
  awk -F '\t' '
      NR==1 {for (i=1; i<=NF; i++) {ix[$i] = i}}
      # only print basecall model if present
      NR>1 && $ix["basecaller"] != "" {print $ix["basecaller"]}
  ' testing_B3_CvT_chr1.tumor.basecallers.tsv | sort | uniq > testing_B3_CvT_chr1.tumor.basecallers.txt

Command exit status:
  1

Command output:
  (empty)

Command error:
  INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  INFO:    underlay of /etc/localtime required more than 50 (78) bind mounts
  Read 'f78b0e42-73ec-4b73-a269-49e56618e48b' appears to contain implausible alignment information

Work dir:
  /projects/uow/GERL/dejlu879/run_wf-variation/testing_B3_CvT_chr1/work/c8/f18a7d39453844f4e8bf8070c12b12

Application activity log entry

No response

Were you able to successfully run the latest version of the workflow with the demo data?

yes

Other demo data information

No response

RenzoTale88 commented 4 weeks ago

@lucy924 we just released v1.3.1 that should address this issue. Could you please try it and let us know if it works?

Andrea

lucy924 commented 3 weeks ago

Hi Andrea, thanks for the update. Unfortunately I'm having issues accessing our cluster at the moment and I'll be away next week; I just wanted to let you know I'll get to it when I'm back. Thanks, Lucy

lucy924 commented 2 weeks ago

Hi Andrea,

I'm trying to run the workflow but I'm running into a memory error. I'm not sure whether it's related, or whether I should open a new issue. Our IT engineer said I may need to increase the memory for this particular task, either in the workflow config or in the global Nextflow config, but I don't know much about Nextflow yet and I'm cautious about messing with your files too much.

The error output is attached; the problem command is:

probs=$( modkit sample-probs B3_CvT_chr1.cram -p 0.1 --interval-size 5000000 --only-mapped --threads 4 2> /dev/null | awk 'NR>1 {ORS=" "; print "--filter-threshold "$1":"$3}' )

The .command.log output is this:

INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
INFO:    underlay of /etc/localtime required more than 50 (78) bind mounts
slurmstepd: error: Detected 1 oom_kill event in StepId=811821.batch. Some of the step tasks have been OOM Killed.

error_output.nextflow.log

RenzoTale88 commented 2 weeks ago

Yeah, this is similar to what was reported recently for wf-human-variation here. You can apply the same solution, i.e. by passing a custom configuration file.

First, save the following text in a file named e.g. custom.config (it can be any name):

process {
  withName: sample_probs {
    memory 32.GB
  }
}

Then, you can pass it to the workflow with -c custom.config.
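
As a rough sketch of those two steps, the snippet below writes the same override to custom.config and shows, in a comment, how the command from earlier in this thread could be rerun with -c. The option values are copied from that command; the file location (the launch directory) is an assumption, so adjust paths to your own run.

```shell
# Write the override shown above to custom.config in the current directory.
cat > custom.config <<'EOF'
process {
  withName: sample_probs {
    memory 32.GB
  }
}
EOF

# Then rerun the workflow with the extra config file (not executed here):
#   nextflow run epi2me-labs/wf-somatic-variation \
#       -c custom.config \
#       --mod \
#       --sample_name 'testing_B3_CvT_chr1' \
#       ... (remaining options as in the original command) \
#       -profile singularity
```

Adding Nextflow's -resume option to the rerun should let it reuse cached results from tasks that already completed, provided the work directory is still in place.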

I gather that the implausible alignment issue is sorted?

lucy924 commented 2 weeks ago

Thanks for the config help, that did indeed sort it. I had to add another process to it too, but it's good to know that workaround for when it's out of memory. The workflow has now successfully gone past the original bamstats problem, thank you for fixing it! Lucy
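
For anyone hitting the same out-of-memory error in more than one process, here is a minimal sketch of a config that raises memory for two processes at once. sample_probs is the process named in this thread; other_process is a hypothetical placeholder, since the second process name is not given here, and the 32.GB value for it is likewise an assumption. The failing process is usually identifiable from the `Error executing process > '...'` line in the Nextflow output.

```shell
# Write a config with one withName block per process that needs more memory.
cat > custom.config <<'EOF'
process {
  withName: sample_probs {
    memory 32.GB
  }
  // 'other_process' is a placeholder, not a real process name from this workflow
  withName: other_process {
    memory 32.GB
  }
}
EOF
```

The file is passed to the workflow with -c custom.config, exactly as for the single-process version.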

RenzoTale88 commented 2 weeks ago

Great, closing this issue then.

Andrea