epi2me-labs / wf-cnv

[Bug]: alignment fails on samtools sort #1

Closed rdeborja closed 1 year ago

rdeborja commented 1 year ago

What happened?

Issue:

The alignment process fails with the following error:

Command error:
  samtools sort: failed to read header from "-"

Troubleshooting

Ran wf-cnv according to README:

nextflow run epi2me-labs/wf-cnv -r v0.0.1  --fastq /path/to/wf-cnv/wf-cnv-0.0.1/test_data/fastq --sample_sheet /path/to/wf-cnv/wf-cnv-0.0.1/test_data/sample_sheet.csv --fasta /path/to/references/GRCh38_no_alt_analysis_set.GCA_000001405.15.fna --genome hg38 --bin_size 500 --disable_ping -profile conda

Running .command.sh manually from the command line completes the alignment step successfully.

Operating System

ubuntu 18.04

Workflow Execution

Command line

Workflow Execution - EPI2ME Labs Versions

No response

Workflow Execution - Execution Profile

Conda

Workflow Version

v0.0.1

Relevant log output

Error executing process > 'pipeline:alignment (3)'

Caused by:
  Process `pipeline:alignment (3)` terminated with an error exit status (1)

Command executed:

  minimap2 -ax map-ont GRCh38_no_alt_analysis_set.GCA_000001405.15.fna NA03225.fastq.gz | samtools sort -o NA03225.bam
  samtools index NA03225.bam

Command exit status:
  1

Command output:
  (empty)

Command error:
  samtools sort: failed to read header from "-"

mattdmem commented 1 year ago

Hi @rdeborja

I can reproduce your error, but we aren't sure yet what is causing it: the workflow runs fine in internal testing and on some of my colleagues' machines.

We'll hopefully have it figured out soon!

Matt

rdeborja commented 1 year ago

Thanks @mattdmem

In case it's useful, I'm running the following versions of nextflow:

$ nextflow -version

      N E X T F L O W
      version 22.04.5 build 5708

and java:

$ java -version
java version "11.0.13" 2021-10-19 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.13+10-LTS-370)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.13+10-LTS-370, mixed mode)

mattdmem commented 1 year ago

Thanks!

We think the issue is memory-related. We haven't set a CPU limit for the alignment process, so Nextflow tries to run many alignments at once, which can exhaust memory.

There are two options: you can wait for the next release, which should be out soon, or you can edit the alignment process in the current main.nf to look like this:

process alignment {
  label params.process_label
  cpus 4

  publishDir "${params.out_dir}/BAM", mode: 'copy', pattern: "*"

  input:
    tuple val(sample_id), val(type), file(fastq)
    file reference

  output:
    tuple val(sample_id), val(type), path("${sample_id}.bam"), path("${sample_id}.bam.bai")

  """
  minimap2 -t ${task.cpus} -ax map-ont ${reference} ${fastq} | samtools sort -o ${sample_id}.bam
  samtools index ${sample_id}.bam
  """
}

Here a CPU limit has been set, and minimap2 is told to use that many threads.
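
For anyone who would rather not edit main.nf, a similar cap can probably be applied from a separate config file. This is a minimal sketch, not something the maintainers posted: the file name custom.config and the values 4 and 2 are assumptions, and it relies only on Nextflow's standard withName selector and the cpus and maxForks directives. Without the -t ${task.cpus} change shown above, minimap2 keeps its own default thread count, so this config only limits how many alignment tasks run at once.

```groovy
// custom.config (hypothetical file name).
// Pass it on the command line with: nextflow run ... -c custom.config
process {
    // Apply these settings only to the workflow's alignment process
    withName: 'alignment' {
        cpus = 4       // assumed value: CPUs reserved for each alignment task
        maxForks = 2   // assumed value: at most two alignment tasks in parallel
    }
}
```

Lowering maxForks trades throughput for a smaller peak memory footprint, which is the point if the alignments are being killed for lack of memory.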

We also don't support conda with this workflow, because installing the required R packages that way is complex; we may add support in the future. So even without this issue, the workflow would likely fail at the CNV step under conda. The upcoming release will prevent the use of conda.

You can use Docker or Singularity instead.

Thanks

Matt

rdeborja commented 1 year ago

Thanks @mattdmem. After updating main.nf and running via singularity I can confirm the pipeline successfully completes on the test data.

mattdmem commented 1 year ago

FYI, v0.0.2 has been released; it introduces a --map_threads parameter to solve the above issue.

Matt