epi2me-labs / wf-human-variation

minimap2 alignment step fails because of samtools sort #145

Open trum994 opened 4 months ago

trum994 commented 4 months ago

Ask away!

I'm running this workflow on our HPC via Singularity. The input BAM (140 GB) was made by merging the smaller BAMs from a PromethION run into a single file with samtools merge. Is this a missing-header issue, a RAM issue, or something else?

This is the output I get:

ERROR ~ Error executing process > 'bam_ingress:minimap2_alignment (1)'

Caused by: Process bam_ingress:minimap2_alignment (1) terminated with an error exit status (1)

Command executed:

samtools bam2fq -@ 1 -T 1 R8967.bam | minimap2 -y -t 8 -ax map-ont Homo_sapiens.GRCh38.dna.primary_assembly.fa - | samtools sort -@ 3 --write-index -o R8967.cram##idx##R8967.cram.crai -O cram --reference Homo_sapiens.GRCh38.dna.primary_assembly.fa -

Command exit status: 1

Command output: (empty)

Command error:
[M::mm_idx_gen::84.693*1.70] collected minimizers
[M::mm_idx_gen::93.043*2.21] sorted minimizers
[M::main::93.043*2.21] loaded/built the index for 194 target sequence(s)
[M::mm_mapopt_update::95.457*2.18] mid_occ = 705
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 194
[M::mm_idx_stat::96.375*2.17] distinct minimizers: 100159079 (38.79% are singletons); average occurrences: 5.540; average spacing: 5.586; total length: 3099750718
[W::sam_hdr_create] Ignored @SQ SN:KI270330 : bad or missing LN tag
[E::sam_hrecs_update_hashes] Header includes @SQ line "KI270330" with no LN: tag
[E::sam_hrecs_update_hashes] Header includes @SQ line "KI270330" with no LN: tag
samtools sort: failed to change sort order header to 'SO:coordinate'
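
The samtools messages in this log complain about an @SQ header line that has a sequence name but no LN tag. As a minimal sketch, assuming the SAM header has been captured as plain text (for example with `samtools view -H`, or from the first lines of the aligner's SAM output), the check below lists @SQ lines that lack SN or LN. The function name `incomplete_sq_lines` and the example header are hypothetical. The check only reports which lines are incomplete, not why.

```python
def incomplete_sq_lines(header_text):
    """Return (line_number, line) pairs for @SQ lines lacking SN or LN."""
    problems = []
    for number, line in enumerate(header_text.splitlines(), start=1):
        if not line.startswith("@SQ"):
            continue
        # After the record type, header fields are tab-separated TAG:VALUE pairs.
        tags = set()
        for field in line.split("\t")[1:]:
            tag, _, value = field.partition(":")
            if value:  # a tag with no value counts as missing
                tags.add(tag)
        if not {"SN", "LN"} <= tags:
            problems.append((number, line))
    return problems


# Hypothetical header text: the last @SQ line stops before its LN tag.
example_header = "@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:248956422\n@SQ\tSN:KI270330"
print(incomplete_sq_lines(example_header))
```

If the only incomplete line is the last one in the header, that may suggest the header stream was cut short rather than that the reference itself is malformed, but that is an interpretation and not something the check can confirm.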

SamStudio8 commented 4 months ago

Hi @trum994, it is quite likely that the workflow has exceeded its memory limit for the alignment step. Can you confirm that you're using the latest version? We've made several recent improvements to the memory directives and to performance generally.

stfacc commented 3 months ago

I'm getting a similar error with the gencode hg38 reference.

I'm using the current "master" version of the pipeline.

ERROR ~ Error executing process > 'bam_ingress:minimap2_alignment (1)'

Caused by:
  Process `bam_ingress:minimap2_alignment (1)` terminated with an error exit status (1)

Command executed:

  samtools bam2fq -@ 1 -T 1  merged_pass.bam | minimap2 -y -t 8 -ax map-ont GRCh38.primary_assembly.genome.fa -     | samtools sort -@ 3 --write-index -o sample_180057.bam##idx##sample_180057.bam.bai -O bam --reference GRCh38.primary_assembly.genome.fa -

Command exit status:
  1

Command output:
  (empty)

Command error:
  [M::mm_idx_gen::74.061*1.88] collected minimizers
  [M::mm_idx_gen::82.639*2.49] sorted minimizers
  [M::main::82.639*2.49] loaded/built the index for 194 target sequence(s)
  [M::mm_mapopt_update::84.520*2.46] mid_occ = 706
  [M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 194
  [M::mm_idx_stat::85.610*2.44] distinct minimizers: 100159079 (38.75% are singletons); average occurrences: 5.545; average spacing: 5.581; total length: 3099750718
  [W::sam_hdr_create] Ignored @SQ line with missing SN: tag
  [E::sam_hrecs_error] Malformed key:value pair at line 157: "@SQ       SN"
  [E::sam_hrecs_error] Malformed key:value pair at line 157: "@SQ       SN"
  samtools sort: failed to change sort order header to 'SO:coordinate'
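
In these logs the minimap2 progress lines (`[M::...]`) sit alongside the samtools warnings and errors (`[W::...]`, `[E::...]`), which makes the failing message easy to miss. A minimal sketch, assuming the "Command error" block has been copied into a string and that every tool message uses a single-letter `[X::` prefix as seen here; `group_by_prefix` is a hypothetical helper name.

```python
from collections import defaultdict
import re

# Matches single-letter prefixes such as [M::main::82.639*2.49] or [E::sam_hrecs_error].
PREFIX = re.compile(r"^\[([A-Z])::")


def group_by_prefix(stderr_text):
    """Group stderr lines by their one-letter prefix; unprefixed lines go under 'other'."""
    groups = defaultdict(list)
    for raw in stderr_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = PREFIX.match(line)
        groups[match.group(1) if match else "other"].append(line)
    return dict(groups)


# A few lines taken from the log above.
log = """\
  [M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 194
  [W::sam_hdr_create] Ignored @SQ line with missing SN: tag
  [E::sam_hrecs_error] Malformed key:value pair at line 157: "@SQ       SN"
  samtools sort: failed to change sort order header to 'SO:coordinate'
"""
for key, lines in group_by_prefix(log).items():
    print(key, len(lines))
```

Reading the `E` and `other` groups first gives the samtools failure without the indexing progress around it.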

LisaHagenau commented 3 months ago

I am also having workflows fail with this error. I completed the workflow with the same settings for 2 of 3 samples, but one keeps failing even though it is not a large file (8 GB CRAM). I have 64 GB of memory available and explicitly allocated 60 GB to this workflow, but the final report shows only 16 GB allocated to the process bam_ingress:minimap2_alignment.


N E X T F L O W  ~  version 23.04.2
Launching `/home/nanopore/data-hdd/epi2melabs/workflows/epi2me-labs/wf-human-variation/main.nf` [adoring_carson] DSL2 - revision: 03fcebc94c
WARN: Found unexpected parameters:
* --client_fields: /home/nanopore/data-hdd/epi2melabs/instances/wf-human-variation_01HRYP06TJJ6N6BYMKKAAX9T06/client_fields.json
- Ignore this warning: params.schema_ignore_params = "client_fields" 
||||||||||   _____ ____ ___ ____  __  __ _____      _       _
||||||||||  | ____|  _ \_ _|___ \|  \/  | ____|    | | __ _| |__  ___
|||||       |  _| | |_) | |  __) | |\/| |  _| _____| |/ _` | '_ \/ __|
|||||       | |___|  __/| | / __/| |  | | |__|_____| | (_| | |_) \__ \
||||||||||  |_____|_|  |___|_____|_|  |_|_____|    |_|\__,_|_.__/|___/
||||||||||  wf-human-variation v2.0.0
--------------------------------------------------------------------------------
Core Nextflow options
  runName         : adoring_carson
  containerEngine : docker
  container       : ontresearch/wf-human-variation:shad3aed855cd007c653b8fc8cb16fe46c90199990f
  launchDir       : /mnt/data-hdd/epi2melabs/instances/wf-human-variation_01HRYP06TJJ6N6BYMKKAAX9T06
  workDir         : /home/nanopore/data-hdd/epi2melabs/instances/wf-human-variation_01HRYP06TJJ6N6BYMKKAAX9T06/work
  projectDir      : /home/nanopore/data-hdd/epi2melabs/workflows/epi2me-labs/wf-human-variation
  userName        : nanopore
  profile         : standard
  configFiles     : /home/nanopore/data-hdd/epi2melabs/workflows/epi2me-labs/wf-human-variation/nextflow.config, /home/nanopore/data-hdd/epi2melabs/instances/wf-human-variation_01HRYP06TJJ6N6BYMKKAAX9T06/local.config, /home/nanopore/data-hdd/epi2melabs/instances/wf-human-variation_01HRYP06TJJ6N6BYMKKAAX9T06/global.config
Workflow Options
  mod             : true
Main options
  sample_name     : 2Gy
  bam             : /home/nanopore/data-hdd/epi2melabs/instances/wf-basecalling_01HRRV905NBW78W9CHMS7870WV/output/2Gy.pass.cram
  ref             : /home/nanopore/data-hdd/genomes/hg38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna
  old_ref         : /home/nanopore/data-hdd/genomes/T2T_chm13_hg/chm13v2.0.fa
  basecaller_cfg  : dna_r10.4.1_e8.2_400bps_hac@v4.3.0
  bam_min_coverage: 2
  out_dir         : /home/nanopore/data-hdd/epi2melabs/instances/wf-human-variation_01HRYP06TJJ6N6BYMKKAAX9T06/output
!! Only displaying parameters that differ from the pipeline defaults !!
--------------------------------------------------------------------------------
If you use epi2me-labs/wf-human-variation for your analysis please cite:
* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x
--------------------------------------------------------------------------------
This is epi2me-labs/wf-human-variation v2.0.0.
--------------------------------------------------------------------------------
[8a/d5a790] Submitted process > cram_cache (1)
[f6/7655a2] Submitted process > getVersions
[bf/9b3bf9] Submitted process > index_ref_fai (1)
[2e/76b4ae] Submitted process > getParams
[3c/aee16c] Submitted process > publish_artifact (1)
[c3/9a1bc6] Submitted process > bam_ingress:check_for_alignment (1)
[b6/410aad] Submitted process > bam_ingress:minimap2_alignment (1)
[a7/2f5a43] Submitted process > publish_artifact (2)
[7c/4642e3] Submitted process > publish_artifact (4)
[31/d9f933] Submitted process > publish_artifact (3)
[97/0c1c3e] Submitted process > getAllChromosomesBed (1)
ERROR ~ Error executing process > 'bam_ingress:minimap2_alignment (1)'
Caused by:
  Process `bam_ingress:minimap2_alignment (1)` terminated with an error exit status (1)
Command executed:
  samtools bam2fq -@ 1 -T 1 --reference chm13v2.0.fa 2Gy.pass.cram | minimap2 -y -t 8 -ax map-ont GCA_000001405.15_GRCh38_no_alt_analysis_set.fna -     | samtools sort -@ 3 --write-index -o 2Gy.cram##idx##2Gy.cram.crai -O cram --reference GCA_000001405.15_GRCh38_no_alt_analysis_set.fna -
Command exit status:
  1
Command output:
  (empty)
Command error:
  WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
  [M::mm_idx_gen::30.736*1.65] collected minimizers
  [M::mm_idx_gen::34.937*2.41] sorted minimizers
  [M::main::34.937*2.41] loaded/built the index for 195 target sequence(s)
  [M::mm_mapopt_update::35.965*2.37] mid_occ = 694
  [M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 195
  [M::mm_idx_stat::36.486*2.35] distinct minimizers: 100167746 (38.80% are singletons); average occurrences: 5.519; average spacing: 5.607; total length: 3099922541
  [W::sam_hdr_create] Ignored @SQ SN:chrUn_K : bad or missing LN tag
  [E::sam_hrecs_update_hashes] Header includes @SQ line "chrUn_K" with no LN: tag
  [E::sam_hrecs_update_hashes] Header includes @SQ line "chrUn_K" with no LN: tag
  samtools sort: failed to change sort order header to 'SO:coordinate'
Work dir:
  /home/nanopore/data-hdd/epi2melabs/instances/wf-human-variation_01HRYP06TJJ6N6BYMKKAAX9T06/work/b6/410aadc15d15d2ba94b7949fdc4554
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
 -- Check '/home/nanopore/data-hdd/epi2melabs/instances/wf-human-variation_01HRYP06TJJ6N6BYMKKAAX9T06/nextflow.log' file for details
Execution cancelled -- Finishing pending tasks before exit

LisaHagenau commented 3 months ago

I was able to run this workflow today after restarting the computer, which emptied my swap. Before the restart, htop showed 6 GB of swap in use (of 8 GB total). I'm not sure what caused the swap to fill up: only EPI2ME was running, and the swap emptied again after the workflow completed successfully.
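
For anyone wanting to check swap before launching a run, here is a minimal sketch. It assumes a Linux host where `/proc/meminfo` reports `SwapTotal` and `SwapFree` in kB; the function name `swap_usage_kib` and the sample values are hypothetical, chosen to mirror the 6 GB of 8 GB described above.

```python
def swap_usage_kib(meminfo_text):
    """Return (used, total) swap in KiB from /proc/meminfo-style text."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if key in ("SwapTotal", "SwapFree") and parts:
            values[key] = int(parts[0])  # first token is the size in kB
    total = values["SwapTotal"]
    return total - values["SwapFree"], total


# Hypothetical sample text, not taken from the reporter's machine.
sample = (
    "MemTotal:       65536000 kB\n"
    "SwapTotal:       8388608 kB\n"
    "SwapFree:        2097152 kB\n"
)
used, total = swap_usage_kib(sample)
print(f"swap used: {used / 1024**2:.1f} GiB of {total / 1024**2:.1f} GiB")
```

On a real machine the same function could be called as `swap_usage_kib(open("/proc/meminfo").read())`.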

SamStudio8 commented 3 months ago

Thanks all for these reports. We have been investigating ways to curtail the memory usage of minimap2 for this step and will have a patch release out next week to try to alleviate the problems you've been having with the alignment step.
