epi2me-labs / wf-human-variation


snpeff fails with segfault #164

Open trum994 opened 3 months ago

trum994 commented 3 months ago

Operating System

Other Linux (please specify below)

Other Linux

Ubuntu

Workflow Version

2.0.0

Workflow Execution

Command line

EPI2ME Version

No response

CLI command run

#!/bin/bash

ml nextflow
export NXF_OFFLINE="false"
nextflow run epi2me-labs/wf-human-variation \
    -latest \
    -resume \
    --bam '.../test_epi2me_hv/bam/R8967_800_v2.count.bam' \
    --basecaller_cfg 'dna_r10.4.1_e8.2_400bps_hac_prom' \
    --mod \
    --ref '.../test_epi2me_hv/refs/Homo_sapiens.GRCh38.dna.primary_assembly.fa' \
    --sample_name 'R8967' \
    --snp \
    --sv \
    --bam_min_coverage 0 \
    -profile singularity \
    -c myconfig.txt

Workflow Execution - CLI Execution Profile

singularity

What happened?

We're running on our cluster via Singularity. The snpEff annotation step (annotate_snp_vcf) fails for me with a Java SIGBUS error. In the execution directory, .command.log and .command.err contain only: Picked up JAVA_TOOL_OPTIONS: -Xlog:disable -Xlog:all=warning:stderr. There is also an hs_err_pid201.log file, which I pasted in the "Application activity log entry" below.

During a resume it fails within seconds. However, if I cd into the execution directory and simply run sbatch .command.run, the step completes with no issues.

Relevant log output

N E X T F L O W  ~  version 23.04.4
Pulling epi2me-labs/wf-human-variation ...
 Already-up-to-date
Launching `https://github.com/epi2me-labs/wf-human-variation` [disturbed_noether] DSL2 - revision: 52e3698431 [master]

||||||||||   _____ ____ ___ ____  __  __ _____      _       _
||||||||||  | ____|  _ \_ _|___ \|  \/  | ____|    | | __ _| |__  ___
|||||       |  _| | |_) | |  __) | |\/| |  _| _____| |/ _` | '_ \/ __|
|||||       | |___|  __/| | / __/| |  | | |__|_____| | (_| | |_) \__ \
||||||||||  |_____|_|  |___|_____|_|  |_|_____|    |_|\__,_|_.__/|___/
||||||||||  wf-human-variation v2.0.0-g52e3698
--------------------------------------------------------------------------------
Core Nextflow options
  revision        : master
  runName         : disturbed_noether
  containerEngine : singularity
  container       : ontresearch/wf-human-variation:shad3aed855cd007c653b8fc8cb16fe46c90199990f
  launchDir       : .../kurdogla/test_epi2me_hv
  workDir         : .../kurdogla/nxf_work
  projectDir      : .../.nextflow/assets/epi2me-labs/wf-human-variation
  userName        : kurdogla
  profile         : singularity
  configFiles     : .../.nextflow/assets/epi2me-labs/wf-human-variation/nextflow.config, .../test_epi2me_hv/myconfig.txt

Workflow Options
  sv              : true
  snp             : true
  mod             : true

Main options
  sample_name     : R8967
  bam             : .../test_epi2me_hv/bam/R8967_800_v2.count.bam
  ref             : .../test_epi2me_hv/refs/Homo_sapiens.GRCh38.dna.primary_assembly.fa
  basecaller_cfg  : dna_r10.4.1_e8.2_400bps_hac_prom
  bam_min_coverage: 0

!! Only displaying parameters that differ from the pipeline defaults !!
--------------------------------------------------------------------------------
If you use epi2me-labs/wf-human-variation for your analysis please cite:

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

--------------------------------------------------------------------------------
This is epi2me-labs/wf-human-variation v2.0.0-g52e3698.
--------------------------------------------------------------------------------
...
[af/08c4e9] process > configure_jbrowse (1)          [100%] 1 of 1, cached: 1 ✔
[ea/165979] process > publish_artifact (14)          [100%] 14 of 14, cached:...
Autoselected Clair3 model: r1041_e82_400bps_hac_g632
ERROR ~ Error executing process > 'annotate_snp_vcf (7)'

Caused by:
  Process `annotate_snp_vcf (7)` terminated with an error exit status (250)

Command executed:

  if [ "7" == '*' ]; then
      # SV is quick to annotate, dont bother breaking it apart
      INPUT_FILENAME=input.vcf.gz
      OUTPUT_LABEL="snp"
  else
      # SNP is slow to annotate, we'll break it apart by contig
      # and merge it back later. filter the master VCF to current contig
      bcftools view -r 7 input.vcf.gz | bgzip > input.chr.vcf.gz
      INPUT_FILENAME=input.chr.vcf.gz
      OUTPUT_LABEL="snp.7"
  fi

  # deal with samples which aren't hg19 or hg38
  if [[ "hg38" != "hg38" ]] && [[ "hg38" != "hg19" ]]; then
      # return the original VCF and index as the outputs
      cp ${INPUT_FILENAME} R8967.wf_${OUTPUT_LABEL}.vcf.gz
      cp ${INPUT_FILENAME}.tbi R8967.wf_${OUTPUT_LABEL}.vcf.gz.tbi
  else
      # do some annotation
      if [[ "hg38" == "hg38" ]]; then
          snpeff_db="GRCh38.p13"
          clinvar_vcf="${CLINVAR_PATH}/clinvar_GRCh38.vcf.gz"

      elif [[ "hg38" == "hg19" ]]; then
          snpeff_db="GRCh37.p13"
          clinvar_vcf="${CLINVAR_PATH}/clinvar_GRCh37.vcf.gz"
      fi

      snpEff -Xmx47g ann -noStats -noLog $snpeff_db ${INPUT_FILENAME} > R8967.intermediate.snpeff_annotated.vcf
      # Add ClinVar annotations
      SnpSift annotate $clinvar_vcf R8967.intermediate.snpeff_annotated.vcf | bgzip > R8967.wf_${OUTPUT_LABEL}.vcf.gz
      tabix R8967.wf_${OUTPUT_LABEL}.vcf.gz

      # tidy up
      rm R8967.intermediate*
  fi

Command exit status:
  250

Command output:
  (empty)

Command error:
  Picked up JAVA_TOOL_OPTIONS: -Xlog:disable -Xlog:all=warning:stderr

Work dir:
  .../nxf_work/f9/1d942c510ce9a4ca0ce47091f08d3b

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details
WARN: Killing running tasks (23)

Application activity log entry

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0x7) at pc=0x00002aaaab9c645e, pid=201, tid=202
#
# JRE version:  (21.0.2) (build )
# Java VM: OpenJDK 64-Bit Server VM (21.0.2-internal-adhoc.conda.src, mixed mode, sharing, tiered, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xc9b45e]  PerfMemory::alloc(unsigned long)+0x5e
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
#

---------------  S U M M A R Y ------------

Command Line: -Xlog:disable -Xlog:all=warning:stderr -Xmx47g /home/epi2melabs/conda/share/snpeff-5.1-2/snpEff.jar ann -noStats -noLog GRCh38.p13 input.chr.vcf.gz

Host: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz, 28 cores, 251G, Ubuntu 20.04.6 LTS
Time: Thu Mar 14 23:46:53 2024 PDT elapsed time: 0.212681 seconds (0d 0h 0m 0s)

---------------  T H R E A D  ---------------

Current thread (0x00002aaab0016b70):  JavaThread "Unknown thread" [_thread_in_vm, id=202, stack(0x00002aaaac492000,0x00002aaaac593000) (1028K)]

Stack: [0x00002aaaac492000,0x00002aaaac593000],  sp=0x00002aaaac5919d0,  free space=1022k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xc9b45e]  PerfMemory::alloc(unsigned long)+0x5e
V  [libjvm.so+0xc9a232]  PerfData::create_entry(BasicType, unsigned long, unsigned long)+0x72
V  [libjvm.so+0xc9ae0c]  PerfDataManager::create_string_variable(CounterNS, char const*, int, char const*, JavaThread*)+0x14c

Were you able to successfully run the latest version of the workflow with the demo data?

yes

Other demo data information

demo pass
SamStudio8 commented 3 months ago

@trum994 It looks as though you are requesting 48 GB of memory for the snpEff annotation step, which cannot be allocated. The workflow tries to set sensible defaults for all processes, and in our testing additional memory for snpEff beyond the default has not brought much benefit. You should reduce (or remove) the directive that overrides the memory for snpEff in your .../test_epi2me_hv/myconfig.txt file.
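For illustration, the override in question would look something like the snippet below. This is a hypothetical sketch only; the process selector and value are assumptions, not the actual contents of myconfig.txt:

process {
    withName:annotate_snp_vcf {
        // remove this override, or lower the value
        memory = '48 GB'
    }
}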

trum994 commented 3 months ago

Thanks for your quick response. I will remove the snpEff memory override in myconfig.txt and resume. However, our cluster did not have any issues allocating 48 GB.

trum994 commented 3 months ago

Unfortunately, same issue:

cat R8967.intermediate.snpeff_annotated.vcf 
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0x7) at pc=0x00002aaaab9c645e, pid=119, tid=164
#
# JRE version:  (21.0.2) (build )
# Java VM: OpenJDK 64-Bit Server VM (21.0.2-internal-adhoc.conda.src, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, serial gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xc9b45e]  PerfMemory::alloc(unsigned long)+0x5e
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# .../nxf_work/89/ed3a3fd343527c8d5d39062582885a/hs_err_pid119.log
#
#

And here is the script, now with 6 GB.

cat .command.sh 
#!/bin/bash -euo pipefail
if [ "X" == '*' ]; then
    # SV is quick to annotate, dont bother breaking it apart
    INPUT_FILENAME=input.vcf.gz
    OUTPUT_LABEL="snp"
else
    # SNP is slow to annotate, we'll break it apart by contig
    # and merge it back later. filter the master VCF to current contig
    bcftools view -r X input.vcf.gz | bgzip > input.chr.vcf.gz
    INPUT_FILENAME=input.chr.vcf.gz
    OUTPUT_LABEL="snp.X"
fi

# deal with samples which aren't hg19 or hg38
if [[ "hg38" != "hg38" ]] && [[ "hg38" != "hg19" ]]; then
    # return the original VCF and index as the outputs
    cp ${INPUT_FILENAME} R8967.wf_${OUTPUT_LABEL}.vcf.gz
    cp ${INPUT_FILENAME}.tbi R8967.wf_${OUTPUT_LABEL}.vcf.gz.tbi
else
    # do some annotation
    if [[ "hg38" == "hg38" ]]; then
        snpeff_db="GRCh38.p13"
        clinvar_vcf="${CLINVAR_PATH}/clinvar_GRCh38.vcf.gz"

    elif [[ "hg38" == "hg19" ]]; then
        snpeff_db="GRCh37.p13"
        clinvar_vcf="${CLINVAR_PATH}/clinvar_GRCh37.vcf.gz"
    fi

    snpEff -Xmx5g ann -noStats -noLog $snpeff_db ${INPUT_FILENAME} > R8967.intermediate.snpeff_annotated.vcf
    # Add ClinVar annotations
    SnpSift annotate $clinvar_vcf R8967.intermediate.snpeff_annotated.vcf | bgzip > R8967.wf_${OUTPUT_LABEL}.vcf.gz
    tabix R8967.wf_${OUTPUT_LABEL}.vcf.gz

    # tidy up
    rm R8967.intermediate*
fi
SamStudio8 commented 3 months ago

That is unfortunate! I assume the hs_err_pid119.log file is similar to the first one? A search suggests this error could be related to multiple JVMs accessing the same performance instrumentation file. Could you try adding the following to your custom configuration:

process {
    withName:annotate_vcf {
        maxForks = 1
    }
}

This will force the SnpEff steps to run serially, which should mitigate the error (if that is indeed the issue).

trum994 commented 3 months ago

With maxForks = 1 added, my snpEff hasn't crashed yet! This would also explain why the job had no issues when I manually kicked off .command.run. Of course, as expected, I only see one job running right now, so this will take a while.

SamStudio8 commented 3 months ago

Thanks, let me know how you get on!

trum994 commented 3 months ago

[41/5fd95a] process > annotate_snp_vcf (24) [100%] 25 of 25 ✔
Complete! Now, is there a way to fix this bug, or am I stuck with maxForks = 1? Either way, thanks for this workaround!
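For anyone who wants to keep the annotation steps running in parallel, one possible (untested) alternative is to stop each JVM from creating the shared performance-data file that PerfMemory::alloc was touching when it hit the SIGBUS. A minimal sketch, assuming the Nextflow env scope is honoured inside the Singularity container and that the workflow's existing JAVA_TOOL_OPTIONS should be kept:

// hypothetical addition to the custom config; -XX:-UsePerfData disables the
// per-JVM hsperfdata file so concurrent snpEff/SnpSift JVMs no longer share it
env {
    JAVA_TOOL_OPTIONS = '-Xlog:disable -Xlog:all=warning:stderr -XX:-UsePerfData'
}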