trum994 commented 3 months ago

Operating System

Other Linux (please specify below)

Other Linux


Workflow Version


Workflow Execution

Command line

EPI2ME Version

CLI command run


ml nextflow
export NXF_OFFLINE="false"
nextflow run epi2me-labs/wf-human-variation \
    -latest \
    -resume \
    --bam '.../test_epi2me_hv/bam/R8967_800_v2.count.bam' \
    --basecaller_cfg 'dna_r10.4.1_e8.2_400bps_hac_prom' \
    --mod \
    --ref '.../test_epi2me_hv/refs/Homo_sapiens.GRCh38.dna.primary_assembly.fa' \
    --sample_name 'R8967' \
    --snp \
    --sv \
    --bam_min_coverage 0 \
    -profile singularity \
    -c myconfig.txt

Workflow Execution - CLI Execution Profile


What happened?

We're running on our cluster via Singularity. The snpEff annotation step fails for me with a Java SIGBUS error. In the execution directory, .command.log and .command.err contain only this:

Picked up JAVA_TOOL_OPTIONS: -Xlog:disable -Xlog:all=warning:stderr

There is also an hs_err_pid201.log file, which I have pasted under "Application activity log entry".

On resume it fails within seconds. However, if I cd into the execution directory and simply run sbatch .command.run, the step runs to completion with no issues.

Relevant log output

Autoselected Clair3 model: r1041_e82_400bps_hac_g632
ERROR ~ Error executing process > 'annotate_snp_vcf (7)'

Caused by:
  Process `annotate_snp_vcf (7)` terminated with an error exit status (250)

Command executed:

  if [ "7" == '*' ]; then
      # SV is quick to annotate, dont bother breaking it apart
      # SNP is slow to annotate, we'll break it apart by contig
      # and merge it back later. filter the master VCF to current contig
      bcftools view -r 7 input.vcf.gz | bgzip > input.chr.vcf.gz

  # deal with samples which aren't hg19 or hg38
  if [[ "hg38" != "hg38" ]] && [[ "hg38" != "hg19" ]]; then
      # return the original VCF and index as the outputs
      cp ${INPUT_FILENAME} R8967.wf_${OUTPUT_LABEL}.vcf.gz
      cp ${INPUT_FILENAME}.tbi R8967.wf_${OUTPUT_LABEL}.vcf.gz.tbi
      # do some annotation
      if [[ "hg38" == "hg38" ]]; then

      elif [[ "hg38" == "hg19" ]]; then

      snpEff -Xmx47g ann -noStats -noLog $snpeff_db ${INPUT_FILENAME} > R8967.intermediate.snpeff_annotated.vcf
      # Add ClinVar annotations
      SnpSift annotate $clinvar_vcf R8967.intermediate.snpeff_annotated.vcf | bgzip > R8967.wf_${OUTPUT_LABEL}.vcf.gz
      tabix R8967.wf_${OUTPUT_LABEL}.vcf.gz

      # tidy up
      rm R8967.intermediate*

Application activity log entry

# A fatal error has been detected by the Java Runtime Environment:
#  SIGBUS (0x7) at pc=0x00002aaaab9c645e, pid=201, tid=202
# JRE version:  (21.0.2) (build )
# Java VM: OpenJDK 64-Bit Server VM (21.0.2-internal-adhoc.conda.src, mixed mode, sharing, tiered, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xc9b45e]  PerfMemory::alloc(unsigned long)+0x5e
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

---------------  S U M M A R Y ------------

Command Line: -Xlog:disable -Xlog:all=warning:stderr -Xmx47g /home/epi2melabs/conda/share/snpeff-5.1-2/snpEff.jar ann -noStats -noLog GRCh38.p13 input.chr.vcf.gz

Host: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz, 28 cores, 251G, Ubuntu 20.04.6 LTS
Time: Thu Mar 14 23:46:53 2024 PDT elapsed time: 0.212681 seconds (0d 0h 0m 0s)

---------------  T H R E A D  ---------------

Current thread (0x00002aaab0016b70):  JavaThread "Unknown thread" [_thread_in_vm, id=202, stack(0x00002aaaac492000,0x00002aaaac593000) (1028K)]

Stack: [0x00002aaaac492000,0x00002aaaac593000],  sp=0x00002aaaac5919d0,  free space=1022k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xc9b45e]  PerfMemory::alloc(unsigned long)+0x5e
V  [libjvm.so+0xc9a232]  PerfData::create_entry(BasicType, unsigned long, unsigned long)+0x72
V  [libjvm.so+0xc9ae0c]  PerfDataManager::create_string_variable(CounterNS, char const*, int, char const*, JavaThread*)+0x14c

Were you able to successfully run the latest version of the workflow with the demo data?


Other demo data information

demo pass

SamStudio8 commented 3 months ago

@trum994 It looks as though you are requesting 48 GB of memory for the snpEff annotation step, which cannot be allocated. The workflow tries to set sensible defaults for all processes, and additional memory for snpEff beyond the default has not brought much benefit in our testing. You should reduce (or remove) the directive that overrides the memory for snpEff in your .../test_epi2me_hv/myconfig.txt file.

trum994 commented 3 months ago

Thanks for your quick response. I will remove the snpEff memory override in myconfig.txt and resume. However, our cluster did not have any issues allocating 48 GB.

trum994 commented 3 months ago

Unfortunately same issue:

cat R8967.intermediate.snpeff_annotated.vcf 
# A fatal error has been detected by the Java Runtime Environment:
#  SIGBUS (0x7) at pc=0x00002aaaab9c645e, pid=119, tid=164
# JRE version:  (21.0.2) (build )
# Java VM: OpenJDK 64-Bit Server VM (21.0.2-internal-adhoc.conda.src, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, serial gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xc9b45e]  PerfMemory::alloc(unsigned long)+0x5e
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
# An error report file with more information is saved as:
# .../nxf_work/89/ed3a3fd343527c8d5d39062582885a/hs_err_pid119.log

And here is the script, now with 6 GB:

cat .command.sh 
#!/bin/bash -euo pipefail
if [ "X" == '*' ]; then
    # SV is quick to annotate, dont bother breaking it apart
    # SNP is slow to annotate, we'll break it apart by contig
    # and merge it back later. filter the master VCF to current contig
    bcftools view -r X input.vcf.gz | bgzip > input.chr.vcf.gz

# deal with samples which aren't hg19 or hg38
if [[ "hg38" != "hg38" ]] && [[ "hg38" != "hg19" ]]; then
    # return the original VCF and index as the outputs
    cp ${INPUT_FILENAME} R8967.wf_${OUTPUT_LABEL}.vcf.gz
    cp ${INPUT_FILENAME}.tbi R8967.wf_${OUTPUT_LABEL}.vcf.gz.tbi
    # do some annotation
    if [[ "hg38" == "hg38" ]]; then

    elif [[ "hg38" == "hg19" ]]; then

    snpEff -Xmx5g ann -noStats -noLog $snpeff_db ${INPUT_FILENAME} > R8967.intermediate.snpeff_annotated.vcf
    # Add ClinVar annotations
    SnpSift annotate $clinvar_vcf R8967.intermediate.snpeff_annotated.vcf | bgzip > R8967.wf_${OUTPUT_LABEL}.vcf.gz
    tabix R8967.wf_${OUTPUT_LABEL}.vcf.gz

    # tidy up
    rm R8967.intermediate*

SamStudio8 commented 3 months ago

That is unfortunate! I assume the hs_err_pid119.log file is similar to the first one? A search indicates that this error could possibly be related to multiple JVMs accessing the same performance instrumentation file. Could you try adding the following to your custom configuration:

process {
    withName:annotate_vcf {
        maxForks = 1
    }
}

This will force the SnpEff steps to run in serial which will mitigate this error (if that is the issue).
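For context on the "performance instrumentation file": by default, a HotSpot JVM on Linux memory-maps a small counter file at /tmp/hsperfdata_<user>/<pid>, and the PerfMemory::alloc frame in the crash report belongs to that code path. The sketch below only illustrates how two concurrent tasks could end up on the same file. It assumes the tasks share /tmp and that their JVMs see the same PID (for example inside separate container PID namespaces), neither of which has been verified for this cluster. The helper name hsperf_path is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the perf-counter file a HotSpot JVM on Linux
# would use for a given user and PID (default location under /tmp).
hsperf_path() {
    local user="$1" pid="$2"
    printf '/tmp/hsperfdata_%s/%s\n' "$user" "$pid"
}

# Assumption: two tasks run as the same user and their JVMs both report
# PID 201. If they also share /tmp, both resolve to the same file.
task_a="$(hsperf_path "${USER:-unknown}" 201)"
task_b="$(hsperf_path "${USER:-unknown}" 201)"
if [ "$task_a" = "$task_b" ]; then
    echo "both JVMs would map: $task_a"
fi
```

Listing /tmp/hsperfdata_* on a compute node while several annotation tasks are running would be one way to see whether such files are actually shared.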

trum994 commented 3 months ago

With maxForks = 1 added, my snpEff hasn't crashed yet! This would also explain why the job had no issues when I manually kicked off .command.run. Of course, as expected, I only see 1 job running right now, so this will take a while.

SamStudio8 commented 3 months ago

Thanks, let me know how you get on!

trum994 commented 3 months ago

[41/5fd95a] process > annotate_snp_vcf (24) [100%] 25 of 25 ✔

Complete! Now, is there a way to fix this bug, or am I stuck with maxForks = 1? Either way, thanks for this workaround!
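One possible alternative to running the tasks serially, not confirmed anywhere in this thread, would be to stop the JVM from creating its performance-counter file at all. HotSpot has a -XX:-UsePerfData option for this (and -XX:+PerfDisableSharedMem to keep the counters but not in a shared file). The sketch below appends the flag to JAVA_TOOL_OPTIONS while keeping the options the workflow already sets. The helper name append_java_tool_option is hypothetical, and it is an assumption that the shared counter file is really the cause here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a JVM flag to JAVA_TOOL_OPTIONS without
# discarding options that are already set, and without adding it twice.
append_java_tool_option() {
    local flag="$1"
    case " ${JAVA_TOOL_OPTIONS:-} " in
        *" $flag "*) ;;  # already present, nothing to do
        *) JAVA_TOOL_OPTIONS="${JAVA_TOOL_OPTIONS:+$JAVA_TOOL_OPTIONS }$flag" ;;
    esac
    export JAVA_TOOL_OPTIONS
}

# -XX:-UsePerfData turns off the hsperfdata performance-counter file.
append_java_tool_option "-XX:-UsePerfData"
echo "$JAVA_TOOL_OPTIONS"
```

Where to set this so that it reaches the JVM inside the Singularity container depends on the cluster and workflow setup and is not shown here. Tools such as jps and jstat rely on the counter file, so they would no longer see these JVMs.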