epi2me-labs / wf-human-variation

snp:evaluate_candidates Aborted #154

Closed gdemoro closed 1 month ago

gdemoro commented 3 months ago

Operating System

Other Linux (please specify below)

Other Linux

Ubuntu 22.04.3 LTS

Workflow Version

v2.0.0-g52e3698

Workflow Execution

EPI2ME Desktop application

EPI2ME Version

No response

CLI command run

nextflow run epi2me-labs/wf-human-variation --bam simplex_aln.bam --basecaller_cfg dna_r10.4.1_e8.2_400bps_sup@v4.3.0 --ref GCF_000001635.27_GRCm39_genomic.fna --sample_name wildtype --mod --snp --sv -profile standard --cnv false --str false --annotation false --include_all_ctgs -c config.cfg --bam_min_coverage 3

config file:

executor {
    $local {
        cpus = 20
        memory = "60 GB"
    }
}

Workflow Execution - CLI Execution Profile

standard (default)

What happened?

The pipeline crashes at:

process > snp:evaluate_candidates (599) [ 77%] 1027 of 1321, cached: 1020, failed: 1

I relaunched and resumed the run several times. Sometimes it stops immediately, and sometimes after progressing 1-2%.

The process crashes, but the pipeline stays idle and I have to stop it manually.

Relevant log output

ERROR ~ Error executing process > 'snp:evaluate_candidates (249)'

Caused by:
  Process `snp:evaluate_candidates (249)` terminated with an error exit status (134)

Command executed:

  mkdir output
  echo "[INFO] 6/7 Call low-quality variants using full-alignment model"
  python $(which clair3.py) CallVariantsFromCffi             --chkpnt_fn model/full_alignment             --bam_fn simplex_aln.bam             --call_fn output/full_alignment_NC_000068.8.71_89.vcf             --sampleName wildtype             --ref_fn GCF_000001635.27_GRCm39_genomic.fna             --full_aln_regions NC_000068.8.71_89             --ctgName NC_000068.8             --add_indel_length             --gvcf false             --minMQ 5             --minCoverage 2             --snp_min_af 0.08             --indel_min_af 0.15             --platform ont             --cmd_fn CMD             --phased_vcf_fn phased_NC_000068.8.vcf.gz

Command exit status:
  134

Command output:
  [INFO] 6/7 Call low-quality variants using full-alignment model

Command error:
  [INFO] 6/7 Call low-quality variants using full-alignment model
  Calling variants ...
  free(): invalid next size (fast)
  .command.sh: line 4:    32 Aborted                 (core dumped) python $(which clair3.py) CallVariantsFromCffi --chkpnt_fn model/full_alignment --bam_fn simplex_aln.bam --call_fn output/full_alignment_NC_000068.8.71_89.vcf --sampleName wildtype --ref_fn GCF_000001635.27_GRCm39_genomic.fna --full_aln_regions NC_000068.8.71_89 --ctgName NC_000068.8 --add_indel_length --gvcf false --minMQ 5 --minCoverage 2 --snp_min_af 0.08 --indel_min_af 0.15 --platform ont --cmd_fn CMD --phased_vcf_fn phased_NC_000068.8.vcf.gz
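
A note on the exit status above: shells report 128 + N when a command is killed by signal N, so 134 corresponds to signal 6, which is SIGABRT on Linux. That is consistent with the `free(): invalid next size (fast)` message, which glibc prints when it detects heap corruption before aborting. A minimal Python sketch of the decoding follows; the function name `decode_exit_status` is hypothetical and is not part of Nextflow or Clair3.

```python
import signal


def decode_exit_status(status: int) -> str:
    """Interpret a shell exit status using the 128 + signal convention."""
    if status > 128:
        signum = status - 128
        try:
            # Map the signal number to its symbolic name on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            name = f"unknown signal {signum}"
        return f"killed by {name}"
    return f"exited with code {status}"


# The status reported for snp:evaluate_candidates (249) in the log above.
print(decode_exit_status(134))
```

A signal-based status like this points to a crash inside native code rather than an error raised by the Python script itself.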

This is the relevant part of the nextflow.log:

mar-08 07:50:39.262 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 1066; name: snp:evaluate_candidates (249); status: COMPLETED; exit: 134; error: -; workDir: /data/xyz/xyz/xyz/Alignment/work/8d/6d8d17ffa44bab94946168f35b7d13]
mar-08 07:50:39.264 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=snp:evaluate_candidates (249); work-dir=/data/xyz/xyz/xyz/Alignment/work/8d/6d8d17ffa44bab94946168f35b7d13
  error [nextflow.exception.ProcessFailedException]: Process `snp:evaluate_candidates (249)` terminated with an error exit status (134)
mar-08 07:50:39.264 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
mar-08 07:50:39.265 [Task submitter] INFO  nextflow.Session - [c1/51c912] Submitted process > snp:evaluate_candidates (599)
mar-08 07:50:39.270 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'snp:evaluate_candidates (249)'

Caused by:
  Process `snp:evaluate_candidates (249)` terminated with an error exit status (134)

Command executed:

  mkdir output
  echo "[INFO] 6/7 Call low-quality variants using full-alignment model"
  python $(which clair3.py) CallVariantsFromCffi             --chkpnt_fn model/full_alignment             --bam_fn simplex_aln.bam             --call_fn output/full_alignment_NC_000068.8.71_89.vcf             --sampleName wildtype             --ref_fn GCF_000001635.27_GRCm39_genomic.fna             --full_aln_regions NC_000068.8.71_89             --ctgName NC_000068.8             --add_indel_length             --gvcf false             --minMQ 5             --minCoverage 2             --snp_min_af 0.08             --indel_min_af 0.15             --platform ont             --cmd_fn CMD             --phased_vcf_fn phased_NC_000068.8.vcf.gz

Command exit status:
  134

Command output:
  [INFO] 6/7 Call low-quality variants using full-alignment model

Command error:
  [INFO] 6/7 Call low-quality variants using full-alignment model
  Calling variants ...
  free(): invalid next size (fast)
  .command.sh: line 4:    32 Aborted                 (core dumped) python $(which clair3.py) CallVariantsFromCffi --chkpnt_fn model/full_alignment --bam_fn simplex_aln.bam --call_fn output/full_alignment_NC_000068.8.71_89.vcf --sampleName wildtype --ref_fn GCF_000001635.27_GRCm39_genomic.fna --full_aln_regions NC_000068.8.71_89 --ctgName NC_000068.8 --add_indel_length --gvcf false --minMQ 5 --minCoverage 2 --snp_min_af 0.08 --indel_min_af 0.15 --platform ont --cmd_fn CMD --phased_vcf_fn phased_NC_000068.8.vcf.gz

Work dir:
  /data/xyz/xyz/xyz/Alignment/work/8d/6d8d17ffa44bab94946168f35b7d13

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
mar-08 07:50:39.274 [Task monitor] INFO  nextflow.Session - Execution cancelled -- Finishing pending tasks before exit
mar-08 07:50:39.460 [main] DEBUG nextflow.Session - Session await > all processes finished
mar-08 07:51:25.456 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 1427; name: snp:evaluate_candidates (610); status: COMPLETED; exit: 0; error: -; workDir: /data/xyz/xyz/xyz/Alignment/work/b0/8306551b3e4d3e36db606106fb9aa2]
mar-08 07:51:25.535 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 1418; name: snp:evaluate_candidates (601); status: COMPLETED; exit: 0; error: -; workDir: /data/xyz/xyz/xyz/Alignment/work/6b/454ef0e573e3842d7e0a5ebb7f5632]
mar-08 07:51:25.932 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 1415; name: snp:evaluate_candidates (598); status: COMPLETED; exit: 0; error: -; workDir: /data/xyz/xyz/xyz/Alignment/work/28/0c36a9ce10f4a797cecc8ca7653a25]
mar-08 07:51:26.038 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 1420; name: snp:evaluate_candidates (603); status: COMPLETED; exit: 0; error: -; workDir: /data/xyz/xyz/xyz/Alignment/work/24/35565129272b2dcd0da446fafc7d5b]
mar-08 07:51:27.163 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 1430; name: snp:evaluate_candidates (613); status: COMPLETED; exit: 0; error: -; workDir: /data/xyz/xyz/xyz/Alignment/work/08/a034335d6b83044af25806791d6996]
mar-08 07:51:29.664 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 1416; name: snp:evaluate_candidates (599); status: COMPLETED; exit: 0; error: -; workDir: /data/xyz/xyz/xyz/Alignment/work/c1/51c912cc38ec17ecd3bff1facaccb1]
mar-08 07:51:29.667 [main] DEBUG nextflow.Session - Session await > all barriers passed
mar-08 07:51:50.263 [SIGINT handler] DEBUG nextflow.Session - Session aborted -- Cause: SIGINT
mar-08 07:51:50.617 [main] DEBUG nextflow.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=6; failedCount=1; ignoredCount=0; cachedCount=1837; pendingCount=294; submittedCount=0; runningCount=0; retriesCount=0; abortedCount=0; succeedDuration=5m 4s; failedDuration=5.3s; cachedDuration=2d 20h 41m 14s;loadCpus=0; loadMemory=0; peakRunning=6; peakCpus=6; peakMemory=48 GB; ]
mar-08 07:51:50.617 [main] DEBUG nextflow.trace.TraceFileObserver - Workflow completed -- saving trace file
mar-08 07:51:50.618 [main] DEBUG nextflow.trace.ReportObserver - Workflow completed -- rendering execution report
mar-08 07:51:50.647 [SIGINT handler] INFO  nextflow.Session - Adieu
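
Because the failure moves between tasks across resumed runs, it can help to list every task that finished with a non-zero exit code together with its work directory. The sketch below scans `TaskHandler[...]` lines in the format shown in the log above. The names `TASK_RE` and `failed_tasks` are hypothetical, and the pattern is an assumption based only on these sample lines, so it may need adjusting for other Nextflow versions.

```python
import re

# Pattern derived from the "Task completed > TaskHandler[...]" lines above.
# Lines whose exit field is not numeric (e.g. "exit: -") simply do not match.
TASK_RE = re.compile(
    r"TaskHandler\[id: (?P<id>\d+); name: (?P<name>.+?); "
    r"status: (?P<status>\w+); exit: (?P<exit>-?\d+); "
    r"error: .*?; workDir: (?P<workdir>[^\]\s]+)"
)


def failed_tasks(log_lines):
    """Return (task name, exit code, work dir) for tasks with a non-zero exit."""
    failures = []
    for line in log_lines:
        match = TASK_RE.search(line)
        if match and int(match.group("exit")) != 0:
            failures.append(
                (match.group("name"), int(match.group("exit")), match.group("workdir"))
            )
    return failures


# Two lines copied from the log above: one failed task, one successful task.
sample = [
    "mar-08 07:50:39.262 [Task monitor] DEBUG n.processor.TaskPollingMonitor - "
    "Task completed > TaskHandler[id: 1066; name: snp:evaluate_candidates (249); "
    "status: COMPLETED; exit: 134; error: -; "
    "workDir: /data/xyz/xyz/xyz/Alignment/work/8d/6d8d17ffa44bab94946168f35b7d13]",
    "mar-08 07:51:25.456 [Task monitor] DEBUG n.processor.TaskPollingMonitor - "
    "Task completed > TaskHandler[id: 1427; name: snp:evaluate_candidates (610); "
    "status: COMPLETED; exit: 0; error: -; "
    "workDir: /data/xyz/xyz/xyz/Alignment/work/b0/8306551b3e4d3e36db606106fb9aa2]",
]

for name, code, workdir in failed_tasks(sample):
    print(name, code, workdir)
```

To run it against a real log, pass an open file object instead of `sample`, for example `failed_tasks(open(".nextflow.log"))`, assuming the log sits in the launch directory.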

Application activity log entry

No response

Were you able to successfully run the latest version of the workflow with the demo data?

yes

Other demo data information

No response

cjw85 commented 3 months ago

Hi @gdemoro,

Thank you for reporting this issue. We have become aware of a few issues in the Clair3 code similar to the one you report.

It would help us immensely if you could share the contents of the directory listed in the error above:

  /data/xyz/xyz/xyz/Alignment/work/8d/6d8d17ffa44bab94946168f35b7d13

If this is sensitive patient data then please don't worry. We're working with @aquaskyline's team on these issues.

gdemoro commented 3 months ago

Thank you for your response. Unfortunately, I cannot share the contents of the folder. Although it is not human data, the input files belong to one of our clients and I do not have permission to share them.

cjw85 commented 1 month ago

We've committed some fixes to Clair3 that I believe may resolve this issue. Please update the workflow.