bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

NA12878-exome-eval example: Error in "work/gatk-haplotype/dbsnp.conf" #3160

Closed tony-travis closed 4 years ago

tony-travis commented 4 years ago

Version info

To Reproduce

Observed behavior

[2020-03-31T22:21Z] System YAML configuration: /usr/local/bcbio/galaxy/bcbio_system.yaml.
[2020-03-31T22:21Z] Locale set to C.UTF-8.
[2020-03-31T22:21Z] Resource requests: bwa, sambamba, samtools; memory: 3.00, 3.00, 3.00; cores: 16, 16, 16
[2020-03-31T22:21Z] Configuring 1 jobs to run, using 8 cores each with 24.1g of memory reserved for each job
[2020-03-31T22:21Z] Timing: organize samples
[2020-03-31T22:21Z] multiprocessing: organize_samples
[2020-03-31T22:21Z] Using input YAML configuration: /work/manager/NA12878-exome-eval/config/NA12878-exome-methodcmp.yaml
[2020-03-31T22:21Z] Checking sample YAML configuration: /work/manager/NA12878-exome-eval/config/NA12878-exome-methodcmp.yaml
[2020-03-31T22:21Z] Retreiving program versions from /usr/local/bcbio/manifest/python-packages.yaml.
[2020-03-31T22:21Z] Retreiving program versions from /usr/local/bcbio/manifest/r-packages.yaml.
...
[2020-04-01T01:28Z] Annotate VCF file: NA12878, gatk-haplotype
[2020-04-01T01:28Z] Annotate with dbSNP
[2020-04-01T01:28Z] =============================================
[2020-04-01T01:28Z] vcfanno version 0.3.2 [built with go1.12.1]
[2020-04-01T01:28Z] see: https://github.com/brentp/vcfanno
[2020-04-01T01:28Z] =============================================
[2020-04-01T01:28Z] vcfanno.go:112: [Flatten] unable to open file: //work/manager/NA12878-exome-eval/variation/dbsnp-151.vcf.gz in
[2020-04-01T01:28Z] Uncaught exception occurred
Traceback (most recent call last):
  File "/usr/local/bcbio/anaconda/lib/python3.6/site-packages/bcbio/provenance/do.py", line 26, in run
    _do_run(cmd, checks, log_stdout, env=env)
  File "/usr/local/bcbio/anaconda/lib/python3.6/site-packages/bcbio/provenance/do.py", line 106, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command 'set -o pipefail; vcfanno -p 8 /work/manager/NA12878-exome-eval/work/gatk-haplotype/dbsnp.conf /work/manager/NA12878-exome-eval/work/gatk-haplotype/NA12878-effects.vcf.gz |  bgzip -c > /work/manager/NA12878-exome-eval/work/bcbiotx/tmpwllhz4ka/NA12878-effects-annotated.vcf.gz
=============================================
vcfanno version 0.3.2 [built with go1.12.1]
see: https://github.com/brentp/vcfanno
=============================================
vcfanno.go:112: [Flatten] unable to open file: //work/manager/NA12878-exome-eval/variation/dbsnp-151.vcf.gz in 
' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/usr/local/bcbio/bin/bcbio_nextgen.py", line 245, in <module>
    main(**kwargs)
  File "/usr/local/bcbio/bin/bcbio_nextgen.py", line 46, in main
    run_main(**kwargs)
  File "/usr/local/bcbio/anaconda/lib/python3.6/site-packages/bcbio/pipeline/main.py", line 50, in run_main
    fc_dir, run_info_yaml)
  File "/usr/local/bcbio/anaconda/lib/python3.6/site-packages/bcbio/pipeline/main.py", line 91, in _run_toplevel
    for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
  File "/usr/local/bcbio/anaconda/lib/python3.6/site-packages/bcbio/pipeline/main.py", line 165, in variant2pipeline
    samples = run_parallel("postprocess_variants", samples)
  File "/usr/local/bcbio/anaconda/lib/python3.6/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/usr/local/bcbio/anaconda/lib/python3.6/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"], batch_size=1, backend="multiprocessing")(joblib.delayed(fn)(*x) for x in items):
  File "/usr/local/bcbio/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 1004, in __call__
    if self.dispatch_one_batch(iterator):
  File "/usr/local/bcbio/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 835, in dispatch_one_batch
    self._dispatch(tasks)
  File "/usr/local/bcbio/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 754, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/usr/local/bcbio/anaconda/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 209, in apply_async
    result = ImmediateResult(func)
  File "/usr/local/bcbio/anaconda/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 590, in __init__
    self.results = batch()
  File "/usr/local/bcbio/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 256, in __call__
    for func, args, kwargs in self.items]
  File "/usr/local/bcbio/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 256, in <listcomp>
    for func, args, kwargs in self.items]
  File "/usr/local/bcbio/anaconda/lib/python3.6/site-packages/bcbio/utils.py", line 55, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/bcbio/anaconda/lib/python3.6/site-packages/bcbio/distributed/multitasks.py", line 235, in postprocess_variants
    return variation.postprocess_variants(*args)
  File "/usr/local/bcbio/anaconda/lib/python3.6/site-packages/bcbio/pipeline/variation.py", line 105, in postprocess_variants
    orig_items)
  File "/usr/local/bcbio/anaconda/lib/python3.6/site-packages/bcbio/variation/annotation.py", line 65, in finalize_vcf
    out_file = _add_dbsnp(in_file, dbsnp_file, items[0], out_file, post_cl)
  File "/usr/local/bcbio/anaconda/lib/python3.6/site-packages/bcbio/variation/annotation.py", line 172, in _add_dbsnp
    do.run(cmd.format(**locals()), "Annotate with dbSNP")
  File "/usr/local/bcbio/anaconda/lib/python3.6/site-packages/bcbio/provenance/do.py", line 26, in run
    _do_run(cmd, checks, log_stdout, env=env)
  File "/usr/local/bcbio/anaconda/lib/python3.6/site-packages/bcbio/provenance/do.py", line 106, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command 'set -o pipefail; vcfanno -p 8 /work/manager/NA12878-exome-eval/work/gatk-haplotype/dbsnp.conf /work/manager/NA12878-exome-eval/work/gatk-haplotype/NA12878-effects.vcf.gz |  bgzip -c > /work/manager/NA12878-exome-eval/work/bcbiotx/tmpwllhz4ka/NA12878-effects-annotated.vcf.gz
=============================================
vcfanno version 0.3.2 [built with go1.12.1]
see: https://github.com/brentp/vcfanno
=============================================
vcfanno.go:112: [Flatten] unable to open file: //work/manager/NA12878-exome-eval/variation/dbsnp-151.vcf.gz in 
' returned non-zero exit status 1.
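The vcfanno message `[Flatten] unable to open file` means that a `file=` entry in the annotation config points to a path that does not exist on disk. Below is a minimal diagnostic sketch. It assumes the config uses vcfanno's TOML layout, with one `file="..."` line per `[[annotation]]` block. It reads those lines with a simple regex instead of a full TOML parser. The helper name `missing_annotation_files` is hypothetical and is not part of bcbio or vcfanno.

```python
import os
import re

# Hypothetical helper, not part of bcbio or vcfanno.
# Matches lines such as: file="/path/to/dbsnp-151.vcf.gz"
# Assumes one file= entry per line (a regex, not a full TOML parser).
FILE_RE = re.compile(r'^\s*file\s*=\s*"([^"]+)"', re.MULTILINE)


def missing_annotation_files(conf_text):
    """Return the file paths referenced in a vcfanno config that do not exist."""
    missing = []
    for path in FILE_RE.findall(conf_text):
        if not os.path.exists(path):
            missing.append(path)
    return missing


if __name__ == "__main__":
    import sys
    # Usage: python check_conf.py work/gatk-haplotype/dbsnp.conf
    with open(sys.argv[1]) as handle:
        for path in missing_annotation_files(handle.read()):
            print("missing:", path)
```

Running this against `work/gatk-haplotype/dbsnp.conf` should list the dbSNP path from the error above, if that file is indeed absent.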

Expected behavior

Annotation of output .vcf files.

Log files

Please attach (10MB max): bcbio-nextgen.log, bcbio-nextgen-commands.log, and bcbio-nextgen-debug.log.

Additional context

The file "work/gatk-haplotype/dbsnp.conf" refers to:

naumenko-sa commented 4 years ago

Hi Tony @tony-travis !

Thanks for the report!

I think it is because https://github.com/chapmanb/cloudbiolinux/blob/master/ggd-recipes/hg38/dbsnp.yaml installed dbSNP 153, while https://github.com/bcbio/bcbio-nextgen/blob/master/config/genomes/hg38-resources.yaml still pointed to dbSNP 151, the older release (I have fixed that). We missed it because we also had previous versions of dbSNP installed.
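To confirm this kind of version mismatch locally, you can compare the dbSNP file that the resources point to with the dbSNP files actually installed in the same directory. This is a minimal sketch. It assumes installed files follow the `dbsnp-<build>.vcf.gz` naming seen in the error message and sit in a single directory. Both `installed_dbsnp_versions` and `check_dbsnp` are hypothetical helpers, not bcbio functions.

```python
import os
import re

# Assumed naming convention, taken from the error message: dbsnp-<build>.vcf.gz
# The trailing $ excludes index files such as dbsnp-153.vcf.gz.tbi.
DBSNP_RE = re.compile(r"^dbsnp-(\d+)\.vcf\.gz$")


def installed_dbsnp_versions(variation_dir):
    """Return the sorted dbSNP build numbers found in variation_dir (hypothetical helper)."""
    versions = []
    for name in os.listdir(variation_dir):
        match = DBSNP_RE.match(name)
        if match:
            versions.append(int(match.group(1)))
    return sorted(versions)


def check_dbsnp(configured_file):
    """Return (configured file exists?, builds installed alongside it)."""
    variation_dir = os.path.dirname(configured_file)
    installed = installed_dbsnp_versions(variation_dir) if os.path.isdir(variation_dir) else []
    return os.path.exists(configured_file), installed
```

A result such as `(False, [153])` for a configured `dbsnp-151.vcf.gz` would match the situation described here: the resources file names build 151, but only build 153 was installed.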

Please try bcbio_nextgen.py upgrade -u skip --genomes hg38 and let us know if that fixes the issue for you.

SN

tony-travis commented 4 years ago

Hi, Sergey. OK, that fixed the problem. Thanks! Tony.