bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Ensemble variant calling error in v1.1.6a0 - `Problem retrieving reference variant` #2954

Closed: FedericoComoglio closed this issue 4 years ago

FedericoComoglio commented 4 years ago

Dear developers,

On 18.09.2019 we upgraded to bcbio v1.1.6a0 (devel branch). When running a variant2 analysis with exactly the same configuration that we have used extensively across several WES runs in recent months, we now run into an error while generating the ensemble call set. This happens for both germline and somatic variant calling. For example:

somatic:
  - vardict
  - mutect2
  - varscan
  - freebayes
  - strelka2

In both cases, the relevant error appears to be:

Problem retrieving reference variant for {:chr "chr1", :start 184183, :refa "C", :alta ("T"), :end 184184, :vc-indices (0 2)}

Traceback (most recent call last):
  File "/mnt/tools/bin/bcbio_nextgen.py", line 245, in <module>
    main(**kwargs)
  File "/mnt/tools/bin/bcbio_nextgen.py", line 46, in main
    run_main(**kwargs)
  File "/mnt/data/anaconda/lib/python3.6/site-packages/bcbio/pipeline/main.py", line 45, in run_main
    fc_dir, run_info_yaml)
  File "/mnt/data/anaconda/lib/python3.6/site-packages/bcbio/pipeline/main.py", line 89, in _run_toplevel
    for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
  File "/mnt/data/anaconda/lib/python3.6/site-packages/bcbio/pipeline/main.py", line 172, in variant2pipeline
    samples = ensemble.combine_calls_parallel(samples, run_parallel)
  File "/mnt/data/anaconda/lib/python3.6/site-packages/bcbio/variation/ensemble.py", line 118, in combine_calls_parallel
    processed = run_parallel("combine_calls", ((b, xs, xs[0]) for b, xs in batch_groups.items()))
  File "/mnt/data/anaconda/lib/python3.6/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/mnt/data/anaconda/lib/python3.6/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"], batch_size=1, backend="multiprocessing")(joblib.delayed(fn)(*x) for x in items):
  File "/mnt/data/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 921, in __call__
    if self.dispatch_one_batch(iterator):
  File "/mnt/data/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 759, in dispatch_one_batch
    self._dispatch(tasks)
  File "/mnt/data/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 716, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/mnt/data/anaconda/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 182, in apply_async
    result = ImmediateResult(func)
  File "/mnt/data/anaconda/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 549, in __init__
    self.results = batch()
  File "/mnt/data/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 225, in __call__
    for func, args, kwargs in self.items]
  File "/mnt/data/anaconda/lib/python3.6/site-packages/joblib/parallel.py", line 225, in <listcomp>
    for func, args, kwargs in self.items]
  File "/mnt/data/anaconda/lib/python3.6/site-packages/bcbio/utils.py", line 55, in wrapper
    return f(*args, **kwargs)
  File "/mnt/data/anaconda/lib/python3.6/site-packages/bcbio/distributed/multitasks.py", line 348, in combine_calls
    return ensemble.combine_calls(*args)
  File "/mnt/data/anaconda/lib/python3.6/site-packages/bcbio/variation/ensemble.py", line 84, in combine_calls
    callinfo = _run_ensemble_intersection(batch_id, vrn_files, caller_names, base_dir, edata)
  File "/mnt/data/anaconda/lib/python3.6/site-packages/bcbio/variation/ensemble.py", line 258, in _run_ensemble_intersection
    do.run(cmd, "Ensemble intersection calling: %s" % (batch_id))
  File "/mnt/data/anaconda/lib/python3.6/site-packages/bcbio/provenance/do.py", line 26, in run
    _do_run(cmd, checks, log_stdout, env=env)
  File "/mnt/data/anaconda/lib/python3.6/site-packages/bcbio/provenance/do.py", line 106, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command 'set -o pipefail; unset JAVA_HOME && export PATH=/mnt/data/anaconda/bin:"$PATH" && /mnt/data/anaconda/bin/bcbio-variation-recall ensemble --cores=8 --numpass 2 --names vardict,mutect2,varscan,freebayes,strelka2 --nofiltered [...]
/mnt/data/anaconda/bin/bcbio-variation-recall: line 6: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8): No such file or directory
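The `Problem retrieving reference variant` line identifies the failing site as a Clojure-style map. Below is a minimal Python sketch that pulls that map apart and relates `:vc-indices` to the callers passed via `--names`. It rests on two assumptions that this thread does not confirm: that the indices are zero-based positions in the `--names` list, and that `--numpass 2` means a site must be called by at least two callers to enter the ensemble. The helper names (`parse_variant_key`, `supporting_callers`, `passes_ensemble`) are hypothetical and are not part of bcbio or bcbio-variation-recall.

```python
from typing import Any, Dict, List


def _parse_value(raw: str) -> Any:
    """Convert one printed Clojure value: list, string, or integer."""
    raw = raw.strip()
    if raw.startswith("(") and raw.endswith(")"):
        # Clojure lists print with space-separated items, e.g. (0 2) or ("T")
        return [_parse_value(item) for item in raw[1:-1].split()]
    if raw.startswith('"') and raw.endswith('"'):
        return raw[1:-1]
    return int(raw)


def parse_variant_key(line: str) -> Dict[str, Any]:
    """Parse the {...} map from the error line into a Python dict."""
    body = line[line.index("{") + 1 : line.rindex("}")]
    result = {}
    # Map entries are separated by ", "; each entry is ":key value"
    for entry in body.split(", "):
        key, _, raw = entry.lstrip(":").partition(" ")
        result[key] = _parse_value(raw)
    return result


def supporting_callers(variant: Dict[str, Any], names: List[str]) -> List[str]:
    # ASSUMPTION: :vc-indices are zero-based positions in the --names list
    return [names[i] for i in variant["vc-indices"]]


def passes_ensemble(variant: Dict[str, Any], numpass: int) -> bool:
    # ASSUMPTION: --numpass N keeps sites called by at least N callers
    return len(variant["vc-indices"]) >= numpass


ERROR_LINE = ('Problem retrieving reference variant for {:chr "chr1", :start 184183, '
              ':refa "C", :alta ("T"), :end 184184, :vc-indices (0 2)}')
NAMES = "vardict,mutect2,varscan,freebayes,strelka2".split(",")

variant = parse_variant_key(ERROR_LINE)
print(variant["chr"], variant["start"], variant["refa"], variant["alta"])  # chr1 184183 C ['T']
print(supporting_callers(variant, NAMES))  # ['vardict', 'varscan']
print(passes_ensemble(variant, numpass=2))  # True
```

Under these assumptions, the failing site was supported by two of the five callers and so met the `--numpass 2` threshold, which is why bcbio-variation-recall attempted to retrieve a reference record for it.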

Would it be possible for you to look into this? Please let me know if we can be of help here.

Thank you in advance, Federico

roryk commented 4 years ago

Hi Federico,

Thank you for posting such a nice issue. Have you updated the tools in your bcbio installation? Brad made a few fixes for issues like this in bcbio-variation-recall; the fixed version should be 0.2.6. Running bcbio-variation-recall version will tell you which version you have.
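The version check Rory suggests can be scripted. This is a minimal sketch, assuming that `bcbio-variation-recall version` prints a dotted version number such as `0.2.5` somewhere in its output (the exact output format is not shown in this thread). It compares that number against 0.2.6, the version named above as containing the fix. The function names are hypothetical helpers, not part of bcbio.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Version named in this thread as containing the ensemble fix
FIXED_VERSION = (0, 2, 6)


def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    """Return the first dotted version number in text, e.g. (0, 2, 5)."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(0).split("."))


def has_ensemble_fix(text: str) -> bool:
    """True if the reported version is at least FIXED_VERSION."""
    version = parse_version(text)
    # Tuple comparison orders (0, 2, 10) after (0, 2, 6), unlike string comparison
    return version is not None and version >= FIXED_VERSION


def installed_version_output() -> Optional[str]:
    """Run `bcbio-variation-recall version` if it is on PATH and return its output."""
    exe = shutil.which("bcbio-variation-recall")
    if exe is None:
        return None
    # stderr is merged because the wrapper may emit warnings (e.g. setlocale) there
    proc = subprocess.run([exe, "version"], stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    return proc.stdout


output = installed_version_output()
if output is None:
    print("bcbio-variation-recall not found on PATH")
else:
    print(output.strip(), "-> fix present:", has_ensemble_fix(output))
```

Comparing integer tuples rather than raw strings avoids treating `0.2.10` as older than `0.2.6`.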

FedericoComoglio commented 4 years ago

Hi Rory,

Thank you. We have bcbio.variation.recall v0.2.5. I will upgrade the tools to the latest devel version and resume the runs, and I will let you know whether this fixes it.

Federico

naumenko-sa commented 4 years ago

Hi @FedericoComoglio!

Could you please clarify whether you upgraded with bcbio upgrade -u development (which pulls the latest code from the master branch) or used the development branch (https://github.com/bcbio/bcbio-nextgen/tree/develop)?

Thanks! Sergey

FedericoComoglio commented 4 years ago

Hi Sergey,

I upgraded with bcbio upgrade -u development. This fixed the issue: the ensemble call set is now generated correctly. Thank you.

Federico