bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Variant calling fails with KeyError: 'germline' #1844

Closed: amizeranschi closed this issue 7 years ago

amizeranschi commented 7 years ago

Hi,

I'm getting an error in the variant calling pipeline. Could you please help with this? Here is the traceback:

[2017-03-06T21:55Z] Configuring 1 jobs to run, using 1 cores each with 3.50g of memory reserved for each job
[2017-03-06T21:55Z] Timing: alignment post-processing
[2017-03-06T21:55Z] multiprocessing: piped_bamprep
[2017-03-06T21:55Z] Timing: variant calling
[2017-03-06T21:55Z] multiprocessing: variantcall_sample
Traceback (most recent call last):
  File "/home/alex/bcbio/tools/bin/bcbio_nextgen.py", line 4, in <module>
    __import__('pkg_resources').run_script('bcbio-nextgen==1.0.1', 'bcbio_nextgen.py')
  File "/home/alex/bcbio/anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 726, in run_script
  File "/home/alex/bcbio/anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 1484, in run_script
  File "/home/alex/bcbio/anaconda/lib/python2.7/site-packages/bcbio_nextgen-1.0.1-py2.7.egg-info/scripts/bcbio_nextgen.py", line 234, in <module>
    main(**kwargs)
  File "/home/alex/bcbio/anaconda/lib/python2.7/site-packages/bcbio_nextgen-1.0.1-py2.7.egg-info/scripts/bcbio_nextgen.py", line 43, in main
    run_main(**kwargs)
  File "/home/alex/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 43, in run_main
    fc_dir, run_info_yaml)
  File "/home/alex/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 87, in _run_toplevel
    for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
  File "/home/alex/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 150, in variant2pipeline
    samples = genotype.parallel_variantcall_region(samples, run_parallel)
  File "/home/alex/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/genotype.py", line 181, in parallel_variantcall_region
    "vrn_file", ["region", "sam_ref", "config"]))
  File "/home/alex/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/split.py", line 32, in grouped_parallel_split_combine
    final_output = parallel_fn(parallel_name, split_args)
  File "/home/alex/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/home/alex/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"], batch_size=1)(joblib.delayed(fn)(x) for x in items):
  File "/home/alex/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 758, in __call__
    while self.dispatch_one_batch(iterator):
  File "/home/alex/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 608, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/alex/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 571, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/home/alex/bcbio/anaconda/lib/python2.7/site-packages/joblib/_parallel_backends.py", line 109, in apply_async
    result = ImmediateResult(func)
  File "/home/alex/bcbio/anaconda/lib/python2.7/site-packages/joblib/_parallel_backends.py", line 326, in __init__
    self.results = batch()
  File "/home/alex/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/home/alex/bcbio/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 51, in wrapper
    return apply(f, *args, **kwargs)
  File "/home/alex/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 220, in variantcall_sample
    return genotype.variantcall_sample(*args)
  File "/home/alex/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/genotype.py", line 304, in variantcall_sample
    caller_fn = caller_fns[config["algorithm"].get("variantcaller")]
KeyError: 'germline'

And this is the YAML template I'm using:

# Template for whole genome Illumina variant calling with GATK pipeline
---
details:
  - analysis: variant2
    genome_build: sacCer3
    resources:
      default:
        memory: 2G
        cores: 2
    jvm_opts: ["-Xms750m", "-Xmx2000m"]
    metadata:
      batch: batch1
    algorithm:
      aligner: bwa
      mark_duplicates: true
      recalibrate: gatk
      realign: gatk
      variantcaller:
        germline: [gatk-haplotype, freebayes, platypus, varscan, samtools]
      ensemble:
        numpass: 3
      ploidy: 1
      effects: vep
      svcaller: [lumpy, manta, wham, metasv]
      variant_regions: ../config/variant_regions.bed

chapmanb commented 7 years ago

Thanks for the report, and sorry about the issue. The germline sub-keyword is only for specifying germline calling within a somatic tumor/normal sample. You don't need it here and should supply the callers as a plain list:

variantcaller: [gatk-haplotype, freebayes, platypus, varscan, samtools]

Hope that gets your analysis running cleanly.
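
For illustration only, here is a minimal sketch of how the nested form could surface as KeyError: 'germline'. It assumes the pipeline expands the variantcaller setting into one entry per caller before the lookup shown at genotype.py line 304; that expansion step is an assumption, not something the traceback confirms. CALLER_FNS and lookup_callers are hypothetical names, not part of bcbio's API.

```python
# Hypothetical stand-in for the caller lookup table used at
# genotype.py line 304 (caller_fns in the traceback). The real table maps
# caller names to functions; labels are enough to show the failure.
CALLER_FNS = {
    "gatk-haplotype": "run gatk-haplotype",
    "freebayes": "run freebayes",
    "platypus": "run platypus",
    "varscan": "run varscan",
    "samtools": "run samtools",
}


def lookup_callers(variantcaller):
    """Look up each configured caller by name (hypothetical helper)."""
    if isinstance(variantcaller, str):
        variantcaller = [variantcaller]
    # Assumption: the setting is iterated to get one name per caller.
    # Iterating over a dict yields its keys, so {"germline": [...]}
    # produces the single "caller" name "germline".
    return [CALLER_FNS[name] for name in variantcaller]


callers = ["gatk-haplotype", "freebayes", "platypus", "varscan", "samtools"]

# Flat list, as recommended for a germline-only run: every name resolves.
flat_result = lookup_callers(callers)

# Nested under "germline", as in the original template: the lookup fails.
try:
    lookup_callers({"germline": callers})
    nested_error = None
except KeyError as err:
    nested_error = err.args[0]
```

Under that assumption, the flat list resolves all five callers, while the nested mapping fails on the key "germline" because the mapping's key is treated as a caller name.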

lpantano commented 7 years ago

Hi @amizeranschi

I'll close this, assuming it worked in the end. Let us know if you run into more issues.