bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Pipeline crashes on tabix_index (paired variant calling on bed regions) #343

Closed matanhofree closed 10 years ago

matanhofree commented 10 years ago

The pipeline crashes in the tabix_index function when running paired variant calling on BED regions.

trace:

[ti_index_core] the file out of order at line 36
' returned non-zero exit status 1
Traceback (most recent call last):
  File "/cellar/users/mhofree/projects/cancer_ngs/external/ngs-tools/bin/bcbio_nextgen.py", line 59, in <module>
    main(**kwargs)
  File "/cellar/users/mhofree/projects/cancer_ngs/external/ngs-tools/bin/bcbio_nextgen.py", line 39, in main
    run_main(**kwargs)
  File "/cellar/users/mhofree/projects/cancer_ngs/external/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 40, in run_main
    fc_dir, run_info_yaml)
  File "/cellar/users/mhofree/projects/cancer_ngs/external/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 87, in _run_toplevel
    for xs in pipeline.run(config, config_file, parallel, dirs, pipeline_items):
  File "/cellar/users/mhofree/projects/cancer_ngs/external/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 322, in run
    samples = region.parallel_variantcall_region(samples, run_parallel)
  File "/cellar/users/mhofree/projects/cancer_ngs/external/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/region.py", line 133, in parallel_variantcall_region
    "vrn_file", ["region", "sam_ref", "config"])
  File "/cellar/users/mhofree/projects/cancer_ngs/external/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/split.py", line 34, in grouped_parallel_split_combine
    split_output = parallel_fn(parallel_name, grouped_args)
  File "/cellar/users/mhofree/projects/cancer_ngs/external/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/cellar/users/mhofree/projects/cancer_ngs/external/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 82, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"])(joblib.delayed(fn)(x) for x in items):
  File "/cellar/users/mhofree/projects/cancer_ngs/external/bcbio-nextgen/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 644, in __call__
    self.dispatch(function, args, kwargs)
  File "/cellar/users/mhofree/projects/cancer_ngs/external/bcbio-nextgen/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 391, in dispatch
    job = ImmediateApply(func, args, kwargs)
  File "/cellar/users/mhofree/projects/cancer_ngs/external/bcbio-nextgen/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 129, in __init__
    self.results = func(*args, **kwargs)
  File "/cellar/users/mhofree/projects/cancer_ngs/external/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 47, in wrapper
    return apply(f, *args, **kwargs)
  File "/cellar/users/mhofree/projects/cancer_ngs/external/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 85, in variantcall_sample
    return genotype.variantcall_sample(*args)
  File "/cellar/users/mhofree/projects/cancer_ngs/external/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/variation/genotype.py", line 377, in variantcall_sample
    region, call_file)
  File "/cellar/users/mhofree/projects/cancer_ngs/external/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/variation/varscan.py", line 27, in run_varscan
    assoc_files, region, out_file)
  File "/cellar/users/mhofree/projects/cancer_ngs/external/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/variation/samtools.py", line 42, in shared_variantcall
    tx_out_file)
  File "/cellar/users/mhofree/projects/cancer_ngs/external/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/variation/varscan.py", line 150, in _varscan_paired
    vcfutils.bgzip_and_index(out_file, config)
  File "/cellar/users/mhofree/projects/cancer_ngs/external/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/variation/vcfutils.py", line 324, in bgzip_and_index
    tabix_index(out_file, config)
  File "/cellar/users/mhofree/projects/cancer_ngs/external/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/variation/vcfutils.py", line 356, in tabix_index
    do.run(cmd.format(**locals()), "tabix index %s" % os.path.basename(in_file))
  File "/cellar/users/mhofree/projects/cancer_ngs/external/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 23, in run
    _do_run(cmd, checks, log_stdout)
  File "/cellar/users/mhofree/projects/cancer_ngs/external/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 117, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command 'set -o pipefail; /cellar/users/mhofree/projects/cancer_ngs/external/ngs-tools/bin/tabix -f -p vcf /mnt/tmp/TCGA-CQ-5330/work/varscan/16/tx/tmpdz0AgO/tx/tmpJTQR
[ti_index_core] the file out of order at line 36
' returned non-zero exit status 1

It seems there is indeed a problem with some of the VCF files. Here is an example of an offending VCF: http://chianti.ucsd.edu/~mhofree/share/sample_problem.vcf
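
For anyone wanting to locate the offending record before indexing, here is a minimal sketch of an ordering check. It is not part of bcbio or tabix. The function name `first_out_of_order` is hypothetical, and the ordering rule it applies (each chromosome forms one contiguous block, positions never decrease within a block) is an assumption about what the indexer expects. The reported line number counts header lines too, which may or may not match how tabix counts.

```python
def first_out_of_order(lines):
    """Return (line_number, record) for the first VCF record that breaks
    coordinate order, or None if every record is in order.

    Assumed ordering rule: each chromosome appears as one contiguous
    block, and positions never decrease within that block.
    """
    seen = set()       # chromosomes whose block has already started
    cur_chrom = None   # chromosome of the block being read
    cur_pos = -1       # last position seen in the current block
    for n, line in enumerate(lines, start=1):
        # Skip blank lines and header/meta lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        if chrom != cur_chrom:
            # A chromosome that reappears after its block ended is out of order.
            if chrom in seen:
                return n, line.rstrip("\n")
            seen.add(chrom)
            cur_chrom, cur_pos = chrom, pos
        elif pos < cur_pos:
            # Position went backwards within the same chromosome.
            return n, line.rstrip("\n")
        else:
            cur_pos = pos
    return None
```

Usage would be something like `with open("sample_problem.vcf") as handle: print(first_out_of_order(handle))`, where the file name is whatever local copy of the VCF you have.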

chapmanb commented 10 years ago

Matan, thanks for the report and sorry about the problem. Along with #338 and #339, this makes it clear that the experiment of replacing GATK CombineVariants with vcfintersect is not working, so I have reverted to the old approach. If you update to the latest development version and remove the problem vcf.gz file, it will re-combine and hopefully work cleanly now. Thanks for testing this and sorry again about the issue.
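
The cleanup step can be sketched as below. This is only an illustration: the helper name `remove_vcf_and_index` is hypothetical, the path you pass in is whichever combined vcf.gz failed in your own work directory, and removing a `.tbi` file next to it is an assumption (the comment above only mentions the vcf.gz). It defaults to a dry run so nothing is deleted until you have checked the list.

```python
import os


def remove_vcf_and_index(vcf_gz, dry_run=True):
    """Return the existing files among vcf_gz and its assumed .tbi index.

    With dry_run=False the returned files are also deleted, so that a
    re-run of the pipeline has to regenerate them.
    """
    candidates = [vcf_gz, vcf_gz + ".tbi"]
    existing = [path for path in candidates if os.path.exists(path)]
    if not dry_run:
        for path in existing:
            os.remove(path)
    return existing
```

Call it once with the default `dry_run=True` to see what would be removed, then again with `dry_run=False` before restarting the run.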