Validation error: "Cannot retrieve reference sequence names"

parlar commented 8 years ago

I'm getting an error running a validation against a custom reference call set. Any ideas what might be wrong?

[2016-01-13T22:53Z] INFO  23:53:34,627 HelpFormatter - Date/Time: 2016/01/13 23:53:34
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/18-11/samtools/rtg/tp-baseline.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/18-11/samtools/rtg/tp-baseline.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/18-11/samtools/rtg/fp.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/18-11/samtools/rtg/fp.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/18-11/samtools/rtg/fn.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/18-11/samtools/rtg/fn.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/12-12/samtools/rtg/tp-baseline.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/12-12/samtools/rtg/tp-baseline.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/12-12/samtools/rtg/fp.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/12-12/samtools/rtg/fp.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/12-12/samtools/rtg/fn.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/12-12/samtools/rtg/fn.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/35-11/samtools/rtg/tp-baseline.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/35-11/samtools/rtg/tp-baseline.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/35-11/samtools/rtg/fp.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/35-11/samtools/rtg/fp.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/35-11/samtools/rtg/fn.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/35-11/samtools/rtg/fn.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/1-11/samtools/rtg/tp-baseline.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/1-11/samtools/rtg/tp-baseline.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/1-11/samtools/rtg/fp.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/1-11/samtools/rtg/fp.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/1-11/samtools/rtg/fn.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/1-11/samtools/rtg/fn.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/10-12/samtools/rtg/tp-baseline.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/10-12/samtools/rtg/tp-baseline.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/10-12/samtools/rtg/fp.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/10-12/samtools/rtg/fp.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/10-12/samtools/rtg/fn.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/10-12/samtools/rtg/fn.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/51-11/samtools/rtg/tp-baseline.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/51-11/samtools/rtg/tp-baseline.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/51-11/samtools/rtg/fp.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/51-11/samtools/rtg/fp.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/51-11/samtools/rtg/fn.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/51-11/samtools/rtg/fn.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/18-11/freebayes/rtg/tp-baseline.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/18-11/freebayes/rtg/tp-baseline.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/18-11/freebayes/rtg/fp.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/18-11/freebayes/rtg/fp.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/18-11/freebayes/rtg/fn.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/18-11/freebayes/rtg/fn.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/12-12/freebayes/rtg/tp-baseline.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/12-12/freebayes/rtg/tp-baseline.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/12-12/freebayes/rtg/fp.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/12-12/freebayes/rtg/fp.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/12-12/freebayes/rtg/fn.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/12-12/freebayes/rtg/fn.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/35-11/freebayes/rtg/tp-baseline.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/35-11/freebayes/rtg/tp-baseline.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/35-11/freebayes/rtg/fp.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/35-11/freebayes/rtg/fp.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/35-11/freebayes/rtg/fn.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/35-11/freebayes/rtg/fn.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/1-11/freebayes/rtg/tp-baseline.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/1-11/freebayes/rtg/tp-baseline.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/1-11/freebayes/rtg/fp.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/1-11/freebayes/rtg/fp.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/1-11/freebayes/rtg/fn.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/1-11/freebayes/rtg/fn.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/10-12/freebayes/rtg/tp-baseline.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/10-12/freebayes/rtg/tp-baseline.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/10-12/freebayes/rtg/fp.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/10-12/freebayes/rtg/fp.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/10-12/freebayes/rtg/fn.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/10-12/freebayes/rtg/fn.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/51-11/freebayes/rtg/tp-baseline.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/51-11/freebayes/rtg/tp-baseline.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/51-11/freebayes/rtg/fp.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/51-11/freebayes/rtg/fp.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/51-11/freebayes/rtg/fn.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
[/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/51-11/freebayes/rtg/fn.vcf.gz] Warning: no BGZF EOF marker; file may be truncated.
Traceback (most recent call last):
  File "/home/genetik/bcbio/bin/bcbio_nextgen.py", line 226, in <module>
    main(**kwargs)
  File "/home/genetik/bcbio/bin/bcbio_nextgen.py", line 43, in main
    run_main(**kwargs)
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 39, in run_main
    fc_dir, run_info_yaml)
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 82, in _run_toplevel
    for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 166, in variant2pipeline
    samples = run_parallel("compare_to_rm", samples)
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"])(joblib.delayed(fn)(x) for x in items):
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 804, in __call__
    while self.dispatch_one_batch(iterator):
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 662, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 570, in _dispatch
    job = ImmediateComputeBatch(batch)
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 183, in __init__
    self.results = batch()
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 72, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 51, in wrapper
    return apply(f, *args, **kwargs)
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 203, in compare_to_rm
    return validate.compare_to_rm(*args)
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/validate.py", line 108, in compare_to_rm
    eval_files = _run_rtg_eval(vrn_file, rm_file, rm_interval_file, base_dir, toval_data)
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/validate.py", line 166, in _run_rtg_eval
    cmd += ["--vcf-score-field='%s'" % (_pick_best_quality_score(vrn_file))]
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/validate.py", line 193, in _pick_best_quality_score
    with VariantFile(vrn_file) as val_in:
  File "pysam/cbcf.pyx", line 2359, in pysam.cbcf.VariantFile.__init__ (pysam/cbcf.c:36299)
  File "pysam/cbcf.pyx", line 2519, in pysam.cbcf.VariantFile.open (pysam/cbcf.c:38620)
  File "pysam/cbcf.pyx", line 2590, in pysam.cbcf.VariantFile.open (pysam/cbcf.c:39699)
  File "pysam/cbcf.pyx", line 2113, in pysam.cbcf.makeTabixIndex (pysam/cbcf.c:33771)
  File "pysam/cbcf.pyx", line 2093, in pysam.cbcf.TabixIndex.__init__ (pysam/cbcf.c:33312)
ValueError: Cannot retrieve reference sequence names

The config file looks like this:


---
upload:
  dir: ../final
details:
  - files: [/home/genetik/revalidation/fastqs_2016.01.13-15.14.13/1-11_S2_L001_R1_001_val_1.fq.gz, /home/genetik/revalidation/fastqs_2016.01.13-15.14.13/1-11_S2_L001_R2_
001_val_2.fq.gz]
    description: 1-11
    metadata:
      batch: batch1
      sex: male
    analysis: variant2
    genome_build: hg19
    algorithm:
      aligner: bwa
      mark_duplicates: false
      variantcaller: [freebayes, gatk-haplotype, platypus, samtools]
      svcaller: [lumpy, manta, cnvkit]
      quality_format: Standard
      coverage_interval: amplicon
      recalibrate: false
      realign: gatk
      merge_bamprep: true
      variant_regions: /home/genetik/calls2xls/designfiles/LKPN_hart_design_III_17150-1413761621.bed
      clinical_reporting: true
      effects: false
      ensemble:
        numpass: 1
      validate: /home/genetik/revalidation/vcfs/1-11-ensemble-single.vcf
      validate_regions: /home/genetik/calls2xls/designfiles/LKPN_hart_design_III_17150-1413761621.bed
      validate_method: rtg
[etc...]

chapmanb commented 8 years ago

Pär; Sorry about the issue. It looks like something went wrong with the RTG vcfeval comparison and the output files are somehow truncated or corrupt, so the downstream processing failed. It's hard to tell what happened from this traceback, but you might want to look at:

/home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/18-11/samtools/rtg/

to see what the tp.vcf.gz and other output files look like. If they are problematic by inspection, you could try checking in log/bcbio-nextgen-commands.log for the rtg vcfeval commands and run these by hand to see if that produces any useful error message. My guess is it doesn't like something about your input VCF but hopefully that process would help identify exactly what is going on.

Lenbok commented 8 years ago

I just happened across this issue -- am I correct in interpreting from the backtrace that it's happening while validate.py is examining the call VCF for an appropriate score field, before it even gets to invoking vcfeval?

Assuming it does get as far as running vcfeval, if vcfeval creates a file called done, then it thinks it has completed successfully (and in general we are also quite careful about returning appropriate exit codes in our tools too). (Brad, I'm not sure if you are checking either of those conditions). Anyway if that done file does not exist, the vcfeval.log file in the output directory should indicate what went wrong.

chapmanb commented 8 years ago

Len -- thanks much for the information. bcbio does try to check that it finishes but I'm not sure what happens in this case. It seems like the initial validation file might be malformed from the errors that arose. Pär, is that a possibility? Happy to dig into this more with any other details.

parlar commented 8 years ago

Sorry for my disappearance from this issue. Too many things going on here at the same time I'm afraid. Thanks Len and Brad. I'll take a look at the validation file and try to clean it up. It's an ensemble call vcf file from bcbio-nextgen.

parlar commented 8 years ago

Still can't figure this out. I ran it again and it seems that it is the validation of sample 18-11 / ensemble calls that fail:

[2016-01-22T12:25Z] bgzip 18-11-ensemble-single.vt.vcf.gz
[2016-01-22T12:25Z] tabix index 18-11-ensemble-single.vt.vcf.gz
[2016-01-22T12:25Z] Select sample: 18-11
[2016-01-22T12:25Z] tabix index batch1-filter-18-11.vcf.gz
[2016-01-22T12:25Z] Resource requests: ; memory: 1.00; cores: 1
[2016-01-22T12:25Z] Configuring 10 jobs to run, using 1 cores each with 1.00g of memory reserved for each job
[2016-01-22T12:25Z] Intersect callable intervals for rtg vcfeval
Traceback (most recent call last):
  File "/home/genetik/bcbio/bin/bcbio_nextgen.py", line 226, in <module>
    main(**kwargs)
  File "/home/genetik/bcbio/bin/bcbio_nextgen.py", line 43, in main
    run_main(**kwargs)
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 39, in run_main
    fc_dir, run_info_yaml)
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 82, in _run_toplevel
    for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 166, in variant2pipeline
    samples = run_parallel("compare_to_rm", samples)
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"])(joblib.delayed(fn)(x) for x in items):
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 804, in __call__
    while self.dispatch_one_batch(iterator):
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 662, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 570, in _dispatch
    job = ImmediateComputeBatch(batch)
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 183, in __init__
    self.results = batch()
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 72, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 51, in wrapper
    return apply(f, *args, **kwargs)
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 203, in compare_to_rm
    return validate.compare_to_rm(*args)
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/validate.py", line 108, in compare_to_rm
    eval_files = _run_rtg_eval(vrn_file, rm_file, rm_interval_file, base_dir, toval_data)
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/validate.py", line 166, in _run_rtg_eval
    cmd += ["--vcf-score-field='%s'" % (_pick_best_quality_score(vrn_file))]
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/validate.py", line 193, in _pick_best_quality_score
    with VariantFile(vrn_file) as val_in:
  File "pysam/cbcf.pyx", line 2359, in pysam.cbcf.VariantFile.__init__ (pysam/cbcf.c:36299)
  File "pysam/cbcf.pyx", line 2519, in pysam.cbcf.VariantFile.open (pysam/cbcf.c:38620)
  File "pysam/cbcf.pyx", line 2590, in pysam.cbcf.VariantFile.open (pysam/cbcf.c:39699)
  File "pysam/cbcf.pyx", line 2113, in pysam.cbcf.makeTabixIndex (pysam/cbcf.c:33771)
  File "pysam/cbcf.pyx", line 2093, in pysam.cbcf.TabixIndex.__init__ (pysam/cbcf.c:33312)
ValueError: Cannot retrieve reference sequence names

However, under validation, the 18-11/ensemble directory does not exist. Hence, no vcfeval.log to investigate. Do you need data / configuration files to investigate?

Rerunning the script using -n 1 unfortunately fails to explain which file that is causing the problem. The the only output I get is found below.

[2016-01-22T12:53Z] Timing: validation
[2016-01-22T12:53Z] multiprocessing: compare_to_rm
[2016-01-22T12:53Z] Resource requests: ; memory: 1.00; cores: 1
[2016-01-22T12:53Z] Configuring 1 jobs to run, using 1 cores each with 1.00g of memory reserved for each job
Traceback (most recent call last):
  File "/home/genetik/bcbio/bin/bcbio_nextgen.py", line 226, in <module>
    main(**kwargs)
  File "/home/genetik/bcbio/bin/bcbio_nextgen.py", line 43, in main
    run_main(**kwargs)
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 39, in run_main
    fc_dir, run_info_yaml)
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 82, in _run_toplevel
    for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 166, in variant2pipeline
    samples = run_parallel("compare_to_rm", samples)
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"])(joblib.delayed(fn)(x) for x in items):
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 804, in __call__
    while self.dispatch_one_batch(iterator):
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 662, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 570, in _dispatch
    job = ImmediateComputeBatch(batch)
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 183, in __init__
    self.results = batch()
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 72, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 51, in wrapper
    return apply(f, *args, **kwargs)
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 203, in compare_to_rm
    return validate.compare_to_rm(*args)
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/validate.py", line 108, in compare_to_rm
    eval_files = _run_rtg_eval(vrn_file, rm_file, rm_interval_file, base_dir, toval_data)
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/validate.py", line 166, in _run_rtg_eval
    cmd += ["--vcf-score-field='%s'" % (_pick_best_quality_score(vrn_file))]
  File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/validate.py", line 193, in _pick_best_quality_score
    with VariantFile(vrn_file) as val_in:
  File "pysam/cbcf.pyx", line 2359, in pysam.cbcf.VariantFile.__init__ (pysam/cbcf.c:36299)
  File "pysam/cbcf.pyx", line 2519, in pysam.cbcf.VariantFile.open (pysam/cbcf.c:38620)
  File "pysam/cbcf.pyx", line 2590, in pysam.cbcf.VariantFile.open (pysam/cbcf.c:39699)
  File "pysam/cbcf.pyx", line 2113, in pysam.cbcf.makeTabixIndex (pysam/cbcf.c:33771)
  File "pysam/cbcf.pyx", line 2093, in pysam.cbcf.TabixIndex.__init__ (pysam/cbcf.c:33312)
ValueError: Cannot retrieve reference sequence names

chapmanb commented 8 years ago

Pär; Thanks for following up. You're right that something is going wrong with parsing the variant file, although it is also tough to tell without the exact file. I pushed a small change to report the file that fails which might help isolate this, if you upgrade to the latest development (bcbio_nextgen.py upgrade -u development). From your debugging so far it might be the 18-11 ensemble file. Is there anything unusual about that file that might give an idea of how to reproduce? If it's empty or otherwise malformed I can try to replicate and work around the issue. Thanks again for the help debugging.

parlar commented 8 years ago

Thanks. I retried after updating.

The error message this time was Failed to parse input file /home/genetik/revalidation/calling_2016.01.13-15.14.13/work/platypus/batch1-filter-18-11.vcf.gz.

Except for the header, it turns out that this file is empty. No variants there. I guess that makes sense then that it cannot find any "reference sequence names".

chapmanb commented 8 years ago

Pär; Thanks for confirming this. I pushed a fix to the latest development version which skips trying to parse the file if it's empty without variants. Apologies, I hadn't realized pysam would fail in this situation. If you update (bcbio_nextgen.py upgrade -u development) then it should skip this and continue cleanly. Please let us know if you run into any other problems with this.

parlar commented 8 years ago

Hi again! It seems that the same problem propagates downstream, rtg exits with this message:

subprocess.CalledProcessError: Command 'set -o pipefail; export RTG_JAVA_OPTS='-Xms1g' && export RTG_MEM=5g && rtg vcfeval --threads 6 -b /home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/18-11/platypus/18-11-ensemble-single.vt.vcf.gz --bed-regions /home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/18-11/platypus/18-11-sort-callable_sample-18-11-platypus-wrm.bed -c /home/genetik/revalidation/calling_2016.01.13-15.14.13/work/platypus/batch1-filter-18-11.vcf.gz -t /home/genetik/bcbio/share/bcbio/genomes/Hsapiens/hg19/rtg/hg19.sdf -o /home/genetik/revalidation/calling_2016.01.13-15.14.13/work/validate/18-11/platypus/rtg --vcf-score-field='DP'
Error: There were no sequence names in common between the supplied baseline and called variant sets. Check they use the same reference and are non-empty.
' returned non-zero exit status 1

chapmanb commented 8 years ago

Pär; Sorry about the continued problems. I pushed a fix to skip these empty ones entirely during validation, which should hopefully avoid all these problems if you update to development one more time. Hope this works for your analysis.

parlar commented 8 years ago

thanks!

bcbio / bcbio-nextgen

Validation error: "Cannot retrieve reference sequence names" #1183