bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

KeyError when running an analysis mentioning "vrn_file-shared" #400

Closed lbeltrame closed 10 years ago

lbeltrame commented 10 years ago

I'm running master from a couple of days ago.

Traceback:


Traceback (most recent call last):
  File "/usr/local/bin/bcbio_nextgen.py", line 8, in <module>
    execfile(__file__)
  File "/mnt/data/software/bcbio-nextgen/scripts/bcbio_nextgen.py", line 62, in <module>
    main(**kwargs)
  File "/mnt/data/software/bcbio-nextgen/scripts/bcbio_nextgen.py", line 40, in main
    run_main(**kwargs)
  File "/mnt/data/software/bcbio-nextgen/bcbio/pipeline/main.py", line 45, in run_main
    fc_dir, run_info_yaml)
  File "/mnt/data/software/bcbio-nextgen/bcbio/pipeline/main.py", line 83, in _run_toplevel
    for xs in pipeline.run(config, config_file, parallel, dirs, pipeline_items):
  File "/mnt/data/software/bcbio-nextgen/bcbio/pipeline/main.py", line 326, in run
    samples = region.parallel_variantcall_region(samples, run_parallel)
  File "/mnt/data/software/bcbio-nextgen/bcbio/pipeline/region.py", line 138, in parallel_variantcall_region
    "vrn_file", ["region", "sam_ref", "config"])
  File "/mnt/data/software/bcbio-nextgen/bcbio/distributed/split.py", line 39, in grouped_parallel_split_combine
    file_key, combine_arg_keys)
  File "/mnt/data/software/bcbio-nextgen/bcbio/distributed/split.py", line 197, in _organize_output
    cur_out = combine_map[data.pop("%s-shared" % file_key)]
KeyError: 'vrn_file-shared'

This occurs at the split_variants_by_sample step.
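For anyone unfamiliar with the failing line, here is a minimal sketch of how a lookup like the one in `_organize_output` raises this error. It is not the bcbio code: `organize_output`, the sample dictionaries, and the contents of `combine_map` are hypothetical, and only the `data.pop("%s-shared" % file_key)` expression is taken from the traceback.

```python
def organize_output(items, combine_map, file_key):
    """Hypothetical stand-in: map each item's shared key to a combined output."""
    out = []
    for data in items:
        # Same expression as in the traceback: pop() without a default raises
        # KeyError when the "<file_key>-shared" entry is absent from the item.
        shared = data.pop("%s-shared" % file_key)
        out.append(combine_map[shared])
    return out


# Illustrative data only; these names and paths are made up.
combine_map = {"batch1-part": "batch1-combined.vcf"}
tagged = [{"vrn_file-shared": "batch1-part"}]
untagged = [{"vrn_file": "sample.vcf"}]  # no "-shared" key present

ok_result = organize_output(tagged, combine_map, "vrn_file")

try:
    organize_output(untagged, combine_map, "vrn_file")
    error_message = None
except KeyError as exc:
    error_message = str(exc)
```

Because the missing key in the error message is `'vrn_file-shared'` itself, the exception appears to come from the `pop()` call rather than from the `combine_map` lookup, which would suggest that at least one item reached this step without the "-shared" entry.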

lbeltrame commented 10 years ago

Strangely enough, this does not occur when running the pipeline in local mode (this one was run in ipython cluster mode).

chapmanb commented 10 years ago

Luca, thanks for the report. Would you be able to share a representative sample YAML file from the run that failed? I completely redid the variant calling organization (and, as of this morning, the BAM prep organization), so this is the likely cause. The ipython/local mode difference is likely due to one being a re-run rather than an original run, not to being distributed. If I can reproduce the case locally I'll work on a fix. Thanks again for the report.

lbeltrame commented 10 years ago

- algorithm:
    aligner: bwa
    clinical_reporting: true
    coverage_depth: high
    coverage_interval: regional
    mark_duplicates: false
    min_allele_fraction: 5.0
    platform: illumina
    quality_format: Standard
    realign: gatk
    recalibrate: gatk
    trim_reads: false
    variant_regions: variant_regions1
    variantcaller:
    - mutect
    - varscan
    - freebayes
    write_summary: true
  analysis: variant2
  description: Sample_20183_normal
  files:
  - /mnt/data/projects/project2/raw_data/Sample_20183_normal_R1.fastq.gz
  - /mnt/data/projects/project2/raw_data/Sample_20183_normal_R2.fastq.gz
  genome_build: hg19
  metadata:
    batch: Sample_20183_tumor_vs_normal
    phenotype: normal
- algorithm:
    aligner: bwa
    clinical_reporting: true
    coverage_depth: high
    coverage_interval: regional
    mark_duplicates: false
    min_allele_fraction: 5.0
    platform: illumina
    quality_format: Standard
    realign: gatk
    recalibrate: gatk
    trim_reads: false
    variant_regions: variant_regions1
    variantcaller:
    - mutect
    - varscan
    - freebayes
    write_summary: true
  analysis: variant2
  description: Sample_20183_tumor
  files:
  - /mnt/data/projects/project2/raw_data/Sample_20183_tumor_R1.fastq.gz
  - /mnt/data/projects/project2/raw_data/Sample_20183_tumor_R2.fastq.gz
  genome_build: hg19
  metadata:
    batch: Sample_20183_tumor_vs_normal
    phenotype: tumor

variant_regions1 points to the BED file of the regions.
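As a quick sanity check of the configuration above, the sketch below groups the samples by `metadata.batch` and lists the phenotypes found in each batch. The two samples are re-typed as plain Python dictionaries containing only the fields used here, and `phenotypes_by_batch` is a hypothetical helper, not part of bcbio.

```python
from collections import defaultdict

# Only the fields needed for the check, copied from the YAML above.
samples = [
    {"description": "Sample_20183_normal",
     "metadata": {"batch": "Sample_20183_tumor_vs_normal",
                  "phenotype": "normal"}},
    {"description": "Sample_20183_tumor",
     "metadata": {"batch": "Sample_20183_tumor_vs_normal",
                  "phenotype": "tumor"}},
]


def phenotypes_by_batch(samples):
    """Hypothetical helper: collect the phenotypes declared for each batch."""
    batches = defaultdict(list)
    for sample in samples:
        meta = sample.get("metadata", {})
        batches[meta.get("batch")].append(meta.get("phenotype"))
    # Sort so the result does not depend on sample order in the file.
    return {batch: sorted(found) for batch, found in batches.items()}


pairs = phenotypes_by_batch(samples)
```

Here the single batch contains one normal and one tumor sample, so the pairing in the configuration itself looks consistent.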

chapmanb commented 10 years ago

Luca -- thanks much. I spent more time on this and tried to clean up edge cases and simplify things a bit, so I believe I've resolved this issue. Please let us know if you run into any other problems, but hopefully the latest development version will work cleanly now.