chapmanb / bcbio.variation

Toolkit to analyze genomic variation data, built on the GATK with Clojure

"variant-ensemble" command failing from within bcbio-nextgen example pipeline run #23

Closed nhansen closed 9 years ago

nhansen commented 9 years ago

The variant-ensemble command is failing when called from the bcbio-nextgen "ensemble" stage for the "Exome with Validation Against Reference Materials" example pipeline (yes, I've had horrible luck getting this example to run). When run manually with the 0.2.0-standalone.jar, the command below gets past the "CombineVariants" step, but with 0.2.1, it fails.

Here's the command from my bcbio-nextgen log:

java -Xms750m -Xmx2500m -Djava.io.tmpdir=/cluster/ifs/projects/seqSim/bcbio_testanalysis/NA12878-exome-eval/work_parallel/ensemble/NA12878-1/tmp -jar /cluster/ifs/projects/seqSim/bcbio-nextgen/bcbio/share/java/bcbio_variation/bcbio.variation-0.2.1-standalone.jar variant-ensemble /cluster/ifs/projects/seqSim/bcbio_testanalysis/NA12878-exome-eval/work_parallel/ensemble/NA12878-1/config/NA12878-1-ensemble.yaml /cluster/ifs/projects/seqSim/bcbio-nextgen/bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37.fa /cluster/ifs/projects/seqSim/bcbio_testanalysis/NA12878-exome-eval/work_parallel/ensemble/NA12878-1/NA12878-1-ensemble.vcf /cluster/ifs/projects/seqSim/bcbio_testanalysis/NA12878-exome-eval/work_parallel/gatk/NA12878-1-effects-ploidyfix-combined-gatkclean.vcf.gz /cluster/ifs/projects/seqSim/bcbio_testanalysis/NA12878-exome-eval/work_parallel/freebayes/NA12878-1-effects-ploidyfix-filter.vcf.gz /cluster/ifs/projects/seqSim/bcbio_testanalysis/NA12878-exome-eval/work_parallel/gatk-haplotype/NA12878-1-effects-ploidyfix-combined-gatkclean.vcf.gz

and here's the error:

Exception in thread "main" java.lang.IllegalArgumentException: Priority list [gatk, gatk, freebayes, gatk_haplotype] doesn't contain variant context gatk2
    at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils$CompareByPriority.getIndex(GATKVariantContextUtils.java:2066)
    at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils$CompareByPriority.compare(GATKVariantContextUtils.java:2071)
    at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils$CompareByPriority.compare(GATKVariantContextUtils.java:2058)
    at java.util.TimSort.countRunAndMakeAscending(TimSort.java:324)
    at java.util.TimSort.sort(TimSort.java:189)
    at java.util.TimSort.sort(TimSort.java:173)
    at java.util.Arrays.sort(Arrays.java:659)
    at java.util.Collections.sort(Collections.java:217)
    at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils.sortVariantContextsByPriority(GATKVariantContextUtils.java:1494)
    at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils.simpleMerge(GATKVariantContextUtils.java:877)
    at org.broadinstitute.gatk.tools.walkers.variantutils.CombineVariants.map(CombineVariants.java:309)
    at org.broadinstitute.gatk.tools.walkers.variantutils.CombineVariants.map(CombineVariants.java:117)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:314)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at bcbio.run.broad$run_gatk$fn1805.invoke(broad.clj:34)
    at bcbio.run.broad$run_gatk.invoke(broad.clj:31)
    at bcbio.variation.combine$combine_variants.doInvoke(combine.clj:71)
    at clojure.lang.RestFn.invoke(RestFn.java:1557)
    at bcbio.variation.recall$get_min_merged.invoke(recall.clj:158)
    at bcbio.variation.recall$fn7040.invoke(recall.clj:173)
    at clojure.lang.MultiFn.invoke(MultiFn.java:249)
    at bcbio.variation.recall$create_merged$fn7045.invoke(recall.clj:187)
    at clojure.core$map$fn4207.invoke(core.clj:2487)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    at clojure.lang.RT.seq(RT.java:484)
    at clojure.core$seq.invoke(core.clj:133)
    at clojure.core$map$fn4214.invoke(core.clj:2496)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    at clojure.lang.RT.seq(RT.java:484)
    at clojure.core$seq.invoke(core.clj:133)
    at clojure.core$map$fn4207.invoke(core.clj:2479)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    at clojure.lang.RT.seq(RT.java:484)
    at clojure.core$seq.invoke(core.clj:133)
    at clojure.core$map$fn4211.invoke(core.clj:2490)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    at clojure.lang.RT.seq(RT.java:484)
    at clojure.core$seq.invoke(core.clj:133)
    at clojure.core$map$fn4207.invoke(core.clj:2479)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    at clojure.lang.RT.seq(RT.java:484)
    at clojure.core$seq.invoke(core.clj:133)
    at clojure.core$map$fn4214.invoke(core.clj:2496)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    at clojure.lang.RT.seq(RT.java:484)
    at clojure.core$seq.invoke(core.clj:133)
    at clojure.core$map$fn4207.invoke(core.clj:2479)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    at clojure.lang.RT.seq(RT.java:484)
    at clojure.core$seq.invoke(core.clj:133)
    at clojure.core$reduce1.invoke(core.clj:890)
    at clojure.core$reverse.invoke(core.clj:904)
    at clojure.math.combinatorics$combinations.invoke(combinatorics.clj:73)
    at bcbio.variation.compare$variant_comparison_from_config$iter75827586$fn__7587.invoke(compare.clj:255)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    at clojure.lang.RT.seq(RT.java:484)
    at clojure.core$seq.invoke(core.clj:133)
    at clojure.core$tree_seq$walk4647$fn4648.invoke(core.clj:4475)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    at clojure.lang.LazySeq.more(LazySeq.java:96)
    at clojure.lang.RT.more(RT.java:607)
    at clojure.core$rest.invoke(core.clj:73)
    at clojure.core$flatten.invoke(core.clj:6478)
    at bcbio.variation.compare$variant_comparison_from_config.invoke(compare.clj:254)
    at bcbio.variation.ensemble$consensus_calls.invoke(ensemble.clj:113)
    at bcbio.variation.ensemble$_main.doInvoke(ensemble.clj:133)
    at clojure.lang.RestFn.applyTo(RestFn.java:137)
    at clojure.core$apply.invoke(core.clj:617)
    at bcbio.variation.core$_main.doInvoke(core.clj:35)
    at clojure.lang.RestFn.applyTo(RestFn.java:137)
    at bcbio.variation.core.main(Unknown Source)

chapmanb commented 9 years ago

Nancy, apologies for all the issues with that example. I'm running through it locally as well to identify any other issues, so you don't have such a frustrating experience getting started.

This issue was caused by skipping the normalization step in ensemble preparation. Without it, two input files with the same base name (gatk/NA12878-1-effects-ploidyfix-combined-gatkclean.vcf.gz and gatk-haplotype/NA12878-1-effects-ploidyfix-combined-gatkclean.vcf.gz) end up with names that clash downstream. I pushed a fix that resolves this. If you update the tools to get the latest bcbio.variation version and restart the ensemble analysis:

bcbio_nextgen.py upgrade --tools
rm -rf ensemble

it should, fingers crossed, run smoothly for you. Thanks again for your patience in getting this going.
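
For anyone hitting a similar error, the base-name clash described above can be spotted before launching the ensemble step. The following is a minimal sketch, not the fix that was pushed to bcbio.variation: `find_basename_clashes` is a hypothetical helper, and the paths are shortened, illustrative versions of the three inputs in the failing command.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_basename_clashes(paths):
    """Group paths by file name; return only names shared by two or more paths."""
    by_name = defaultdict(list)
    for p in paths:
        by_name[PurePosixPath(p).name].append(p)
    return {name: ps for name, ps in by_name.items() if len(ps) > 1}


# Illustrative, shortened versions of the inputs from the failing command.
inputs = [
    "work_parallel/gatk/NA12878-1-effects-ploidyfix-combined-gatkclean.vcf.gz",
    "work_parallel/freebayes/NA12878-1-effects-ploidyfix-filter.vcf.gz",
    "work_parallel/gatk-haplotype/NA12878-1-effects-ploidyfix-combined-gatkclean.vcf.gz",
]

clashes = find_basename_clashes(inputs)
for name, ps in clashes.items():
    print(f"{name} is shared by: {', '.join(ps)}")
```

Here the gatk and gatk-haplotype inputs are reported together because only their parent directories differ, which is the situation that led to the duplicated `gatk` entry in the priority list.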