chapmanb / bcbio.variation

Toolkit to analyze genomic variation data, built on the GATK with Clojure

Error with variant-utils comparetwo: Structural variant type not handled: :NON_REF #28

Open ragans opened 9 years ago

ragans commented 9 years ago

I am trying to use the variant-utils comparetwo function to compare two gVCF files from our GATK pipeline. I have tried several variations of the input files, including converting them to VCF with gvcftools (extract_variants).

Our gVCF files are generated by HaplotypeCaller from GATK 3.2.2 with --emitRefConfidence GVCF.

The call I am using with variant-utils is:

time java $JAVA_OPTS \
  -jar $BCBIO/bcbio.variation-0.2.4-standalone.jar \
  variant-utils comparetwo $VCFEVAL $VCFCOMP $GENOMEREF $BED

where VCFEVAL and VCFCOMP are the eval and reference VCFs, respectively, GENOMEREF is the reference genome used for alignment (BWA-MEM), and BED is the BED file for the region(s) of interest.

The error I get is at the bottom of the post.

I am guessing that some form of user error is the culprit, but I haven't been able to self-diagnose it.

If you need to see log files etc., please let me know.

Kind regards,

Seamus Ragan

Error message:

Exception in thread "main" java.lang.Exception: Structural variant type not handled: :NON_REF
	at bcbio.variation.structural$get_sv_length.invoke(structural.clj:191)
	at bcbio.variation.structural$get_ci_start_end.doInvoke(structural.clj:208)
	at clojure.lang.RestFn.invoke(RestFn.java:425)
	at bcbio.variation.structural$parse_vcf_sv$updated_sv_vc2673.invoke(structural.clj:247)
	at clojure.core$keep$fn6349.invoke(core.clj:6603)
	at clojure.lang.LazySeq.sval(LazySeq.java:42)
	at clojure.lang.LazySeq.seq(LazySeq.java:67)
	at clojure.lang.RT.seq(RT.java:484)
	at clojure.core$seq.invoke(core.clj:133)
	at clojure.core$filter$fn4226.invoke(core.clj:2523)
	at clojure.lang.LazySeq.sval(LazySeq.java:42)
	at clojure.lang.LazySeq.seq(LazySeq.java:60)
	at clojure.lang.RT.seq(RT.java:484)
	at clojure.core$seq.invoke(core.clj:133)
	at clojure.core.protocols$seq_reduce.invoke(protocols.clj:30)
	at clojure.core.protocols$fn6026.invoke(protocols.clj:54)
	at clojure.core.protocols$fn5979$G59745992.invoke(protocols.clj:13)
	at clojure.core$reduce.invoke(core.clj:6177)
	at bcbio.variation.structural$prep_itree.invoke(structural.clj:34)
	at bcbio.variation.structural$parse_vcf_sv.doInvoke(structural.clj:261)
	at clojure.lang.RestFn.invoke(RestFn.java:636)
	at bcbio.variation.structural$find_concordant_svs.invoke(structural.clj:275)
	at bcbio.variation.structural$compare_sv$fn2714.invoke(structural.clj:350)
	at bcbio.variation.structural$compare_sv.doInvoke(structural.clj:346)
	at clojure.lang.RestFn.invoke(RestFn.java:731)
	at bcbio.variation.structural$compare_sv_pipeline.invoke(structural.clj:371)
	at bcbio.variation.compare$compare_two_vcf.invoke(compare.clj:180)
	at bcbio.variation.compare$variant_comparison_from_config$iter81238127$fn8128$iter81478151$fn__8152.invoke(compare.clj:256)
	at clojure.lang.LazySeq.sval(LazySeq.java:42)
	at clojure.lang.LazySeq.seq(LazySeq.java:60)
	at clojure.lang.RT.seq(RT.java:484)
	at clojure.core$seq.invoke(core.clj:133)
	at clojure.core.protocols$seq_reduce.invoke(protocols.clj:30)
	at clojure.core.protocols$fn6026.invoke(protocols.clj:54)
	at clojure.core.protocols$fn5979$G59745992.invoke(protocols.clj:13)
	at clojure.core$reduce.invoke(core.clj:6177)
	at bcbio.variation.multiple$prep_cmp_name_lookup.doInvoke(multiple.clj:40)
	at clojure.lang.RestFn.invoke(RestFn.java:410)
	at bcbio.variation.compare$finalize_comparisons.invoke(compare.clj:233)
	at bcbio.variation.compare$variant_comparison_from_config$iter81238127$fn__8128.invoke(compare.clj:257)
	at clojure.lang.LazySeq.sval(LazySeq.java:42)
	at clojure.lang.LazySeq.seq(LazySeq.java:60)
	at clojure.lang.RT.seq(RT.java:484)
	at clojure.core$seq.invoke(core.clj:133)
	at clojure.core$tree_seq$walk4647$fn4648.invoke(core.clj:4475)
	at clojure.lang.LazySeq.sval(LazySeq.java:42)
	at clojure.lang.LazySeq.seq(LazySeq.java:60)
	at clojure.lang.LazySeq.more(LazySeq.java:96)
	at clojure.lang.RT.more(RT.java:607)
	at clojure.core$rest.invoke(core.clj:73)
	at clojure.core$flatten.invoke(core.clj:6478)
	at bcbio.variation.compare$variant_comparison_from_config.invoke(compare.clj:254)
	at bcbio.variation.utils.comparetwo$run.invoke(comparetwo.clj:45)
	at bcbio.variation.utils.comparetwo$cl_entry$fn8336.invoke(comparetwo.clj:66)
	at bcbio.variation.utils.comparetwo$cl_entry.doInvoke(comparetwo.clj:65)
	at clojure.lang.RestFn.applyTo(RestFn.java:137)
	at clojure.core$apply.invoke(core.clj:617)
	at bcbio.variation.utils.core$_main.doInvoke(core.clj:40)
	at clojure.lang.RestFn.applyTo(RestFn.java:137)
	at clojure.core$apply.invoke(core.clj:617)
	at bcbio.variation.core$_main.doInvoke(core.clj:35)
	at clojure.lang.RestFn.applyTo(RestFn.java:137)
	at bcbio.variation.core.main(Unknown Source)

chapmanb commented 9 years ago

Seamus; thanks for trying this out, and sorry about the issues. Right now it doesn't handle gVCF files, which contain data for non-variant regions (labelled with the NON_REF allele). Generally you want to convert these gVCFs into standard VCFs using GenotypeGVCFs:

https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantutils_GenotypeGVCFs.php

and bcbio.variation should handle them cleanly. I'm not as familiar with gvcftools but think it should work as well. When you did the conversion to VCF, did you get the same error or something else? It's possible gvcftools doesn't handle Broad GATK gVCFs correctly, since all NON_REFs should be gone in the final VCF. Please let us know if you run into any issues with the standard VCFs.
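For reference, a minimal Python sketch of the GenotypeGVCFs conversion step. It only assembles and runs the GATK 3.x command line (-T GenotypeGVCFs, -R, one --variant per input gVCF, -o). The helper names, jar path and file names are hypothetical placeholders, and the flags should be checked against the GenotypeGVCFs documentation for the GATK version actually installed.

```python
import subprocess
from typing import List, Sequence


def genotype_gvcfs_cmd(gatk_jar: str, reference: str, gvcfs: Sequence[str],
                       out_vcf: str, java_opts: Sequence[str] = ()) -> List[str]:
    """Build a GATK 3.x GenotypeGVCFs command line (hypothetical helper).

    Each input gVCF gets its own --variant flag; -o names the output VCF.
    """
    if not gvcfs:
        raise ValueError("need at least one gVCF")
    cmd = ["java", *java_opts, "-jar", gatk_jar,
           "-T", "GenotypeGVCFs", "-R", reference]
    for gvcf in gvcfs:
        cmd += ["--variant", gvcf]
    cmd += ["-o", out_vcf]
    return cmd


def run_genotype_gvcfs(*args, **kwargs) -> None:
    """Run the command; raises CalledProcessError if java exits non-zero."""
    subprocess.run(genotype_gvcfs_cmd(*args, **kwargs), check=True)
```

For example, genotype_gvcfs_cmd("GenomeAnalysisTK.jar", "ref.fasta", ["eval.g.vcf"], "eval.vcf", ["-Xmx4g"]) gives the command for one sample; repeating it for the comparison sample would produce the two standard VCFs to pass to comparetwo. Building an argument list instead of a shell string avoids quoting problems with unusual paths.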

ragans commented 9 years ago

Brad, thanks for the pointer. That fixed the problem. gvcftools does not remove the NON_REF alleles.
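Since leftover NON_REF alleles appear to be routed into the structural-variant code (note the structural.clj frames in the trace), a quick pre-flight check on a converted file can save a failed comparetwo run. The Python sketch below scans the ALT column for the literal "<NON_REF>" allele. It assumes a plain or gzip/bgzip-compressed VCF with the standard tab-separated columns; the function names are hypothetical and not part of bcbio.variation.

```python
import gzip
from typing import IO, Iterable, Iterator, Tuple

NON_REF = "<NON_REF>"


def open_vcf(path: str) -> IO[str]:
    """Open a plain or gzip/bgzip-compressed VCF in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def non_ref_records(lines: Iterable[str]) -> Iterator[Tuple[str, int, str]]:
    """Yield (CHROM, POS, ALT) for each data line whose ALT lists <NON_REF>."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, alt = fields[0], int(fields[1]), fields[4]
        # ALT may hold several comma-separated alleles, e.g. "T,<NON_REF>"
        if NON_REF in alt.split(","):
            yield chrom, pos, alt


def still_has_non_ref(path: str) -> bool:
    """True if the file contains at least one record carrying <NON_REF>."""
    with open_vcf(path) as fh:
        return next(non_ref_records(fh), None) is not None
```

If still_has_non_ref("eval.vcf") returns True, the file is still gVCF-style and would likely need to go through GenotypeGVCFs before comparison.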

I appreciate your help and your work on this and the rest of the bcbio toolsets.

Kind regards,

Seamus Ragan