chapmanb / bcbio.variation

Toolkit to analyze genomic variation data, built on the GATK with Clojure

Illegal base [ ] seen in the allele; seemingly random error #21

Closed (stsmall closed 9 years ago)

stsmall commented 9 years ago

Hi Brad, I am using the standalone jar of bcbio.variation, version 0.1.9, with java version "1.7.0_60". When I run bcbio.variation ensemble on 3 VCF files (GATK-UG, HC, Freebayes), I consistently get an error stating "Illegal base [ ] seen in the allele". I have checked the VCFs but do not find anything out of the ordinary. The error also seems to fall on a different chr in the VCF file from run to run, even when I just re-execute the same command. Let me know what extra information you would like. Thank you for the great program and continued support, scott

Here is the truncated error:

INFO 11:38:02,199 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 11:38:02,200 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 11:38:02,201 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
INFO 11:38:32,211 ProgressMeter - PairedContig_459:66 1442218.0 30.0 s 20.0 s 64.6% 46.0 s 16.0 s
0 variants were aligned
INFO 11:38:41,796 ProgressMeter - done 1928195.0 39.0 s 20.0 s 100.0% 39.0 s 0.0 s
INFO 11:38:41,796 ProgressMeter - Total runtime 39.60 secs, 0.66 min, 0.01 hours
Exception in thread "main" java.lang.IllegalArgumentException: Illegal base [ ] seen in the allele
    at htsjdk.variant.variantcontext.Allele.create(Allele.java:208)
    at htsjdk.variant.variantcontext.Allele.create(Allele.java:314)
    at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)

chapmanb commented 9 years ago

Scott; Thanks for the report and sorry about the issue. I haven't seen this before and the inconsistent behavior is confusing. It looks like there is somehow a newline in place of one of the ref/alt alleles: the error message reports the offending character between the two brackets (https://github.com/samtools/htsjdk/blob/a30bc12df2aeb3b4312a9c236c6a639025d5b596/src/java/htsjdk/variant/variantcontext/Allele.java#L208).

To debug further it would help to have the full stack trace so we can see exactly where the error occurs. If you could reduce the error to a small repeatable example and pass the files along to me via e-mail, I can also try to dig more. Sorry again about the problem and hope this helps some.
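One way to hunt for a record like that before building a reduced example is to scan the REF and ALT columns for anything that is not a plain base. The following is only a rough Python sketch, not part of bcbio.variation or htsjdk: `is_plausible_allele` and `find_suspect_alleles` are hypothetical helpers, and the set of accepted alleles (A/C/G/T/N in either case, `.`, `*`, symbolic `<...>` alleles, and anything containing breakend brackets) is an assumption that may be looser or stricter than what htsjdk actually accepts.

```python
import re

# Assumption: "plain" alleles consist only of A, C, G, T or N (either case).
PLAIN_BASES = re.compile(r"[ACGTNacgtn]+")


def is_plausible_allele(allele):
    """Return True if the allele looks acceptable under the assumptions above."""
    if allele in (".", "*"):  # missing value / spanning deletion
        return True
    if allele.startswith("<") and allele.endswith(">"):  # symbolic allele
        return True
    if "[" in allele or "]" in allele:  # breakend notation, checked only loosely
        return True
    # fullmatch, not match with "$": "$" would also accept a trailing newline.
    return PLAIN_BASES.fullmatch(allele) is not None


def find_suspect_alleles(lines):
    """Yield (line_number, chrom, pos, column, repr(allele)) for odd alleles."""
    for line_number, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue  # header lines
        # Strip only the record terminator so stray whitespace in fields survives.
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 5:
            # Too few columns, e.g. a record broken in two by a stray newline.
            yield line_number, None, None, "record", repr(line)
            continue
        chrom, pos, ref, alt = fields[0], fields[1], fields[3], fields[4]
        for column, value in (("REF", ref), ("ALT", alt)):
            for allele in value.split(","):
                if not is_plausible_allele(allele):
                    yield line_number, chrom, pos, column, repr(allele)
```

Passing an open file handle for an uncompressed VCF to `find_suspect_alleles` and printing each hit shows the offending allele via `repr`, so whitespace or an empty string is visible. Note that this only checks the input file; it would not catch a bad allele that is created later, during processing.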

stsmall commented 9 years ago

Hi Brad, The command runs ensemble on 3 joint call files: GATK-HapCaller, GATK-UnifiedGeno, Freebayes. The error seems to occur while the Freebayes file is being cleaned. I am currently rerunning ensemble v0.2.0 without Freebayes to see if that is indeed the culprit file. I have attached the full error stack and the progress log. Thanks again for the help and support of useful software! scott

The error seems to be with indel cleaning in Freebayes file.

Processing Log:

2015-01-01T20:08:13 :: State :begin :: {:desc "Starting variation analysis"}
2015-01-01T20:08:13 :: State :clean :: {:desc "Cleaning input VCF: combo"}
2015-01-01T21:53:13 :: State :merge :: {:desc "Merging multiple input files: combo"}
2015-01-01T21:53:13 :: State :prep :: {:desc "Prepare VCF, resorting to genome build: combo"}
2015-01-01T21:59:26 :: State :normalize :: {:desc "Normalize MNP and indel variants: combo"}
2015-01-01T22:01:19 :: State :clean :: {:desc "Cleaning input VCF: gatk-hc"}
2015-01-01T23:42:09 :: State :merge :: {:desc "Merging multiple input files: gatk-hc"}
2015-01-01T23:42:09 :: State :prep :: {:desc "Prepare VCF, resorting to genome build: gatk-hc"}
2015-01-01T23:47:48 :: State :normalize :: {:desc "Normalize MNP and indel variants: gatk-hc"}
2015-01-01T23:49:30 :: State :clean :: {:desc "Cleaning input VCF: gatk-ug"}
2015-01-02T01:57:13 :: State :merge :: {:desc "Merging multiple input files: gatk-ug"}
2015-01-02T01:57:13 :: State :prep :: {:desc "Prepare VCF, resorting to genome build: gatk-ug"}
2015-01-02T02:03:16 :: State :normalize :: {:desc "Normalize MNP and indel variants: gatk-ug"}
2015-01-02T02:05:05 :: State :clean :: {:desc "Cleaning input VCF: freebayes"}
2015-01-02T04:56:08 :: State :merge :: {:desc "Merging multiple input files: freebayes"}
2015-01-02T04:56:08 :: State :prep :: {:desc "Prepare VCF, resorting to genome build: freebayes"}
2015-01-02T05:00:51 :: State :normalize :: {:desc "Normalize MNP and indel variants: freebayes"}
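The last entry in the processing log is the stage that was running when the exception was thrown, which is a quick way to narrow down the input involved. A minimal Python sketch for pulling it out, assuming the log keeps the `timestamp :: State :name :: {:desc "..."}` layout shown above; `last_state` is a hypothetical helper and not part of bcbio.variation:

```python
import re

# One log entry: timestamp, state keyword, and the quoted description.
LOG_ENTRY = re.compile(
    r'(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) :: State :(\S+) :: \{:desc "([^"]*)"\}'
)


def last_state(log_text):
    """Return (timestamp, state, description) of the final entry, or None."""
    entries = LOG_ENTRY.findall(log_text)
    return entries[-1] if entries else None
```

Because the pattern is matched with `findall` over the whole text, it works whether the entries are one per line or run together on a single line.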

Full Stack:

Exception in thread "main" java.lang.IllegalArgumentException: Illegal base [ ] seen in the allele
    at htsjdk.variant.variantcontext.Allele.create(Allele.java:208)
    at htsjdk.variant.variantcontext.Allele.create(Allele.java:314)
    at sun.reflect.GeneratedMethodAccessor66.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
    at clojure.lang.Reflector.invokeStaticMethod(Reflector.java:207)
    at bcbio.variation.complex$split_alleles$extract_variants960$fn964.invoke(complex.clj:121)
    at clojure.core$map_indexed$mapi6342$fn6343.invoke(core.clj:6580)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    at clojure.lang.LazySeq.first(LazySeq.java:82)
    at clojure.lang.RT.first(RT.java:577)
    at clojure.core$first.invoke(core.clj:55)
    at bcbio.variation.complex$split_alleles$extract_variants960.invoke(complex.clj:123)
    at bcbio.variation.complex$split_alleles$fn967.invoke(complex.clj:139)
    at bcbio.variation.complex$split_alleles.doInvoke(complex.clj:135)
    at clojure.lang.RestFn.invoke(RestFn.java:464)
    at bcbio.variation.complex$split_complex_indel.invoke(complex.clj:302)
    at bcbio.variation.complex$get_normalized_vcs$process_vc1093.invoke(complex.clj:380)
    at clojure.core$map$fn4207.invoke(core.clj:2487)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    at clojure.lang.Cons.next(Cons.java:39)
    at clojure.lang.RT.next(RT.java:598)
    at clojure.core$next.invoke(core.clj:64)
    at clojure.core$concat$cat3925$fn__3926.invoke(core.clj:694)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    at clojure.lang.RT.seq(RT.java:484)
    at clojure.core$seq.invoke(core.clj:133)
    at clojure.core$take_while$fn4236.invoke(core.clj:2564)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    at clojure.lang.Cons.next(Cons.java:39)
    at clojure.lang.RT.countFrom(RT.java:540)
    at clojure.lang.RT.count(RT.java:530)
    at clojure.core$partition_by$fn6320.invoke(core.clj:6505)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    at clojure.lang.RT.seq(RT.java:484)
    at clojure.core$seq.invoke(core.clj:133)
    at clojure.core$map$fn4207.invoke(core.clj:2479)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    at clojure.lang.Cons.next(Cons.java:39)
    at clojure.lang.RT.next(RT.java:598)
    at clojure.core$next.invoke(core.clj:64)
    at clojure.core$concat$cat3925$fn3926.invoke(core.clj:694)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    at clojure.lang.RT.seq(RT.java:484)
    at clojure.core$seq.invoke(core.clj:133)
    at clojure.core$map$fn4207.invoke(core.clj:2479)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    at clojure.lang.Cons.next(Cons.java:39)
    at clojure.lang.RT.next(RT.java:598)
    at clojure.core$next.invoke(core.clj:64)
    at bcbio.variation.variantcontext$write_vcf_w_template.doInvoke(variantcontext.clj:189)
    at clojure.lang.RestFn.invoke(RestFn.java:470)
    at bcbio.variation.complex$normalize_variants$fn1110.invoke(complex.clj:418)
    at bcbio.variation.complex$normalize_variants.doInvoke(complex.clj:416)
    at clojure.lang.RestFn.invoke(RestFn.java:494)
    at bcbio.variation.combine$gatk_normalize.invoke(combine.clj:190)
    at bcbio.variation.compare$prepare_vcf_calls$fn7526.invoke(compare.clj:120)
    at clojure.core$map$fn4207.invoke(core.clj:2487)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    at clojure.lang.Cons.next(Cons.java:39)
    at clojure.lang.PersistentVector.create(PersistentVector.java:51)
    at clojure.lang.LazilyPersistentVector.create(LazilyPersistentVector.java:31)
    at clojure.core$vec.invoke(core.clj:354)
    at bcbio.variation.compare$prepare_vcf_calls.invoke(compare.clj:121)
    at bcbio.variation.compare$variant_comparison_from_config$iter75827586$fn__7587.invoke(compare.clj:255)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    at clojure.lang.RT.seq(RT.java:484)
    at clojure.core$seq.invoke(core.clj:133)
    at clojure.core$tree_seq$walk4647$fn4648.invoke(core.clj:4475)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    at clojure.lang.LazySeq.more(LazySeq.java:96)
    at clojure.lang.RT.more(RT.java:607)
    at clojure.core$rest.invoke(core.clj:73)
    at clojure.core$flatten.invoke(core.clj:6478)
    at bcbio.variation.compare$variant_comparison_from_config.invoke(compare.clj:254)
    at bcbio.variation.ensemble$consensus_calls.invoke(ensemble.clj:113)
    at bcbio.variation.ensemble$_main.doInvoke(ensemble.clj:133)
    at clojure.lang.RestFn.applyTo(RestFn.java:137)
    at clojure.core$apply.invoke(core.clj:617)
    at bcbio.variation.core$_main.doInvoke(core.clj:35)
    at clojure.lang.RestFn.applyTo(RestFn.java:137)
    at bcbio.variation.core.main(Unknown Source)

chapmanb commented 9 years ago

Scott; Thanks much for the detailed traceback. It looks to be happening during the normalization step. We now do an upstream normalization as part of bcbio, so we are trying to rely less on the custom one in bcbio.variation. So I dropped normalization from the ensemble process and am hopeful this will let things run successfully on your inputs. I pushed a new 0.2.1 release that includes this. Thanks again for all the details and help debugging.