bcbio / bcbio.variation.recall

Parallel merging, squaring off and ensemble calling for genomic variants

Ensembl varscan and somaticsniper #21

Open bioinfo-dirty-jobs opened 6 years ago

bioinfo-dirty-jobs commented 6 years ago

If I delete the DP4 field and use only varscan and somaticsniper, I get this error:

 /opt/bcbio.variation.recall/bcbio-variation-recall ensemble  -c 4 -n 1 out.fg.gz  /home/maurizio/database/hg19_primary.fa  VT/436.varscan.all.Somatic.vt.clean.vcf.gz  tmp.sort.vcf.gz  
2018-Feb-01 21:01:25 +0100 Tardis ERROR [bcbio.variation.recall.main] - 
java.lang.Exception: Problem retrieving reference variant for {:chr "chr1", :start 13416, :refa "C", :alta ["CGAGA"], :end 13421, :vc-indices (0)}: []
   bcbio.variation.ensemble.intersect/get-rep-vc/fn       intersect.clj:   58
                               clojure.core/comp/fn            core.clj: 2438
                                clojure.core/map/fn            core.clj: 2624
                          clojure.lang.LazySeq.sval        LazySeq.java:   40
                           clojure.lang.LazySeq.seq        LazySeq.java:   49
                                clojure.lang.RT.seq             RT.java:  507
                                   clojure.core/seq            core.clj:  137
                                clojure.core/map/fn            core.clj: 2616
                          clojure.lang.LazySeq.sval        LazySeq.java:   40
                           clojure.lang.LazySeq.seq        LazySeq.java:   49
                                clojure.lang.RT.seq             RT.java:  507
                                   clojure.core/seq            core.clj:  137
bcbio.variation.variantcontext/write-vcf-w-template  variantcontext.clj:  189
                         clojure.lang.RestFn.invoke         RestFn.java:  573
bcbio.variation.ensemble.intersect/ensemble-vcfs/fn       intersect.clj:   88
   bcbio.variation.ensemble.intersect/ensemble-vcfs       intersect.clj:   86
           bcbio.variation.ensemble.intersect/-main       intersect.clj:  140
                        clojure.lang.RestFn.applyTo         RestFn.java:  137
                                 clojure.core/apply            core.clj:  630
               bcbio.variation.recall.main/-main/fn            main.clj:   34
                  bcbio.variation.recall.main/-main            main.clj:   33
                        clojure.lang.RestFn.applyTo         RestFn.java:  137
                   bcbio.variation.recall.main.main                    :     
chapmanb commented 6 years ago

Thanks for all these reports and apologies about the issue. I'm moving all the discussion to this thread to keep it in one place, since these all revolve around an issue with retrieving the reference allele for a variant call. I'm not totally sure what is going on, but it appears as if we're generating an ensemble allele that we then can't find in one of the references. Would you be able to share the problem files (at least in the region chr1:13416) so I could try to replicate and debug what is happening?
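For sharing just the region around chr1:13416, a small subset can be cut from each input with standard tools. The following is a minimal sketch, not part of bcbio.variation.recall: the function name `extract_region`, the file names and the 13000-14000 window are assumptions chosen for illustration, and it expects a gzip- or bgzip-compressed VCF.

```shell
# Sketch only: extract_region is a hypothetical helper, not a bcbio command.
# Prints all header lines plus the records on one chromosome inside a
# position window, reading a gzip/bgzip-compressed VCF.
# usage: extract_region in.vcf.gz chr1 13000 14000 > subset.vcf
extract_region() {
  gzip -dc "$1" | awk -F '\t' -v chr="$2" -v start="$3" -v end="$4" '
    /^#/ { print; next }                        # keep every header line
    $1 == chr && $2 >= start && $2 <= end       # keep records in the window
  '
}
```

With a tabix-indexed file, `tabix -h in.vcf.gz chr1:13000-14000` should give an equivalent subset.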

Thanks much for the help debugging.

bioinfo-dirty-jobs commented 6 years ago

example_chr1.tar.gz: here are the files. Thanks so much for your help.

chapmanb commented 6 years ago

Thanks for passing along the files. Unfortunately I'm not able to replicate the errors you're seeing, so I'll detail what I did so we can work out how it differs from how you're running it. I had to clean up the varscan and somaticsniper output:

# manually edit varscan input file to add CHROM line (missing) and change AD field to String
# since FORMAT not correct

bcftools annotate -x FORMAT/DP4 somaticniper.vt.clean_small.vcf -O z -o somaticniper.vt.clean_small.nodp4.vcf.gz
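The manual varscan header edits mentioned above could also be scripted. This is a rough sketch under stated assumptions rather than the exact edit that was made: the function name `fix_varscan_header` is hypothetical, the sample columns are assumed to be NORMAL and TUMOR, and the input is assumed to be an uncompressed VCF.

```shell
# Sketch only: fix_varscan_header is a hypothetical helper.
# Assumes an uncompressed VCF whose sample columns are NORMAL and TUMOR.
# usage: fix_varscan_header varscan.vcf > varscan.fixed.vcf
fix_varscan_header() {
  awk -v samples="NORMAL\tTUMOR" '
    # Redeclare the AD FORMAT field as a String.
    /^##FORMAT=<ID=AD,/ { sub(/Type=[A-Za-z]+/, "Type=String") }
    # Remember whether the file already has a #CHROM line.
    /^#CHROM/ { seen_chrom = 1 }
    # First record reached without a #CHROM line: emit one before it.
    !/^#/ && !seen_chrom {
      print "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t" samples
      seen_chrom = 1
    }
    { print }
  ' "$1"
}
```

The result still needs to be bgzipped and indexed before use, and a file with no variant records would not get a #CHROM line from this sketch.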

Then I ran ensemble calling, which finished successfully:

bcbio-variation-recall ensemble -n 1 --names varscan,mutect2,somaticsniper out.vcf.gz /human/hg19/seq/hg19.fa.gz varscan.somatic.cleand.vcf mutect2.vt.vcf  somaticniper.vt.clean_small.nodp4.vcf.gz

Do you have additional steps in your processing that I don't have in mine and that might be causing the issue? Thanks again for the help debugging.

bioinfo-dirty-jobs commented 6 years ago

Thanks so much... I realize I used VCFs annotated with snpEff.