chapmanb / bcbio.variation

Toolkit to analyze genomic variation data, built on the GATK with Clojure

Handling of multiallelic sites #16

stsmall closed this issue 10 years ago

stsmall commented 10 years ago

Hi Brad, I encountered an error using the 1.6 version of ensemble. If my VCF has multiallelic sites (in this case from freebayes), ensemble keeps only the first alt allele but retains the PL field from the multiallelic call. This causes an error when the PL field is read, because it now has the wrong number of values for the remaining alleles. Any ideas on how to avoid this error without having to remove all multiallelic sites from the VCFs before running ensemble? Thanks for a great program! -scott
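For context, the VCF specification sizes PL (and GL) as one value per possible genotype: with n alleles (REF plus ALTs) and ploidy p there are C(n+p-1, p) genotypes. A diploid G/A/T site therefore needs 6 values, while a biallelic G/A site needs only 3. A minimal Python sketch of that count (the function name is illustrative, not part of bcbio.variation):

```python
from math import comb


def expected_pl_count(n_alleles: int, ploidy: int = 2) -> int:
    """Number of genotype likelihood values (PL/GL) for one sample.

    Equals the number of unordered genotypes, i.e. multisets of size
    `ploidy` drawn from `n_alleles` alleles: C(n + p - 1, p).
    """
    return comb(n_alleles + ploidy - 1, ploidy)


# REF G with ALT A,T (3 alleles) vs. REF G with ALT A only (2 alleles)
print(expected_pl_count(3))  # 6
print(expected_pl_count(2))  # 3
```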

chapmanb commented 10 years ago

Scott; Thanks much for the report. This is a bug in how we combine VCFs into the final call. Would you be able to post a minimal example that reproduces the problem? If so, I'll work on a proper fix. Thanks again

stsmall commented 10 years ago

Hi Brad, Here is a single instance where FreeBayes called multiple alternate alleles while the other two callers (HaplotypeCaller and UnifiedGenotyper) didn't call the allele. When I ran ensemble, it merged the calls but didn't correctly handle the multiple alleles.

freebayes

```
PairedContig_1566 4318 . G A,T 2074.55 PASS AB=0.380435,0.190217;ABP=25.858,156.382;AC=1,0;AF=0.5,0;AN=2;AO=70,35;CIGAR=1X,1X;DP=184;DPB=184;DPRA=0,0;EPP=57.7314,79.0118;EPPR=9.19487;GTI=0;LEN=1,1;MEANALT=2,2;MQM=39.7714,42;MQMR=42;NS=1;NUMALT=2;ODDS=314.561;PAIRED=0.0571429,0;PAIREDR=0.101266;PAO=0,0;PQA=0,0;PQR=0;PRO=0;QA=2593,1269;QR=2955;RO=79;RPP=86.8912,79.0118;RPPR=9.19487;RUN=1,1;SAF=19,35;SAP=34.7758,79.0118;SAR=51,0;SRF=65;SRP=74.504;SRR=14;TYPE=snp,snp;technology.ILLUMINA=1,1 GT:GQ:DP:RO:QR:AO:QA:GL 0/1:0:184:79:2955:70,35:2593,1269:-10,0,-10,-10,-10,-10
```

ensemble

```
PairedContig_1566 4318 . G A 2074.55 PASS AB=0.380435,0.190217;ABP=25.858,156.382;AC=1;AF=0.5;AN=2;AO=70,35;CFILTERS=None;CIGAR=1X,1X;DP=184;DPB=184;DPRA=0,0;EPP=57.7314,79.0118;EPPR=9.19487;GTI=0;LEN=1,1;MEANALT=2,2;MQM=39.7714,42;MQMR=42;NS=1;NUMALT=2;ODDS=314.561;PAIRED=0.0571429,0;PAIREDR=0.101266;PAO=0,0;PQA=0,0;PQR=0;PRO=0;QA=2593,1269;QR=2955;RO=79;RPP=86.8912,79.0118;RPPR=9.19487;RUN=1,1;SAF=19,35;SAP=34.7758,79.0118;SAR=51,0;SRF=65;SRP=74.504;SRR=14;TYPE=snp,snp;set=freebayes;technology.ILLUMINA=1,1 GT:DP:GQ:PL 0/1:184:0:100,0,100,100,100,100
```

Thanks! -Scott
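The mismatch in the ensemble record can be detected mechanically: the merged line has ALT `A` (2 alleles, so 3 diploid genotypes) but still carries 6 PL values. A rough Python sketch of such a check, assuming a diploid sample and simple whitespace-split columns (`pl_matches_alleles` is a hypothetical helper, not a bcbio.variation function):

```python
from math import comb


def pl_matches_alleles(vcf_line: str, sample_index: int = 0) -> bool:
    """Check whether one sample's PL length fits the record's REF + ALT alleles.

    Assumes a diploid sample. Splitting on any whitespace handles the
    tab-delimited VCF columns, since INFO/FORMAT values contain no spaces.
    """
    fields = vcf_line.split()
    alt = fields[4]
    n_alleles = 1 + (0 if alt == "." else len(alt.split(",")))
    sample = dict(zip(fields[8].split(":"), fields[9 + sample_index].split(":")))
    pl = sample.get("PL", ".")
    if pl == ".":
        return True  # no PL to validate
    expected = comb(n_alleles + 1, 2)  # diploid: n(n+1)/2 genotypes
    return len(pl.split(",")) == expected


# The ensemble record from above, with the INFO column shortened for readability
merged = ("PairedContig_1566 4318 . G A 2074.55 PASS NUMALT=2 "
          "GT:DP:GQ:PL 0/1:184:0:100,0,100,100,100,100")
print(pl_matches_alleles(merged))  # False: 6 PL values, but only G/A remain
```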

chapmanb commented 10 years ago

Scott; Thanks much for the example. I modified one of the test cases to match it and pushed a fix that drops PLs when they don't match the final alleles in the combined variant. This should hopefully resolve the issue; the new snapshot includes the fix:
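The fix is described here only in words. As a rough illustration in Python (not the project's Clojure code; `drop_mismatched_pl` is a hypothetical name), dropping a PL field whose length no longer matches the retained alleles might look like this for a single sample:

```python
from math import comb
from typing import Tuple


def drop_mismatched_pl(format_col: str, sample_col: str, n_alleles: int,
                       ploidy: int = 2) -> Tuple[str, str]:
    """Remove PL from the FORMAT and sample columns when its length doesn't
    match the genotype count implied by the retained alleles."""
    keys = format_col.split(":")
    values = sample_col.split(":")
    if "PL" not in keys:
        return format_col, sample_col
    i = keys.index("PL")
    if i >= len(values) or values[i] == ".":
        return format_col, sample_col  # PL absent for this sample; leave as-is
    expected = comb(n_alleles + ploidy - 1, ploidy)
    if len(values[i].split(",")) != expected:
        del keys[i]
        del values[i]
    return ":".join(keys), ":".join(values)


# Merged ensemble genotype: REF G + ALT A = 2 alleles, so 3 PL values expected
print(drop_mismatched_pl("GT:DP:GQ:PL", "0/1:184:0:100,0,100,100,100,100", 2))
# ('GT:DP:GQ', '0/1:184:0')
```

Dropping is simpler than re-subsetting the likelihoods to the kept alleles, at the cost of losing them. With several samples the FORMAT column is shared, so a real implementation would need to drop PL for every sample at once; this sketch handles only one.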

https://github.com/chapmanb/bcbio.variation/releases/download/v0.1.7-SNAPSHOT-20140528/bcbio.variation-0.1.7-SNAPSHOT-standalone.jar

Thank you again and please let us know if you run into other issues.