PharmGKB / PharmCAT

The Pharmacogenomic Clinical Annotation Tool
Mozilla Public License 2.0

parsing error: QUAL 'inf' is not a number #147

Closed Dafnaa closed 12 months ago

Dafnaa commented 1 year ago

chr1 97306195 rs145548112 C T . . PX=DPYD;DP=190;MQ=59.94 GT:DP:RGQ 0/0:43:99
chr1 97373598 rs137999090 C T . . PX=DPYD;DP=172;MQ=59.88 GT:DP:RGQ 0/0:73:99
chr1 97373629 rs138545885 C A . . PX=DPYD;DP=178;MQ=59.95 GT:DP:RGQ 0/0:73:99
chr1 97382461 rs55971861 T G . . PX=DPYD;DP=48;MQ=59.46 GT:DP:RGQ 0/0:33:99
chr1 97450058 rs3918290 C T . . PX=DPYD;DP=153;MQ=59.78 GT:DP:RGQ 0/0:45:99
chr1 97450059 rs3918289 G C . . PX=DPYD;DP=153;MQ=59.78 GT:DP:RGQ 0/0:45:99
chr1 97450065 . T . . PCATxINDEL DP=158;MQ=59.78 GT:DP:RGQ 0/0:45:99
chr1 97450068 rs17376848 A G . . PX=DPYD;DP=163;MQ=59.79 GT:DP:RGQ 0/0:45:99
chr1 97450168 rs147601618 A G . . PX=DPYD;DP=175;MQ=59.85 GT:DP:RGQ 0/0:45:99
chr1 97450187 rs145773863 C T . . PX=DPYD;DP=170;MQ=59.94 GT:DP:RGQ 0/0:78:99
chr1 97450189 rs138616379 C T . . PX=DPYD;DP=170;MQ=59.78 GT:DP:RGQ 0/0:78:99
chr1 97450190 rs59086055 G A . . PX=DPYD;DP=170;MQ=59.78 GT:DP:RGQ 0/0:78:99
chr1 97515784 rs201615754 C A . . PX=DPYD;DP=120;MQ=60 GT:DP:RGQ 0/0:38:99
chr1 97515787 rs55886062 A C . . PX=DPYD;DP=121;MQ=60 GT:DP:RGQ 0/0:38:99
chr1 97515839 rs1801159 T C 1532.64 . PX=DPYD;AC=1;AF=0.5;AN=2;BaseQRankSum=-0.323;DB;DP=139;ExcessHet=0;FS=0;MLEAC=1;MLEAF=0.5;MQ=60;MQRankSum=0;QD=16.48;ReadPosRankSum=-3.137;SOR=0.665 GT:AD:DP:GQ:PL 0/1:46,47:93:99:1540,0,1540
chr1 97515851 rs142619737 C T . . PX=DPYD;DP=141;MQ=59.99 GT:DP:RGQ 0/0:89:99
chr1 97515865 rs1801158 C T 1552.64 . PX=DPYD;AC=1;AF=0.5;AN=2;BaseQRankSum=0.275;DB;DP=142;ExcessHet=0;FS=1.38;MLEAC=1;MLEAF=0.5;MQ=59.99;MQRankSum=0.993;QD=17.06;ReadPosRankSum=0.866;SOR=0.558 GT:AD:DP:GQ:PL 0/1:44,47:91:99:1560,0,1403
chr1 97515889 rs190951787 G C . . PX=DPYD;DP=130;MQ=59.98 GT:DP:RGQ 0/0:84:99
chr1 97515923 rs148994843 C T . . PX=DPYD;DP=125;MQ=59.98 GT:DP:RGQ 0/0:53:99

chr1 97593322 rs183385770 C T . . PX=DPYD;DP=172;MQ=59.91 GT:DP:RGQ 0/0:40:99
chr1 97593343 rs72549306 C A . . PX=DPYD;DP=172;MQ=59.91 GT:DP:RGQ 0/0:40:99
chr1 97593379 rs201018345 C T . . PX=DPYD;DP=159;MQ=59.93 GT:DP:RGQ 0/0:40:99
chr1 97595083 rs145112791 G A . . PX=DPYD;DP=99;MQ=60 GT:DP:RGQ 0/0:37:99
chr1 97595088 rs150437414 A G inf . PX=DPYD;DP=99;MLEAC=.;MLEAF=.;MQ=60 GT:DP:RGQ 0/0:67:12
chr1 97595149 rs146356975 T C . . PX=DPYD;DP=85;MQ=60 GT:DP:RGQ 0/0:65:99
chr1 97679170 rs45589337 T C . . PX=DPYD;DP=48;MQ=60 GT:DP:RGQ 0/0:46:99

chr1 97699533 rs139834141 C T . . PX=DPYD;DP=180;MQ=59.9 GT:DP:RGQ 0/0:45:99
chr1 97699535 rs2297595 T C . . PX=DPYD;DP=180;MQ=59.9 GT:DP:RGQ 0/0:45:99
chr1 97721542 rs200562975 T C . . PX=DPYD;DP=191;MQ=59.74 GT:DP:RGQ 0/0:49:99
chr1 97721650 rs141462178 T C . . PX=DPYD;DP=134;MQ=59.82 GT:DP:RGQ 0/0:49:99
chr1 97740400 rs150385342 C T . . PX=DPYD;DP=141;MQ=59.96 GT:DP:RGQ 0/0:75:99
chr1 97740410 . G . . PCATxINDEL DP=146;MQ=59.96 GT:DP:RGQ 0/0:75:99
chr1 97883329 rs1801265 A G . . PX=DPYD;DP=118;MQ=60 GT:DP:RGQ 0/0:50:99

Dafnaa commented 1 year ago

Oh, I now see that the VCF I used as input for the preprocessor has Infinity there:

chr1 97595088 . A . Infinity . DP=99;MLEAC=.;MLEAF=.;MQ=60.00 GT:DP:RGQ 0/0:67:12

Can anyone help me figure out how to deal with this?

muhligs commented 1 year ago

Hi, this is GATK HaplotypeCaller occasionally writing infinity to the QUAL column. I think the preprocessor should pick this up and correct it. Meanwhile, here is my workaround, which replaces inf with 1000:

vcf=path/to/vcf/vcfname_preprocessed.vcf.bgz
zcat ${vcf} | sed 's/\tinf/\t1000/' > inf_removed.vcf
java -jar /faststorage/project/pharmgene/pharmcat/software/pharmcat-2.2.3-all.jar -vcf inf_removed.vcf -bf outputprefix -o .
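The sed pattern above rewrites any field that begins with inf, and \t in a sed regex is a GNU extension that not every sed supports. A slightly more targeted sketch, assuming an uncompressed, tab-delimited VCF on stdin, uses awk to touch only the QUAL column (column 6) and passes header lines through unchanged. The function name fix_inf_qual is hypothetical, and 1000 is the same arbitrary placeholder used in the workaround, not a value derived from the data.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PharmCAT): replace a non-finite QUAL value
# ("inf", "Infinity", any case, optional leading "+") with a fixed placeholder.
# Assumes an uncompressed, tab-delimited VCF on stdin; writes to stdout.
fix_inf_qual() {
  local replacement="${1:-1000}"   # placeholder value, defaults to 1000
  awk -v rep="$replacement" 'BEGIN { FS = OFS = "\t" }
    /^#/ { print; next }                                 # header lines unchanged
    tolower($6) ~ /^[+]?(inf|infinity)$/ { $6 = rep }    # rewrite column 6 only
    { print }'
}
```

Usage, following the workaround above: zcat "${vcf}" | fix_inf_qual 1000 > inf_removed.vcf, then run the same java -jar command on inf_removed.vcf. Like the sed version, this changes the data, so the replaced QUAL should be treated as a placeholder rather than a real quality score.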

Best, Morten

Dafnaa commented 1 year ago

Thank you so much! I will use the workaround for now! :) Best wishes, Dafna

BinglanLi commented 1 year ago

@Dafnaa, thank you for reporting this. The temporary workaround offered by @muhligs is the most convenient and straightforward fix for now.

PharmCAT and the VCF Preprocessor are designed not to alter any information in the input VCF file. According to the VCF specification (v4.2 and later), the QUAL column should contain either the missing value . or a number. We strongly recommend contacting the GATK HaplotypeCaller development team or their community forum for a proper fix to the VCF files that HaplotypeCaller generates.
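To spot records that break this rule before running PharmCAT, something like the sketch below may help. It assumes an uncompressed, tab-delimited VCF on stdin; check_qual is a hypothetical helper, and its number pattern is a simplified decimal/scientific-notation check written for this example, not the specification's exact definition. It prints CHROM:POS and the offending QUAL for each flagged record and exits non-zero if any were found.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PharmCAT): report data records whose QUAL
# (column 6) is neither the missing value "." nor a plain decimal number.
# Assumes an uncompressed, tab-delimited VCF on stdin.
check_qual() {
  awk 'BEGIN { FS = "\t"; bad = 0 }
    /^#/ { next }                                   # skip header lines
    $6 != "." && $6 !~ /^[-+]?([0-9]+[.]?[0-9]*|[.][0-9]+)([eE][-+]?[0-9]+)?$/ {
      printf "%s:%s QUAL=%s\n", $1, $2, $6          # e.g. chr1:97595088 QUAL=inf
      bad++
    }
    END { exit (bad > 0) }'                         # status 1 if anything was flagged
}
```

For example, zcat "${vcf}" | check_qual would flag the chr1 97595088 record from this issue, while records with QUAL values such as 1532.64 or . pass.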

BinglanLi commented 12 months ago

Closing the issue, since this is a VCF formatting issue (a non-numeric QUAL value) that should be addressed in GATK HaplotypeCaller.