Dafnaa closed this issue 12 months ago.
Oh, I now see that the VCF I used as input for the preprocessor has Infinity there:
chr1 97595088 . A . Infinity . DP=99;MLEAC=.;MLEAF=.;MQ=60.00 GT:DP:RGQ 0/0:67:12
Can anyone help me with how to deal with this?
Hi, this is HaplotypeCaller sometimes writing infinity to the QUAL column. I think it should be picked up and corrected by the preprocessor. Meanwhile, here is my workaround, which replaces inf with 1000:
vcf=path/to/vcf/vcfname_preprocessed.vcf.bgz
# Replace a whole tab-delimited "inf" field (the QUAL value) with 1000; \t in the pattern requires GNU sed
zcat "${vcf}" | sed 's/\tinf\t/\t1000\t/' > inf_removed.vcf
java -jar /faststorage/project/pharmgene/pharmcat/software/pharmcat-2.2.3-all.jar -vcf inf_removed.vcf -bf outputprefix -o .
Best, Morten
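The sed workaround above rewrites a tab-delimited inf wherever it appears on a line. If you would rather touch only the QUAL column (column 6), and also catch the Infinity spelling seen in the raw HaplotypeCaller output, a minimal awk-based sketch could look like this. The function name fix_inf_qual is hypothetical, and the default replacement of 1000 just mirrors the workaround; it is not a recommended quality value. Passing "." instead writes the VCF missing value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace an infinite QUAL value (column 6) in a VCF stream.
# Reads a tab-delimited, uncompressed VCF on stdin and writes it to stdout.
# $1 = replacement text (defaults to 1000, mirroring the workaround; "." = missing value).
fix_inf_qual() {
  local replacement="${1:-1000}"
  awk -v rep="$replacement" '
    BEGIN { FS = OFS = "\t" }
    /^#/ { print; next }                      # header lines pass through untouched
    tolower($6) ~ /^[+]?inf(inity)?$/ {       # matches inf, Inf, Infinity, +inf
      $6 = rep                                # only the QUAL field is rewritten
    }
    { print }
  '
}

# Example:
#   zcat "${vcf}" | fix_inf_qual 1000 > inf_removed.vcf
```

Other columns are left byte-for-byte unchanged, because awk only rebuilds a line (using the same tab separator) when the QUAL field is actually reassigned.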
Thank you so much! I will use the workaround for now! :) Best wishes, Dafna
@Dafnaa, thank you for reporting this issue. The temporary workaround offered by @muhligs is the most convenient and straightforward fix for now.
PharmCAT and the VCF Preprocessor are designed not to alter any information in the input VCF file. According to the VCF specification (> v4.2), the QUAL column must be either the missing value "." or a number.
We strongly recommend reaching out to the GATK HaplotypeCaller development team and their community forum for a proper fix to the VCF output generated by HaplotypeCaller.
Closing this issue, since it is a VCF formatting issue that should be addressed in GATK HaplotypeCaller.
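Since PharmCAT rejects a QUAL value that is neither the missing value "." nor a number, it can help to scan a VCF for such records before running it. The sketch below assumes an uncompressed, tab-delimited VCF on stdin; the helper name find_bad_qual and the numeric regex (plain decimals with an optional exponent) are assumptions and may be stricter or looser than what PharmCAT's parser actually accepts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list VCF records whose QUAL (column 6) is neither "." nor a number.
# Reads a tab-delimited, uncompressed VCF on stdin.
find_bad_qual() {
  awk '
    BEGIN { FS = "\t" }
    /^#/ { next }                                # skip meta-information and header lines
    $6 != "." && $6 !~ /^[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?$/ {
      # NR counts every input line, including the header lines
      printf "line %d: %s:%s QUAL=%s\n", NR, $1, $2, $6
    }
  '
}

# Example:
#   zcat "${vcf}" | find_bad_qual
```

The reported line numbers include header lines, so they may or may not match the line number in PharmCAT's own parsing error.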
Do you want to request a feature or report a bug? Bug, I think.
What is the current behavior? The preprocessor works well and produces a VCF file. But when I run PharmCAT to generate the report, I get a parsing error:
[Line #135] Error parsing data: QUAL 'inf' is not a number
at org.pharmgkb.parser.vcf.VcfParser.parseNextLine(VcfParser.java:213)
at org.pharmgkb.parser.vcf.VcfParser.parse(VcfParser.java:133)
at org.pharmgkb.pharmcat.haplotype.VcfReader.read(VcfReader.java:209)
at org.pharmgkb.pharmcat.haplotype.VcfReader.<init>(VcfReader.java:83)
at org.pharmgkb.pharmcat.VcfFile.getReader(VcfFile.java:89)
at org.pharmgkb.pharmcat.haplotype.NamedAlleleMatcher.call(NamedAlleleMatcher.java:178)
at org.pharmgkb.pharmcat.Pipeline.call(Pipeline.java:226)
at org.pharmgkb.pharmcat.PharmCAT.main(PharmCAT.java:166)
If the current behavior is a bug, please provide the steps to reproduce and, if possible, your example input data via a Gist or similar.
The input is the preprocessed VCF file, partially shown here; you can see inf in the QUAL column:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 14
chr1 97306195 rs145548112 C T . . PX=DPYD;DP=190;MQ=59.94 GT:DP:RGQ 0/0:43:99
chr1 97373598 rs137999090 C T . . PX=DPYD;DP=172;MQ=59.88 GT:DP:RGQ 0/0:73:99
chr1 97373629 rs138545885 C A . . PX=DPYD;DP=178;MQ=59.95 GT:DP:RGQ 0/0:73:99
chr1 97382461 rs55971861 T G . . PX=DPYD;DP=48;MQ=59.46 GT:DP:RGQ 0/0:33:99
chr1 97450058 rs3918290 C T . . PX=DPYD;DP=153;MQ=59.78 GT:DP:RGQ 0/0:45:99
chr1 97450059 rs3918289 G C . . PX=DPYD;DP=153;MQ=59.78 GT:DP:RGQ 0/0:45:99
chr1 97450065 . T . . PCATxINDEL DP=158;MQ=59.78 GT:DP:RGQ 0/0:45:99
chr1 97450068 rs17376848 A G . . PX=DPYD;DP=163;MQ=59.79 GT:DP:RGQ 0/0:45:99
chr1 97450168 rs147601618 A G . . PX=DPYD;DP=175;MQ=59.85 GT:DP:RGQ 0/0:45:99
chr1 97450187 rs145773863 C T . . PX=DPYD;DP=170;MQ=59.94 GT:DP:RGQ 0/0:78:99
chr1 97450189 rs138616379 C T . . PX=DPYD;DP=170;MQ=59.78 GT:DP:RGQ 0/0:78:99
chr1 97450190 rs59086055 G A . . PX=DPYD;DP=170;MQ=59.78 GT:DP:RGQ 0/0:78:99
chr1 97515784 rs201615754 C A . . PX=DPYD;DP=120;MQ=60 GT:DP:RGQ 0/0:38:99
chr1 97515787 rs55886062 A C . . PX=DPYD;DP=121;MQ=60 GT:DP:RGQ 0/0:38:99
chr1 97515839 rs1801159 T C 1532.64 . PX=DPYD;AC=1;AF=0.5;AN=2;BaseQRankSum=-0.323;DB;DP=139;ExcessHet=0;FS=0;MLEAC=1;MLEAF=0.5;MQ=60;MQRankSum=0;QD=16.48;ReadPosRankSum=-3.137;SOR=0.665 GT:AD:DP:GQ:PL 0/1:46,47:93:99:1540,0,1540
chr1 97515851 rs142619737 C T . . PX=DPYD;DP=141;MQ=59.99 GT:DP:RGQ 0/0:89:99
chr1 97515865 rs1801158 C T 1552.64 . PX=DPYD;AC=1;AF=0.5;AN=2;BaseQRankSum=0.275;DB;DP=142;ExcessHet=0;FS=1.38;MLEAC=1;MLEAF=0.5;MQ=59.99;MQRankSum=0.993;QD=17.06;ReadPosRankSum=0.866;SOR=0.558 GT:AD:DP:GQ:PL 0/1:44,47:91:99:1560,0,1403
chr1 97515889 rs190951787 G C . . PX=DPYD;DP=130;MQ=59.98 GT:DP:RGQ 0/0:84:99
chr1 97515923 rs148994843 C T . . PX=DPYD;DP=125;MQ=59.98 GT:DP:RGQ 0/0:53:99
chr1 97593322 rs183385770 C T . . PX=DPYD;DP=172;MQ=59.91 GT:DP:RGQ 0/0:40:99
chr1 97593343 rs72549306 C A . . PX=DPYD;DP=172;MQ=59.91 GT:DP:RGQ 0/0:40:99
chr1 97593379 rs201018345 C T . . PX=DPYD;DP=159;MQ=59.93 GT:DP:RGQ 0/0:40:99
chr1 97595083 rs145112791 G A . . PX=DPYD;DP=99;MQ=60 GT:DP:RGQ 0/0:37:99
chr1 97595088 rs150437414 A G inf . PX=DPYD;DP=99;MLEAC=.;MLEAF=.;MQ=60 GT:DP:RGQ 0/0:67:12
chr1 97595149 rs146356975 T C . . PX=DPYD;DP=85;MQ=60 GT:DP:RGQ 0/0:65:99
chr1 97679170 rs45589337 T C . . PX=DPYD;DP=48;MQ=60 GT:DP:RGQ 0/0:46:99
chr1 97699533 rs139834141 C T . . PX=DPYD;DP=180;MQ=59.9 GT:DP:RGQ 0/0:45:99
chr1 97699535 rs2297595 T C . . PX=DPYD;DP=180;MQ=59.9 GT:DP:RGQ 0/0:45:99
chr1 97721542 rs200562975 T C . . PX=DPYD;DP=191;MQ=59.74 GT:DP:RGQ 0/0:49:99
chr1 97721650 rs141462178 T C . . PX=DPYD;DP=134;MQ=59.82 GT:DP:RGQ 0/0:49:99
chr1 97740400 rs150385342 C T . . PX=DPYD;DP=141;MQ=59.96 GT:DP:RGQ 0/0:75:99
chr1 97740410 . G . . PCATxINDEL DP=146;MQ=59.96 GT:DP:RGQ 0/0:75:99
chr1 97883329 rs1801265 A G . . PX=DPYD;DP=118;MQ=60 GT:DP:RGQ 0/0:50:99
What is the expected behavior? I hope someone knows why this happens and can help me fix it so that I can produce an HTML report. If more information is needed, let me know!