exomiser / Exomiser

A Tool to Annotate and Prioritize Exome Variants
https://exomiser.readthedocs.io
GNU Affero General Public License v3.0

ERROR:unparsable vcf record with allele * #418

Closed Tttgb closed 2 years ago

Tttgb commented 2 years ago

I am a new user and ran into a problem when I tried to analyze my VCF file. Exomiser can't parse one of the alleles. The error:

2022-01-18 16:54:07.965 ERROR 24185 --- [ main] o.s.boot.SpringApplication : Application run failed
java.lang.IllegalStateException: Failed to execute CommandLineRunner
    at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:794)
    at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:775)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:345)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1343)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1332)
    at org.monarchinitiative.exomiser.cli.Main.main(Main.java:52)
Caused by: htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 38333: unparsable vcf record with allele *CA, for input source: file:///xtdisk/liuzq_group/guodan/data/Merged_vcf_by_family/famliy_2/famliy_2_merged.vcf
    at htsjdk.variant.vcf.AbstractVCFCodec.generateException(AbstractVCFCodec.java:887)
    at htsjdk.variant.vcf.AbstractVCFCodec.checkAllele(AbstractVCFCodec.java:678)
    at htsjdk.variant.vcf.AbstractVCFCodec.parseSingleAltAllele(AbstractVCFCodec.java:706)
    at htsjdk.variant.vcf.AbstractVCFCodec.parseAlleles(AbstractVCFCodec.java:648)

Line 38333 of the VCF file:

chr11 1961703 . TCACACA *CA,ACACACA,TCA,* 713.16 PASS AC=1,4,2,0;AF=0.500,0.500;AN=10;BaseQRankSum=-4.935e+00;DP=191;ExcessHet=3.0103;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQRankSum=0.00;QD=18.00;ReadPosRankSum=-2.020e-01;SF=0,1,2,3,5;SOR=5.486;VQSLOD=6.05;culprit=FS GT:GQ:DP:PL:AD 0/3:99:31:1524,.,.,.,.,.,563,.,.,509,532,.,.,0,586:0,16,13 0/2:99:29:.,.,.,.,.,280,.,.,.,.,.,.,.,.,.:11,16 2/2:78:29:.,.,.,.,.,0,.,.,.,.,.,.,.,.,.:1,27 1/2:99:28:.,.,577,.,0,700,.,.,.,.,.,.,.,.,.:0,14,13 . 0/3:99:20:.,.,.,.,.,.,.,.,.,281,.,.,.,.,.:7,11

What should I do to avoid this error? Thanks for your help.

julesjacobsen commented 2 years ago

This htsjdk.tribble.TribbleException is an HTSJDK error. It says it doesn't like the *CA allele, which isn't valid VCF 4.3. Check the VCF specification and whatever tool you used to create your VCF file, and ensure that the output is valid according to the specification:

  1. ALT — alternate base(s): Comma-separated list of alternate non-reference alleles. These alleles do not have to be called in any of the samples. Options are base Strings made up of the bases A,C,G,T,N (case insensitive) or the ‘*’ symbol (allele missing due to overlapping deletion) or a MISSING value ‘.’ (no variant) or an angle-bracketed ID String (“<ID>”) or a breakend replacement string as described in Section 5.4. If there are no alternative alleles, then the MISSING value must be used. In other words, the ALT field must be a symbolic allele, or a breakend replacement string, or match the regular expression ^([ACGTNacgtn]+|\*|\.)$. Tools processing VCF files are not required to preserve case in the allele String, except for IDs, which are case sensitive. (String; no whitespace, commas, or angle-brackets are permitted in the ID String itself)
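To find every record that would trip this check before running Exomiser, something like the following Python sketch may help. It is not part of Exomiser or HTSJDK: the function names `is_valid_alt`, `invalid_alts` and `find_invalid_records` are made up for illustration, the input is assumed to be an uncompressed, tab-delimited VCF, and the tests for symbolic alleles and breakends are deliberately crude approximations of the rule quoted above (single breakends such as `.A` would be reported as invalid).

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Regular expression for plain ALT alleles, as given in the VCF 4.3 text above.
ALT_PATTERN = re.compile(r"^([ACGTNacgtn]+|\*|\.)$")


def is_valid_alt(allele: str) -> bool:
    """Rough check of a single ALT allele against the quoted rule."""
    if ALT_PATTERN.fullmatch(allele):
        return True
    # Symbolic allele such as <DEL> (assumption: any non-empty ID in brackets).
    if allele.startswith("<") and allele.endswith(">") and len(allele) > 2:
        return True
    # Breakend replacement string (assumption: anything containing [ or ]).
    if "[" in allele or "]" in allele:
        return True
    return False


def invalid_alts(vcf_line: str) -> List[str]:
    """Return the ALT alleles of one tab-delimited record that fail the check."""
    fields = vcf_line.rstrip("\n").split("\t")
    alt_column = fields[4]  # ALT is the fifth fixed VCF column
    return [allele for allele in alt_column.split(",") if not is_valid_alt(allele)]


def find_invalid_records(lines: Iterable[str]) -> Iterator[Tuple[int, List[str]]]:
    """Yield (1-based line number, offending alleles) for each bad record."""
    for line_number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        bad = invalid_alts(line)
        if bad:
            yield line_number, bad
```

Passing an open file handle to `find_invalid_records` and printing each result gives the line numbers to inspect. For the record shown in this issue, only `*CA` is flagged; the bare `*` at the end of the ALT list is accepted.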