davetang closed this issue 8 years ago.
Did you run vt decompose on exome.vt.vep.vcf.gz?
Yep.
vt decompose -s $VCF | vt normalize -r $GENOME - | gzip > $BASE.vt.vcf.gz
Thanks for the full example! I'll look into how to handle this better. Note that in this case VQSLOD is declared Type=Float,Number=1, so having 4.5487,4.5487 in that field violates the VCF spec.
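To check a whole file for this kind of violation, a minimal plain-Python sketch could read the ##INFO header lines, collect the IDs declared Number=1, and flag any record whose value for one of those IDs contains a comma. The function name single_value_violations is hypothetical (it is not part of gemini or vt), and the sketch assumes each ##INFO line lists ID immediately followed by Number, as the VCF spec lays them out.

```python
import re

# Assumes header lines of the form ##INFO=<ID=XXX,Number=N,...>
INFO_ID = re.compile(r"^##INFO=<ID=([^,>]+),Number=([^,>]+)")


def single_value_violations(lines):
    """Yield (record_number, key, value) for INFO fields declared Number=1
    whose value holds more than one comma-separated entry.
    Record numbers are 1-based and count data lines only."""
    number_one = set()
    record = 0
    for line in lines:
        if line.startswith("##"):
            m = INFO_ID.match(line)
            if m and m.group(2) == "1":
                number_one.add(m.group(1))
            continue
        if line.startswith("#"):
            continue  # the #CHROM column header
        record += 1
        info = line.rstrip("\n").split("\t")[7]  # INFO is the 8th column
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            # Flags have no "=", so sep is empty and they are skipped.
            if sep and key in number_one and "," in value:
                yield record, key, value
```

For a bgzipped file it can be fed gzip.open(path, "rt") as the iterable of lines; each hit is reported as (record number, key, value).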
No problem. But what I was originally reporting is that 4.5487,4.5487 doesn't exist in the file.
zcat UK10K_COHORT.20140722.sites.vt.vep.vcf.gz | grep "4.5487,4.5487"
# returns nothing
In my first example, 0.021,0.058 and -0.0946,-1.352 aren't found in the VCF file either. I also checked all the fields in that VCF file, and they all have single values.
Understood. I'll have a look this week. I've tagged this for 0.18.3, so we'll have a fix soon.
For my own record, I'm testing with:
wget -O - $url \
| zcat - | head -100000 \
| vt decompose -s - \
| vt normalize -r /data/human/hs37d5.fa - \
| grep -v "ID=CSQ" | perl -pe 's/CSQ[^;]+//' \
| bgzip -c > uk100k.vcf.gz
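As far as I can tell, the perl step removes the CSQ text but not the semicolon next to it, so an INFO column like AC=1;CSQ=...;AF=0.1 becomes AC=1;;AF=0.1, and a record whose only INFO entry was CSQ is left with an empty INFO column instead of ".". A minimal sketch of a field-aware alternative, assuming INFO is the 8th tab-separated column as the spec requires; strip_info_key is a hypothetical helper, not part of vt or gemini:

```python
def strip_info_key(lines, key="CSQ"):
    """Drop one INFO key (and its ##INFO header line) from VCF lines,
    keeping the INFO column well-formed."""
    header = f"##INFO=<ID={key},"
    for line in lines:
        if line.startswith(header):
            continue  # same job as: grep -v "ID=CSQ"
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        kept = [e for e in fields[7].split(";") if e.partition("=")[0] != key]
        # An empty INFO column must be written as "." in VCF.
        fields[7] = ";".join(kept) if kept else "."
        yield "\t".join(fields) + "\n"
```

To use it in place of the grep and perl steps, it could be wrapped in a small script that calls sys.stdout.writelines(strip_info_key(sys.stdin)).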
Then annotate with VEP, and then:
tabix uk100k.vep.vcf.gz
gemini load --cores 4 -v uk100k.vep.vcf.gz -t VEP --no-genotypes uk100k.db
Then annotate:
gemini annotate -f uk100k.vep.vcf.gz -o list -e VQSLOD -t float uk100k.db
There are some interesting things in that VCF. For example, the variant
1 2942599 . A ATGG
occurs twice, with different values in the INFO field.
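Duplicated sites like this one can be listed with a short sketch that groups records by (CHROM, POS, REF, ALT) and keeps the groups whose INFO columns differ. duplicated_sites is a hypothetical helper; it holds every site in memory, which keeps it simple but may be heavy for a full cohort file.

```python
from collections import defaultdict


def duplicated_sites(lines):
    """Return {(chrom, pos, ref, alt): [info, ...]} for sites that appear
    more than once with differing INFO columns."""
    seen = defaultdict(list)
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and column header lines
        chrom, pos, _id, ref, alt, _qual, _filter, info = (
            line.rstrip("\n").split("\t")[:8]
        )
        seen[(chrom, pos, ref, alt)].append(info)
    return {
        site: infos
        for site, infos in seen.items()
        if len(infos) > 1 and len(set(infos)) > 1
    }
```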
@davetang I just pushed a fix for this that will be in the next release. In the meantime, you can get it with:
gemini_pip install git+https://github.com/arq5x/gemini.git
When I manually checked the variant at line 100,001 of the VCF file, its AF has a single value (;AF=0.208;), and so does variant 100,002 (;AF=0.34;). The AF of variant 100,000 (;AF=0.118;) is stored correctly in the database. I'm not sure where 0.021,0.058 came from; it just seemed suspicious that the error occurred on line 100,001.
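To pull an INFO value for specific records without paging through the file, a minimal sketch could count data records (header lines excluded) and stop once the last requested record is reached. info_value_at is a hypothetical helper, and counting only data records is an assumption: if "line 100,001" refers to the raw file line including the header, the record numbers shift by the number of header lines.

```python
def info_value_at(lines, record_numbers, key):
    """Return {n: value} for the requested 1-based data-record numbers.
    The value is None when the key is absent and "" for a flag."""
    wanted = set(record_numbers)
    last = max(wanted)
    found = {}
    records = (line for line in lines if not line.startswith("#"))
    for n, line in enumerate(records, start=1):
        if n in wanted:
            info = line.rstrip("\n").split("\t")[7]
            # partition("=")[::2] turns "AF=0.2" into ("AF", "0.2")
            values = dict(e.partition("=")[::2] for e in info.split(";"))
            found[n] = values.get(key)
        if n >= last:
            break  # no need to read the rest of the file
    return found
```

For example, with gzip.open("UK10K_COHORT.20140722.sites.vt.vep.vcf.gz", "rt") as fh, calling info_value_at(fh, [100000, 100001, 100002], "AF") would return the AF value of each of those three records.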
I'm getting the same error with VQSLOD.