brentp / vcfanno

annotate a VCF with other VCFs/BEDs/tabixed files
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0973-5
MIT License
365 stars 56 forks source link

vcfanno fails to annotate variant(s) in my VCF present in ExAC VCF #3

Closed indapa closed 9 years ago

indapa commented 9 years ago

I am getting some strange behavior with vcfanno (the vcfanno_0.0.3_linux_386 executable). Its not annotating variants in my VCF that are present in the VCF in my conf.toml file.

I have several variants I pulled from Clinvar via NCBI eUtils API and put in VCF format. I wanted to see if they were segregating in ExAC. I normalized the ExAC VCF with vt and my VCF which I want to annotate with information from ExAC VCF.

My toml.conf file looks like:

file="/home/aindap/data/ExAC/ExAC.r0.3.sites.vep.vcf.normalized.vcf.gz"
fields = ["AF"]
ops=["first"]

Then I ran vcfanno:

$HOME/software/vcfanno_0.0.3_linux_386/vcfanno conf.toml my.variants.20150527.normalized.vcf > my.normalized.exac_maf.vcf

There several variants that I expected to be annotate with AF info from ExAC, but the resulting vcf has several records that are in ExAC, but not decorated with AF info. For example I had this record in my vcf:

#fileformat=VCFv4.0
##fileDate=20150505
##reference=GRCh37
##contig=<ID=1,length=249250621>
##contig=<ID=2,length=243199373>
##contig=<ID=3,length=198022430>
##contig=<ID=4,length=191154276>
##contig=<ID=5,length=180915260>
##contig=<ID=6,length=171115067>
##contig=<ID=7,length=159138663>
##contig=<ID=8,length=146364022>
##contig=<ID=9,length=141213431>
##contig=<ID=10,length=135534747>
##contig=<ID=11,length=135006516>
##contig=<ID=12,length=133851895>
##contig=<ID=13,length=115169878>
##contig=<ID=14,length=107349540>
##contig=<ID=15,length=102531392>
##contig=<ID=16,length=90354753>
##contig=<ID=17,length=81195210>
##contig=<ID=18,length=78077248>
##contig=<ID=19,length=59128983>
##contig=<ID=20,length=63025520>
##contig=<ID=21,length=48129895>
##contig=<ID=22,length=51304566>
##contig=<ID=X,length=155270560>
##contig=<ID=Y,length=59373566>
##contig=<ID=MT,length=16569>
##contig=<ID=HSCHR6_MHC_SSTO,length=4789211>
##FILTER=<ID=PASS,Description="All filters passed">
##INFO=<ID=dbSNP,Number=1,Type=String,Description="dbSNP ID (i.e. rs number)">
##INFO=<ID=OMIM,Number=1,Type=String,Description="OMIM id">
##INFO=<ID=CLINSIG,Number=.,Type=String,Description="Variant Clinical Significance,ftp://ftp.ncbi.nlm.nih.gov/pub/GTR/standard_terms/Clinical_significance.txt">
##INFO=<ID=dbSNP,Number=1,Type=String,Description="dbSNP ID (i.e. rs number)">
##INFO=<ID=TRAIT,Number=1,Type=String,Description="disease trait">
##INFO=<ID=VT,Number=1,Type=String,Description="Variation Class">
##INFO=<ID=GENE,Number=1,Type=String,Description="Gene symbol(s) comma-delimited">
4       5577974 c.3025C>T       G       A       .       .       OMIM=607261.0007;dbSNP=rs137852927;GENE=EVC2;CLINSIG=Pathogenic;TRAIT="Chondroectodermaldysplasia";VT=singlenucleotidevariant

But its clearly in the ExAC vcf listed in my conf.toml file:

tabix /home/aindap/data/ExAC/ExAC.r0.3.sites.vep.vcf.normalized.vcf.gz 4:5577974-5577974

4   5577974 rs137852927 G   A   17524.2 PASS    AC=10;AC_AFR=0;AC_AMR=0;AC_Adj=10;AC_EAS=0;AC_FIN=0;AC_Het=10;AC_Hom=0;AC_NFE=10;AC_OTH=0;AC_SAS=0;AF=8.236e-05;AN=121412;AN_AFR=10404;AN_AMR=11578;AN_Adj=121400;AN_EAS=8654;AN_FIN=6614;AN_NFE=66730;AN_OTH=908;AN_SAS=16512;BaseQRankSum=-4.686;ClippingRankSum=-0.016;DB;DP=2164728;FS=0.591;GQ_MEAN=73.01;GQ_STDDEV=27.38;Het_AFR=0;Het_AMR=0;Het_EAS=0;Het_FIN=0;Het_NFE=10;Het_OTH=0;Het_SAS=0;Hom_AFR=0;Hom_AMR=0;Hom_EAS=0;Hom_FIN=0;Hom_NFE=0;Hom_OTH=0;Hom_SAS=0;InbreedingCoeff=-0.0001;MQ=59.68;MQ0=0;MQRankSum=0.596;NCC=0;QD=11.36;ReadPosRankSum=0.625;VQSLOD=2.42;culprit=MQRankSum;DP_HIST=0|6|6|2|7536|25144|13164|2809|1304|1202|1280|1383|1370|1208|1062|872|659|456|326|917,0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|9;GQ_HIST=0|0|0|0|7|2|3|1|1|2|1|1|34346|9449|2160|1768|751|272|350|11592,0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|10;CSQ=A|ENSG00000173040|ENST00000475313|Transcript|stop_gained&NMD_transcript_variant|3757|3025|1009|Q/*|Cag/Tag|rs137852927&CM024156|1||-1|EVC2|HGNC|19747|nonsense_mediated_decay|||ENSP00000431981|LBN_HUMAN|Q4W5A4_HUMAN|UPI000000D836|||18/23|||ENST00000475313.1:c.3025C>T|ENSP00000431981.1:p.Gln1009Ter||||||A:0|A:0.000233|pathogenic||||||||||,A|ENSG00000173040|ENST00000344938|Transcript|stop_gained|3319|3265|1089|Q/*|Cag/Tag|rs137852927&CM024156|1||-1|EVC2|HGNC|19747|protein_coding|||ENSP00000339954|LBN_HUMAN|Q4W5A4_HUMAN|UPI000021C4F1|||18/22|||ENST00000344938.1:c.3265C>T|ENSP00000339954.1:p.Gln1089Ter||||||A:0|A:0.000233|pathogenic|||||||POSITION:0.872761293771719&ANN_ORF:539.654&MAX_ORF:539.654|||HC,A|ENSG00000173040|ENST00000509670|Transcript|3_prime_UTR_variant&NMD_transcript_variant|3358|||||rs137852927&CM024156|1||-1|EVC2|HGNC|19747|nonsense_mediated_decay|||ENSP00000423876||Q4W5A4_HUMAN&E9PFT2_HUMAN|UPI0001D3B0F7|||19/23|||ENST00000509670.1:c.*1658C>T|||||||A:0|A:0.000233|pathogenic||||||||||,A|ENSG00000173040|ENST00000310917|Transcript|stop_gained|3757|3025|1009|Q/*|Cag/Tag|rs137852927&CM024156|1||-1|EVC2|HGNC|19747|protein_coding||CCDS54718.1|ENSP00000311683|LBN_HUMAN|Q4W5B1_HUMAN&Q4W5A4_HUMAN|UPI000006DE35|||18/22|||ENST00000310917.2:c.3025C>T|ENSP00000311683.2:p.Gln1009Ter||||||A:0|A:0.000233|pathogenic|||||||POSITION:0.820450230539734&ANN_ORF:539.654&MAX_ORF:539.654|||HC,A|ENSG00000173040|ENST00000344408|Transcript|stop_gained|3319|3265|1089|Q/*|Cag/Tag|rs137852927&CM024156|1||-1|EVC2|HGNC|19747|protein_coding|YES|CCDS3382.2|ENSP00000342144|LBN_HUMAN|Q4W5B1_HUMAN&Q4W5A4_HUMAN|UPI00001910B5|||18/22|||ENST00000344408.5:c.3265C>T|ENSP00000342144.5:p.Gln1089Ter||||||A:0|A:0.000233|pathogenic|||||||POSITION:0.831423478482302&ANN_ORF:539.654&MAX_ORF:539.654|||HC```

The program isn't performing as I expected. Do you have any debugging suggestions? I would really like to incorporate vcfanno in my pipeline. Thanks!

brentp commented 9 years ago

your pasted vcf is missing the CHROM header (which would give you an error) and your conf file is not correct. It should be something like:

[[annotation]]
file="b.vcf"
names=["TEST_AF"]
fields = ["AF"]
ops=["first"]

If you continue to have trouble, provide the full vcfs and the full conf file that you are using.

brentp commented 9 years ago

closing this due to fixes that came with adding natural sort.