brentp / vcfanno

annotate a VCF with other VCFs/BEDs/tabixed files
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0973-5
MIT License

CADD score not visible in the vcf annotated #137

Closed SophiaMurat closed 3 years ago

SophiaMurat commented 3 years ago

Hello,

Thank you for this nice tool. I am trying to annotate my VCF with CADD scores using vcfanno. When I run it, a new VCF file is created, and a line has been added to its header:

##INFO=<ID=PHRED,Number=1,Type=Float,Description="calculated by mean of overlapping values in column 6 from /data/users/smuratel/cadd/CADD_v1.6_hg38_whole_genome_SNVs.tsv.gz">

But when I look at the INFO field, I do not find the CADD score. Is this normal? If so, where is the CADD score?

Here are some details on my setup:

My cadd.conf file:

[[annotation]]
file="/data/users/smuratel/cadd/CADD_v1.6_hg38_whole_genome_SNVs.tsv.gz"
names=["PHRED"]
ops=["mean"]
columns=[6]

Here are the first lines of the CADD file, CADD_v1.6_hg38_whole_genome_SNVs.tsv.gz:

## CADD GRCh38-v1.6 (c) University of Washington, Hudson-Alpha Institute for Biotechnology and Berlin Institute of Health 2013-2020. All rights reserved
#Chrom  Pos     Ref     Alt     RawScore        PHRED
1       10001   T       A       0.702541        8.478
1       10001   T       C       0.750954        8.921
1       10001   T       G       0.719549        8.634
1       10002   A       C       0.713993        8.583
1       10002   A       G       0.743661        8.854
1       10002   A       T       0.700507        8.460
1       10003   A       C       0.714485        8.588
1       10003   A       G       0.744152        8.859
1       10003   A       T       0.700999        8.464
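To make the `mean` op and `columns=[6]` from the config concrete, here is a minimal Python sketch of what "mean of overlapping values in column 6" would give on the rows above. It is not vcfanno's implementation: `ROWS` and `mean_of_column` are hypothetical names, and overlap is simplified to an exact chromosome and position match.

```python
from statistics import mean

# Rows copied from the CADD excerpt above: (Chrom, Pos, Ref, Alt, RawScore, PHRED)
ROWS = [
    ("1", 10001, "T", "A", 0.702541, 8.478),
    ("1", 10001, "T", "C", 0.750954, 8.921),
    ("1", 10001, "T", "G", 0.719549, 8.634),
    ("1", 10002, "A", "C", 0.713993, 8.583),
    ("1", 10002, "A", "G", 0.743661, 8.854),
    ("1", 10002, "A", "T", 0.700507, 8.460),
    ("1", 10003, "A", "C", 0.714485, 8.588),
    ("1", 10003, "A", "G", 0.744152, 8.859),
    ("1", 10003, "A", "T", 0.700999, 8.464),
]


def mean_of_column(rows, chrom, pos, column):
    """Average the 1-based `column` over all rows at chrom:pos.

    Returns None when no row matches (simplified stand-in for "no overlap").
    """
    values = [row[column - 1] for row in rows if row[0] == chrom and row[1] == pos]
    return mean(values) if values else None


if __name__ == "__main__":
    # Three rows (one per alternate allele) sit at 1:10001, so the mean
    # averages all three PHRED values regardless of the variant's ALT.
    print(mean_of_column(ROWS, "1", 10001, 6))
    # No row exists at 1:10004, so there is nothing to average.
    print(mean_of_column(ROWS, "1", 10004, 6))
```

Note that with `mean`, the value at a position combines the scores of every alternate allele listed there, which is about 8.678 at 1:10001 in this excerpt. When nothing overlaps, as in the second call, there is no value to report, which would be consistent with the annotation being absent from the INFO field.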

Here is one line from my initial VCF file:

chr2 47783349 . G A 1916.64 . AC=1;AF=0.500;AN=2;BaseQRankSum=-0.908;DP=218;ExcessHet=3.0103;FS=1.077;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=9.40;ReadPosRankSum=-0.676;SOR=0.780 GT:AD:DP:GQ:PL 0/1:104,100:204:99:1924,0,2227

Here is my command line:

vcfanno -p 2 cadd.conf test.vcf > test.anno.vcf

And here is the message from vcfanno:

=============================================
vcfanno version 0.3.2 [built with go1.12.1]

see: https://github.com/brentp/vcfanno
=============================================
vcfanno.go:115: found 1 sources from 1 files
vcfanno.go:156: falling back to non-bgzip
vcfanno.go:248: annotated 28 variants in 0.41 seconds (68.7 / second)

Thank you in advance for your answer.

Best regards,

Sophia

brentp commented 3 years ago

Hi, I would re-index your CADD file with tabix, then check whether you can query it from the command line, like:

# index
tabix /data/users/smuratel/cadd/CADD_v1.6_hg38_whole_genome_SNVs.tsv.gz
# query
tabix /data/users/smuratel/cadd/CADD_v1.6_hg38_whole_genome_SNVs.tsv.gz chr2:47783349-47783350

If that works, then vcfanno should work as well (you should see, e.g., PHRED=1.23 added to your INFO field).
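Before re-indexing, it can help to see whether an index is present at all. The following Python sketch checks for an index file next to the annotation file and whether it is older than the data. It assumes the index uses tabix's default `.tbi` suffix; `tabix_index_status` is a hypothetical helper, not part of tabix or vcfanno.

```python
import os


def tabix_index_status(data_path):
    """Return 'missing', 'stale', or 'ok' for the .tbi index beside data_path.

    Assumption: the index is named <data_path>.tbi, tabix's default.
    """
    index_path = data_path + ".tbi"
    if not os.path.exists(index_path):
        return "missing"
    # An index older than the data file was likely built from a previous copy.
    if os.path.getmtime(index_path) < os.path.getmtime(data_path):
        return "stale"
    return "ok"


if __name__ == "__main__":
    # Path taken from the cadd.conf shown earlier in this thread.
    path = "/data/users/smuratel/cadd/CADD_v1.6_hg38_whole_genome_SNVs.tsv.gz"
    print(tabix_index_status(path))
```

A result of "missing" or "stale" suggests rebuilding the index as described above. Two details are worth checking when doing so. For a plain tab-separated file, tabix's `-s`, `-b` and `-e` options select the chromosome, start and end columns (for this CADD layout, presumably 1, 2 and 2). Also, the CADD excerpt in this thread names chromosomes without a `chr` prefix, so a direct tabix query may need the region written as `2:47783349-47783350`.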

SophiaMurat commented 3 years ago

Thank you for your quick answer! It solved my problem. Everything is now working. Best, Sophia