brentp / vcfanno

annotate a VCF with other VCFs/BEDs/tabixed files
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0973-5
MIT License

Missing annotation because of alignment problem. #146

Closed AJ2802 closed 2 years ago

AJ2802 commented 2 years ago


vcfanno does not correctly annotate CGSCADDPHRED and AC_CGSPOPMAX in the INFO field of the following call:

1 65141676 . TTAATAATAATAATAA TTAATAATAATAA,T 401.2 PASS CGSCADDPHRED=0.6 GT:AD:AF:DP:GQ:FT:F1R2:F2R1:PL:GP:PP 1/2:0,13,10:0.565,0.435:23:48:PASS:0,5,5:0,8,5:331,367,49,282,0,52:327.14,364.77,49.541,280.15,6.9893e-05,53:331,367,49,282,0,52

The correct annotation, based on the databases below, should be:

1 65141676 . TTAATAATAATAATAA TTAATAATAATAA,T 401.2 PASS CGSCADDPHRED=0.8,0.6;AC_CGSPOPMAX=2930,540 GT:AD:AF:DP:GQ:FT:F1R2:F2R1:PL:GP:PP 1/2:0,13,10:0.565,0.435:23:48:PASS:0,5,5:0,8,5:331,367,49,282,0,52:327.14,364.77,49.541,280.15,6.9893e-05,53:331,367,49,282,0,52

CGSCADDPHRED comes from this database:

1 65141676 . TTAATAATAATAATAA T 1 PASS CGSCADDPHRED=0.6
1 65141688 . ATAA A 1 PASS CGSCADDPHRED=0.8

AC_CGSPOPMAX comes from this database:

1 65141676 rs1302072381 TTAATAATAATAATAATAATAATAATAA TTAATAATAATAATAATAATAATAA,TTAATAATAATAATAATAATAA,TTAATAATAATAATAATAA,TTAATAATAATAATAA,TTAATAATAATAA,TTAATAA,T 1.90165e+07 PASS AC_CGSPOPMAX=2930,6389,1588,67,540,1040,962;

I find that, in order to annotate CGSCADDPHRED correctly for the TTAATAATAATAA alt allele of the call, vcfanno needs to search at position 65141676 + len('TTAATAATAATAA') - 1 = 65141688 on chromosome 1.
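The arithmetic above can be checked with a small sketch. It assumes the database record at 65141688 was written by stripping the bases shared at the start of REF and ALT while keeping one anchor base. `trim_shared_prefix` is a hypothetical helper written for this illustration, not part of vcfanno, and it is not full normalization (which would also left-align against the reference genome).

```python
def trim_shared_prefix(pos, ref, alt):
    """Drop leading bases shared by REF and ALT, keeping one anchor base."""
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        pos, ref, alt = pos + 1, ref[1:], alt[1:]
    return pos, ref, alt


# Query call from this issue, restricted to the TTAATAATAATAA alt allele.
pos, ref, alt = 65141676, "TTAATAATAATAATAA", "TTAATAATAATAA"

shifted = trim_shared_prefix(pos, ref, alt)
print(shifted)  # (65141688, 'ATAA', 'A')

# Same position as the formula in the comment above.
assert shifted[0] == pos + len(alt) - 1
```

The trimmed allele has the same position, REF, and ALT as the database record carrying CGSCADDPHRED=0.8.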

To annotate AC_CGSPOPMAX correctly, vcfanno needs to know that TTAATAATAATAATAA -> TTAATAATAATAA,T corresponds to TTAATAATAATAATAATAATAATAATAA -> TTAATAATAATAATAATAATAATAA,TTAATAATAATAA.
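One way to see this correspondence is to compare how many bases each allele deletes. The sketch below assumes every allele is a pure deletion anchored at the same position in the same TAA repeat, which holds for these records but not in general. All names are illustrative, and the alleles are built with a small helper only to keep the long repeats readable.

```python
def taa_allele(n_repeats):
    """Allele made of the anchor base T followed by n copies of TAA."""
    return "T" + "TAA" * n_repeats


# Database record at 1:65141676 (REF is T plus 9 TAA repeats).
db_ref = taa_allele(9)
db_alts = [taa_allele(n) for n in (8, 7, 6, 5, 4, 2, 0)]
db_ac = [2930, 6389, 1588, 67, 540, 1040, 962]

# Query call at the same position (REF is T plus 5 TAA repeats).
query_ref = taa_allele(5)
query_alts = [taa_allele(4), taa_allele(0)]

# Index the database values by the number of deleted bases.
ac_by_deleted = {len(db_ref) - len(alt): ac for alt, ac in zip(db_alts, db_ac)}

# Look up each query alt allele by its own deletion length.
matched = [ac_by_deleted.get(len(query_ref) - len(alt)) for alt in query_alts]
print(matched)  # [2930, 540]
```

The two query alleles delete 3 and 15 bases, which selects the first and fifth database alleles.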

This seems to be a problem of haplotype extension and alignment. What are your thoughts? Thank you.

brentp commented 2 years ago

what config did you use?

AJ2802 commented 2 years ago

This is my config.toml:

[[annotation]]
file="/home/s0288764/custom_annotation/test_data/CGSCADDPHRED.database.vcf.gz"
fields = ["CGSCADDPHRED"]
ops=["self"]

[[annotation]]
file="/home/s0288764/custom_annotation/test_data/AC_CGSPOPMAX.database.vcf.gz"
fields = ["AC_CGSPOPMAX"]
ops=["by_alt"]

brentp commented 2 years ago

oh. i see. vcfanno does not normalize variants! it has to be an exact match on position and allele. this is a known and intentional feature. in order to annotate as you like, you'd have to decompose and normalize your variants.
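A simplified model of this exact-match rule shows why only CGSCADDPHRED=0.6 was written. This is an illustration, not vcfanno's actual implementation: it assumes a database allele is used only when its position, REF, and ALT are all identical to the query's.

```python
# Query call from this issue, one key per alt allele: (chrom, pos, ref, alt).
query_ref = "TTAATAATAATAATAA"
query_keys = [("1", 65141676, query_ref, alt) for alt in ("TTAATAATAATAA", "T")]

# CGSCADDPHRED database records from this issue, keyed the same way.
cadd_db = {
    ("1", 65141676, "TTAATAATAATAATAA", "T"): 0.6,
    ("1", 65141688, "ATAA", "A"): 0.8,
}

# Exact lookup: the first alt allele has no identical key in the database.
hits = [cadd_db.get(key) for key in query_keys]
print(hits)  # [None, 0.6]
```

The 0.8 record describes the same deletion as the first alt allele, but at a different position and with different REF and ALT strings, so an exact lookup misses it.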

AJ2802 commented 2 years ago

Thank you, brentp, for your suggestion. I can see a roadmap for doing what I need with vcfanno. First, decompose and normalize both the database and the query VCF. Then use vcfanno to annotate. Finally, merge the split calls in the query VCF.
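The roadmap can be sketched in plain Python on the records from this issue. In practice the decompose and normalize steps would be done with a dedicated tool (for example bcftools norm or vt) against the reference genome, and the annotation step with vcfanno itself. Everything below is a hypothetical stand-in: `REF_SEQ` assumes the local reference equals the 28-base REF of the AC_CGSPOPMAX record, and `normalize` is a minimal left-align-and-trim routine, not any tool's implementation.

```python
REF_START = 65141676
REF_SEQ = "T" + "TAA" * 9  # assumed local reference window (see lead-in)


def ref_base(pos):
    """Reference base at a 1-based position inside the assumed window."""
    if not REF_START <= pos < REF_START + len(REF_SEQ):
        raise ValueError("position outside the assumed reference window")
    return REF_SEQ[pos - REF_START]


def normalize(pos, ref, alt):
    """Left-align and trim one biallelic variant."""
    changed = True
    while changed:
        changed = False
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]  # drop a shared trailing base
            changed = True
        if not ref or not alt:
            pos -= 1  # an allele became empty: extend both to the left
            ref, alt = ref_base(pos) + ref, ref_base(pos) + alt
            changed = True
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        pos, ref, alt = pos + 1, ref[1:], alt[1:]  # drop shared leading bases
    return pos, ref, alt


def decompose(chrom, pos, ref, alts, values):
    """Split a record into normalized biallelic keys, one value per alt."""
    return {(chrom, *normalize(pos, ref, a)): v for a, v in zip(alts, values)}


def taa(n):
    """Allele made of the anchor base T followed by n copies of TAA."""
    return "T" + "TAA" * n


# Step 1: decompose and normalize both databases and the query.
cadd_db = {}
cadd_db.update(decompose("1", 65141676, taa(5), ["T"], [0.6]))
cadd_db.update(decompose("1", 65141688, "ATAA", ["A"], [0.8]))
ac_db = decompose("1", 65141676, taa(9),
                  [taa(n) for n in (8, 7, 6, 5, 4, 2, 0)],
                  [2930, 6389, 1588, 67, 540, 1040, 962])
query_alts = [taa(4), taa(0)]
query_keys = [("1", *normalize(65141676, taa(5), a)) for a in query_alts]

# Step 2: annotate each split call by exact match.
# Step 3: merge the per-allele values back in the original ALT order.
info = {
    "CGSCADDPHRED": [cadd_db.get(k) for k in query_keys],
    "AC_CGSPOPMAX": [ac_db.get(k) for k in query_keys],
}
print(info)  # {'CGSCADDPHRED': [0.8, 0.6], 'AC_CGSPOPMAX': [2930, 540]}
```

After normalization, the 65141688 ATAA -> A record and the first query alt allele both become 65141676 TTAA -> T, so the exact match succeeds and the merged values agree with the annotation expected at the top of this issue.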

brentp commented 2 years ago

Yes, that's the way. vcfanno just can't do normalization.