brentp / vcfanno

annotate a VCF with other VCFs/BEDs/tabixed files
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0973-5
MIT License

Resolving ambiguity in multiallelic sites #155

Open e-271 opened 1 year ago

e-271 commented 1 year ago

I am using the 'by_alt' op, but my input VCF has many multiallelic sites. At a position where some alt alleles are present in the annotation file and others are not, the output does not show which alleles the annotation values belong to. For example, if the alt allele 'A' is present in the annotation file and 'G' is not, vcfanno will produce the following:

chr1 91246 1_91246_T_G T G,A 2896.0 . AC=9,1;PHRED=5.179;CADD=0.377863 GT:AD:DP:GQ:PL 0/1:16,21,12:52:85:681,0,494,644,85,1010 0/2:33,0,14:49:95:95,184,663,0,479,441
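Records affected by this can be found mechanically by comparing the number of values in each annotated INFO field against the number of ALT alleles. The following is a minimal sketch of that check, not part of vcfanno; the function name `ambiguous_info_keys` is hypothetical, and it assumes the annotated fields are meant to hold one value per alt allele.

```python
def ambiguous_info_keys(vcf_line, keys):
    """Return the INFO keys whose value count differs from the ALT allele count.

    Hypothetical helper for illustration; assumes each key in `keys` is
    supposed to carry one value per alt allele.
    """
    fields = vcf_line.split()          # VCF is tab-delimited; split() also handles spaces
    alts = fields[4].split(",")        # column 5 is ALT
    info = dict(
        item.split("=", 1) for item in fields[7].split(";") if "=" in item
    )                                  # column 8 is INFO; flag-only entries are skipped
    return [
        key for key in keys
        if key in info and len(info[key].split(",")) != len(alts)
    ]


record = (
    "chr1 91246 1_91246_T_G T G,A 2896.0 . "
    "AC=9,1;PHRED=5.179;CADD=0.377863 GT:AD:DP:GQ:PL"
)
print(ambiguous_info_keys(record, ["AC", "PHRED", "CADD"]))
```

For the record above, `AC` has two values for two alt alleles, while `PHRED` and `CADD` each have one, so only the latter two are reported.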

(I used SNP alt alleles for the example, which will generally be present in the CADD annotation files, but many sites in my input files have a mix of SNPs and indels, so I am not sure how to resolve the ambiguity there.)

I think this could be improved if vcfanno output a placeholder like '.' for alleles that do not have annotations. The above example would become:

chr1 91246 1_91246_T_G T G,A 2896.0 . AC=9,1;PHRED=.,5.179;CADD=.,0.377863 GT:AD:DP:GQ:PL 0/1:16,21,12:52:85:681,0,494,644,85,1010 0/2:33,0,14:49:95:95,184,663,0,479,441
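To make the requested behavior concrete, here is a minimal sketch of the padding logic. It is an illustration only and does not reflect how vcfanno is implemented. The function name `per_allele_values` and the dict-based lookup are assumptions; the '.' placeholder follows the VCF convention for missing values.

```python
def per_allele_values(alts, annotations, missing="."):
    """Build a comma-separated value list with one entry per alt allele.

    `annotations` maps an alt allele to its annotation value (as a string).
    Alleles without an annotation get the `missing` placeholder, so the
    position of each value always matches the position of its allele in ALT.
    """
    return ",".join(annotations.get(alt, missing) for alt in alts)


alts = ["G", "A"]                      # ALT column from the example record
phred = {"A": "5.179"}                 # only 'A' is present in the annotation file
cadd = {"A": "0.377863"}

info = "PHRED={};CADD={}".format(
    per_allele_values(alts, phred),
    per_allele_values(alts, cadd),
)
print(info)  # PHRED=.,5.179;CADD=.,0.377863
```

With values aligned to ALT order like this, the fields could be declared as `Number=A` in the VCF header, so that downstream tools can tell which value belongs to which allele.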

Here are my example files to replicate this. Thank you! testVcfannoMultiallele.tar.gz