brentp / vcfanno

annotate a VCF with other VCFs/BEDs/tabixed files
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0973-5
MIT License

Interesting failure to include annotations from .vcf.bgz instead of .vcf.gz #90

Closed Phillip-a-richmond closed 6 years ago

Phillip-a-richmond commented 6 years ago

This is not an issue that I need addressed, but I debugged something that may be of interest.

I recently upgraded to the newer gnomAD release and added the exome set in place of the older ExAC release. After doing so, I noticed that my VCFs were no longer being annotated correctly with gnomAD variants. Looking into it further, the only thing that seemed to have changed was the file extension: previously I was using gnomad.v1.vcf.gz, and the newer version was gnomad.v2.vcf.bgz. The header information was the same, and I normalized and split the files the same way, so it shouldn't have been an issue.

I scratched my head for a while, and then figured out that when vcfanno sees .bgz, it does not correctly parse the annotation file. Simply renaming the files from .bgz to .gz fixed the issue.
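For anyone wanting to script the rename workaround, here is a minimal Python sketch. The helper names `gz_name` and `rename_bgz_files` are hypothetical, and it assumes any tabix index sits next to the data file as `<name>.bgz.tbi` and should be renamed along with it. It defaults to a dry run so the planned renames can be inspected first.

```python
from pathlib import Path


def gz_name(path):
    """Map x.vcf.bgz -> x.vcf.gz and x.vcf.bgz.tbi -> x.vcf.gz.tbi.

    Paths with any other ending are returned unchanged.
    """
    name = path.name
    if name.endswith(".bgz.tbi"):
        return path.with_name(name[: -len(".bgz.tbi")] + ".gz.tbi")
    if name.endswith(".bgz"):
        return path.with_name(name[: -len(".bgz")] + ".gz")
    return path


def rename_bgz_files(directory, dry_run=True):
    """Rename every *.bgz / *.bgz.tbi file in `directory` to the .gz spelling.

    Returns the list of (source, destination) pairs. With dry_run=True
    nothing on disk is touched. Existing destinations are never overwritten.
    """
    planned = []
    for src in sorted(Path(directory).iterdir()):
        dst = gz_name(src)
        if dst == src or dst.exists():
            continue
        planned.append((src, dst))
        if not dry_run:
            src.rename(dst)
    return planned
```

Calling `rename_bgz_files("annotations/")` lists what would change, and passing `dry_run=False` performs the renames. The file contents are not modified, only the names.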

I guess that vcfanno parses the filename, and if it sees .vcf.bgz it interprets the file differently than .vcf.gz. I never use .bgz, but I kept gnomAD's naming convention. It may be worth mentioning in the vcfanno documentation that .bgz is not an accepted extension for compressed VCF files, for users who are developing their own annotation pipelines.
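To illustrate why the extension alone carries no information, here is a rough Python sketch that checks whether a file is BGZF (block gzip) by looking at its first block header instead of its name. This is not how vcfanno decides, which I have not checked. The function name `is_bgzf` is hypothetical. It relies on the BGZF layout from the SAM/BAM specification: a gzip header with the FEXTRA flag set and a `BC` extra subfield of length 2.

```python
import struct


def is_bgzf(path):
    """Return True if the file starts with a BGZF block header."""
    with open(path, "rb") as fh:
        header = fh.read(12)
        if len(header) < 12:
            return False
        # gzip magic bytes (0x1f, 0x8b), deflate method (8), FEXTRA flag (bit 2)
        if header[0] != 0x1F or header[1] != 0x8B or header[2] != 8:
            return False
        if not header[3] & 0x04:
            return False
        # XLEN: little-endian length of the extra field that follows
        (xlen,) = struct.unpack("<H", header[10:12])
        extra = fh.read(xlen)

    # Walk the extra subfields looking for 'B', 'C' with a 2-byte payload.
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2 = extra[pos], extra[pos + 1]
        (slen,) = struct.unpack("<H", extra[pos + 2 : pos + 4])
        if si1 == ord("B") and si2 == ord("C") and slen == 2:
            return True
        pos += 4 + slen
    return False
```

A file that passes this check is block-compressed whether it is named .gz or .bgz, so a check like this can confirm that the gnomAD .bgz files and their renamed .gz copies are the same kind of file.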

Happy Monday, Phil


brentp commented 6 years ago

Yeah, I should fix this. This is a dup of #81, so I'll close it, and I will try to get the fix into the next release.