brentp / vcfanno

annotate a VCF with other VCFs/BEDs/tabixed files
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0973-5
MIT License
365 stars 56 forks source link

Keep annotation source field Description #43

Closed xuyangy closed 7 years ago

xuyangy commented 7 years ago

When OP is "self" or "first", is it possible to keep the annotation source's field Description instead of a message like "transfered from matched variants in..."

brentp commented 7 years ago

I understand the desire to have this, but I think it would introduce more problems than it solves. What if you're grabbing allele frequency from 4 separate files and all of them say Description="Allele Frequency"?

xuyangy commented 7 years ago

That's a nice point! How about concatenating the original description and the source file path?

sveint commented 7 years ago

(I'm a colleague of xuyangy)

What if you're grabbing allele frequency from 4 separate files and all of them say Description="Allele Frequency"?

I probably misunderstand something, but I'd argue the current way is worse than the alternative. If you import allele frequencies from different files they would hopefully have different keys, often giving clues as to where they come from. Or you would rename the keys in postprocessing. In the current case vcfanno is throwing away data from the source files, which is not ideal in my opinion.

One problem we have is that several tools we use (VEP, snpEff for instance) keep their explanations of the data structure in the vcf INFO description. This is not ideal of course, but we have to rely on this to be able to parse the data back correctly when we use the vcf files.

Due to this we have to maintain our own fork of vcfanno for our internal use, but we're hoping that we can find some kind of solution for this. There is a SOURCE field for INFO in vcf 4.2 specfication, is that an option? Or is it a problem that it's 4.2 only?

brentp commented 7 years ago

well a pull request always speaks more convincingly than a bug report. can you open a PR with the changes you've made and we'll discuss from there? I agree in general with your justification, but I think we should still include the "from source-file.vcf" in the description (or, I guess Source if that's available.)

brentp commented 7 years ago

@xuyangy , @sveint can you point me to your fork so I have some insight on what you need?

xuyangy commented 7 years ago

@brentp Sure. Currently we keep only the original Description. I will change this further to include both the original Field Description and the "from /path/to/source-file.vcf" message and make a pull request.

brentp commented 7 years ago

any update on this? I'm happy to take a pull request.

brentp commented 7 years ago

@xuyangy can you point me to your change and I'll see what I can do about incorporating.

xuyangy commented 7 years ago

@brentp so sorry for not making changes yet I made changes to bix https://github.com/xuyangy/bix/commit/d9944ded48976b62d083b1678fc94a143bebb6f8 and vcfanno https://github.com/xuyangy/vcfanno/commit/25981cf63f35dc3844173a93dbfca1502be037c5 https://github.com/xuyangy/vcfanno/commit/bf0d8328c2667c45467e3e1538e0a489e8e38318

brentp commented 7 years ago

ok. I'll get this into the next release.

brentp commented 7 years ago

I made this change. Thanks so much for the implementation and for the suggestion. -Brent