brentp / vcfanno

annotate a VCF with other VCFs/BEDs/tabixed files
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0973-5
MIT License

Recognising reserved INFO fields #65

Open dancooke opened 7 years ago

dancooke commented 7 years ago

Hi, thanks for the great tool!

Would it be possible to recognise reserved annotation names (e.g. DB & SOMATIC) and use the description given in the VCF spec for such fields (if not already present)?

For example, if I just want to flag all records from dbSNP:

[[annotation]]
file="00-common_all.vcf.gz"
fields=["ID"]
ops=["flag"]
names=["DB"]

Now "DB" is a reserved INFO flag, so I'd quite like the header file to include this (if not already present):

##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership">

As opposed to:

##INFO=<ID=DB,Number=0,Type=Flag,Description="calculated by flag of overlapping values in field ID from 00-common_all.vcf.gz">

Dan

brentp commented 7 years ago

I'm hesitant to do this even though it seems sensible. Some people want customizable descriptions; you want specialized descriptions based on reserved fields from the VCF spec.

Both of these are quite reasonable, but they require additional docs and could be surprising, e.g. what if someone uses names=["DB"] but is annotating with their in-house db, not as a flag, but using the name or something?

Let me think about this and/or let me know if you have thoughts. A key driver for me now is to not increase the complexity for the user (since they already have to create a conf file and understand annotation, postannotation, etc.).
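
One possible guard against the in-house db surprise, as a minimal sketch only: apply the spec description solely when the name is reserved and the op is "flag", and otherwise fall back to the generated description. This is not vcfanno's code. RESERVED_INFO and info_header are hypothetical names, the SOMATIC wording should be checked against the VCF spec version in use, and the Number/Type chosen for non-flag ops is a simplifying assumption.

```python
# Hypothetical table of reserved INFO flags: name -> (Number, Type, Description).
# The DB description is the one quoted in this issue; check the SOMATIC wording
# against the VCF spec version you target.
RESERVED_INFO = {
    "DB": ("0", "Flag", "dbSNP membership"),
    "SOMATIC": ("0", "Flag", "Somatic mutation (for cancer genomics)"),
}


def info_header(name, op, field, source):
    """Build an ##INFO header line for one annotation (illustrative only)."""
    reserved = RESERVED_INFO.get(name)
    if reserved is not None and op == "flag" and reserved[1] == "Flag":
        # Reserved name used as a flag: take Number, Type and Description from the table.
        number, vtype, desc = reserved
    else:
        # Any other case keeps the generated description format shown in this issue.
        # Assumption: non-flag ops are written as Number=1,Type=String here.
        number, vtype = ("0", "Flag") if op == "flag" else ("1", "String")
        desc = f"calculated by {op} of overlapping values in field {field} from {source}"
    return f'##INFO=<ID={name},Number={number},Type={vtype},Description="{desc}">'


print(info_header("DB", "flag", "ID", "00-common_all.vcf.gz"))
print(info_header("DB", "self", "ID", "inhouse.vcf.gz"))
```

With this rule, names=["DB"] combined with an op other than "flag" keeps the generated description, so an in-house annotation is never relabelled as dbSNP membership.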

dancooke commented 7 years ago

In truth I'd quite like customisable descriptions too. While it's fairly easy to change downstream, doing so requires re-writing a potentially very large file, and I don't think it would introduce too much additional complexity (especially if it's an opt-in feature).

How does vcfanno currently handle conflicting field descriptions (e.g. if the field is already present in several source VCFs and in the target VCF)? I think some user control here would be nice in any case. For reserved fields, I think it's entirely reasonable to default to the descriptions in the spec if not already defined, and emit a warning if a conflicting description is provided.
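
The policy suggested above could be sketched roughly as follows. This is an illustration, not vcfanno behaviour: resolve_description and RESERVED_DESCRIPTIONS are hypothetical names, and the precedence order (existing target header, then a user-supplied description, then the spec description, then the generated one) is an assumption about how the pieces might fit together.

```python
import warnings

# Hypothetical lookup of reserved INFO descriptions; only DB is taken from this issue.
RESERVED_DESCRIPTIONS = {"DB": "dbSNP membership"}


def resolve_description(name, existing=None, custom=None, generated=""):
    """Choose a Description for an INFO field.

    Assumed precedence: description already in the target header, then a
    user-supplied (opt-in) description, then the reserved spec description,
    then the automatically generated one.
    """
    reserved = RESERVED_DESCRIPTIONS.get(name)
    if existing is not None:
        chosen = existing
    elif custom is not None:
        chosen = custom
    elif reserved is not None:
        chosen = reserved
    else:
        chosen = generated
    # Warn, but do not fail, when a reserved name ends up with a different description.
    if reserved is not None and chosen != reserved:
        warnings.warn(
            f"INFO field {name}: description {chosen!r} conflicts with "
            f"the reserved description {reserved!r}"
        )
    return chosen


print(resolve_description("DB"))
print(resolve_description("DB", custom="in-house database ID"))
```

The second call returns the custom text and emits a warning, so the user keeps control while still being told that the name is reserved.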