vdauwera opened this issue 7 years ago
It would be awesome to implement this in a way that allows rerunning the annotator with a new resource callset to update an existing set annotation.
The one limitation of this approach I can think of, compared to the CombineVariants functionality, is that it is callset-centric, i.e., it won't tell us which sites are present in the resources (and potentially common to multiple resources) but absent from our input callset. I can live with that as long as it is well documented.
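A minimal sketch of the callset-centric behavior described above, under stated assumptions: sites are reduced to hypothetical `(contig, pos, ref, alt)` tuples, filter status is ignored, and `annotate_sets` is an illustrative name, not a GATK API. Passing the previous result back in shows how a rerun with only a new resource could extend the annotation, and the resource-only site shows what this approach cannot report.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical site key: (contig, position, ref, alt).
Site = Tuple[str, int, str, str]


def annotate_sets(input_sites: Set[Site],
                  resources: Dict[str, Set[Site]],
                  existing: Optional[Dict[Site, List[str]]] = None) -> Dict[Site, List[str]]:
    """For each input site, list the resources that contain it.

    A previous result passed as `existing` is extended rather than
    recomputed, so rerunning with only a new resource updates the annotation.
    """
    previous = existing or {}
    result = {site: list(previous.get(site, [])) for site in input_sites}
    for name, resource_sites in resources.items():
        # Only input sites are visited: resource-only sites are never reported.
        for site in input_sites:
            if site in resource_sites and name not in result[site]:
                result[site].append(name)
    return result


calls = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")}
first = annotate_sets(calls, {"GIAB": {("chr1", 100, "A", "G"), ("chr1", 999, "G", "A")}})
# Rerun with a second (hypothetical) resource, reusing the first annotation.
updated = annotate_sets(calls, {"NewResource": {("chr1", 200, "C", "T")}}, existing=first)
```

Here chr1:999 exists only in the GIAB resource, so it never appears in either result, which is exactly the limitation that would need documenting.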
A user rightly notes that it would be more convenient to annotate the sets as a comma-separated list instead of a single dash-joined string, so that the value can be parsed more readily.
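One concrete reason the comma-separated form parses more readily: a dash-joined string becomes ambiguous as soon as a callset name itself contains a dash, whereas commas are already the VCF separator for multi-valued INFO fields. A small illustration, assuming a hypothetical callset named `HC-v1`:

```python
def parse_dash_joined(value: str) -> list:
    # Splits on every dash, so a name like "HC-v1" is broken into two pieces.
    return value.split("-")


def parse_comma_joined(value: str) -> list:
    # Splits only on commas, so dashes inside callset names survive intact.
    return value.split(",")


names = ["GIAB", "HC-v1"]  # hypothetical callset names
dash_parsed = parse_dash_joined("-".join(names))
comma_parsed = parse_comma_joined(",".join(names))
```

`dash_parsed` yields three names where there were two; `comma_parsed` round-trips correctly.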
Adding to annotation epic #3274
Feature request
Since CombineVariants will not be ported, we need functionality equivalent to its ability to annotate "set", i.e., which callset(s) a site is present in. Here is an excerpt from a tutorial that shows this functionality in action:
To find out which set each variant belongs to, we can use CombineVariants, which can annotate each site with the set(s) it belongs to. For example, if a site is in GIAB and failed hard filtering but passed VQSR, CombineVariants will annotate the site with set=G-filterInH-V. The "filterIn" prefix before a filtering method tells us the site failed that method, i.e., it was "filtered" in that set.
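To make the encoding concrete, here is a sketch that decodes such a set string back into per-callset status. Assumptions: the letter codes G, H and V are taken from the example above (GIAB, hard filtering, VQSR), `decode_set` is a hypothetical helper rather than part of any GATK tool, and only the `filterIn` prefix and the `Intersection` label described in this excerpt are handled.

```python
FILTER_PREFIX = "filterIn"  # marks a callset in which the site was filtered


def decode_set(value: str, codes: dict) -> dict:
    """Map a CombineVariants-style set string to {callset: status}.

    status is "present" (present and unfiltered), "filtered" (present but
    filtered) or "absent". `codes` maps short codes to callset names.
    """
    if value == "Intersection":
        # Present and unfiltered in every callset considered.
        return {name: "present" for name in codes.values()}
    status = {name: "absent" for name in codes.values()}
    for token in value.split("-"):
        if token.startswith(FILTER_PREFIX):
            status[codes[token[len(FILTER_PREFIX):]]] = "filtered"
        else:
            status[codes[token]] = "present"
    return status


codes = {"G": "GIAB", "H": "hard filtering", "V": "VQSR"}
example = decode_set("G-filterInH-V", codes)
```

`example` reports the site as present in GIAB, filtered by hard filtering, and passing VQSR, matching the description above.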
The set-annotated VCF looks like this:
In this record, "set=Intersection" indicates this record was present and unfiltered in all callsets considered.
Here is a key of all the possible combinations for this 3-way Venn diagram:
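Such a key can also be generated programmatically. The sketch below assumes each callset can independently be absent, present and unfiltered, or present but filtered, and builds labels with the dash-and-`filterIn` convention from the excerpt, using `Intersection` when a site passes in all three. These rules are an assumption: the real tool may label some cases differently, and in the tutorial's setup some combinations may not occur, so this can list more labels than the actual key.

```python
from itertools import product

CALLSETS = ["G", "H", "V"]  # codes from the example: GIAB, hard filtering, VQSR
STATES = ("absent", "present", "filtered")


def set_label(states: dict) -> str:
    """Build a set label from per-callset states (assumed labeling rules)."""
    if all(s == "present" for s in states.values()):
        return "Intersection"
    parts = []
    for name in CALLSETS:
        if states[name] == "present":
            parts.append(name)
        elif states[name] == "filtered":
            parts.append("filterIn" + name)
    return "-".join(parts)


labels = []
for combo in product(STATES, repeat=len(CALLSETS)):
    if all(s == "absent" for s in combo):
        continue  # a reported site must appear in at least one callset
    labels.append(set_label(dict(zip(CALLSETS, combo))))
```

Under these assumptions there are 3^3 - 1 = 26 distinct labels, including `G-filterInH-V` and `Intersection`.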