Open vdauwera opened 7 years ago
@yfarjoun I believe you always have opinions about validation, wdyt?
Nice idea. I think the GATK version of HC should be the same for all inputs, or else: warning.
In fact, the headers could be compared further, for example to check that the bands are equal.
And that the annotations are all the same.
Yeah I was thinking of same annotations too.
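For illustration, the header comparison suggested above could look roughly like the Python sketch below. It is not GATK code. It assumes the GQ bands are recorded as `##GVCFBlock...` header lines and that annotations are declared as `##INFO`/`##FORMAT` lines with an `ID` field; the function names `header_signature` and `header_differences` are hypothetical.

```python
import re

# Matches INFO/FORMAT declarations and captures the kind and the annotation ID.
_ANNOTATION = re.compile(r'^##(INFO|FORMAT)=<ID=([^,>]+)')

def header_signature(header_lines):
    """Return (bands, annotations) for one GVCF header.

    Assumption: band definitions appear as '##GVCFBlock...' header lines.
    """
    bands = frozenset(
        line.strip() for line in header_lines if line.startswith("##GVCFBlock")
    )
    annotations = frozenset(
        (m.group(1), m.group(2))
        for m in map(_ANNOTATION.match, header_lines)
        if m
    )
    return bands, annotations

def header_differences(name_a, lines_a, name_b, lines_b):
    """Return human-readable descriptions of band/annotation mismatches."""
    bands_a, ann_a = header_signature(lines_a)
    bands_b, ann_b = header_signature(lines_b)
    problems = []
    if bands_a != bands_b:
        problems.append(f"bands differ between {name_a} and {name_b}")
    if ann_a != ann_b:
        # Symmetric difference: annotations declared in only one of the two headers.
        only_one = sorted(f"{kind}/{ident}" for kind, ident in ann_a ^ ann_b)
        problems.append(
            f"annotations differ between {name_a} and {name_b}: {', '.join(only_one)}"
        )
    return problems
```

An empty list means the two headers agree on both bands and declared annotations; anything else could be surfaced as a warning.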
Assigning to @lbergelson as part of his GenotypeGVCFs work. This is a check that could be added after we tie-out that tool.
A user rightly points out that different versions of HaplotypeCaller may produce GVCFs that are not directly compatible, causing weirdness when you joint-genotype them with GenotypeGVCFs.
Obviously this is primarily a data management problem (user should control what's in their pipeline) -- but it would be good to provide an additional safety layer by having GenotypeGVCFs, CombineGVCFs or whatever demon is used to invoke TileDB at least emit a WARN message if they see GVCFs produced by different versions of HC within the same input cohort.
Note that the VCF version number is not directly usable for this purpose, since the contents of GVCFs can change within the same version of the VCF spec.
Also, one could argue that the GVCFs really should all be produced using exactly the same command line arguments -- but validating the entire command line would probably be overkill...
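To make the proposal concrete, here is a minimal Python sketch of the version check, not an actual GATK implementation. It assumes HaplotypeCaller records its version in a `##GATKCommandLine` header line carrying `ID=HaplotypeCaller` and a `Version=` field (quoted or unquoted); the exact header layout should be verified against real GVCFs. The names `hc_version` and `check_cohort` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed header shapes (to be verified against real files):
#   ##GATKCommandLine=<ID=HaplotypeCaller,...,Version="4.x",...>
#   ##GATKCommandLine.HaplotypeCaller=<ID=HaplotypeCaller,Version=3.x,...>
_HC_LINE = re.compile(
    r'^##GATKCommandLine(?:\.HaplotypeCaller)?=<.*\bID=HaplotypeCaller\b'
)
_VERSION = re.compile(r'\bVersion="?([^",>]+)"?')

def hc_version(header_lines):
    """Return the HaplotypeCaller version found in a GVCF header, or None."""
    for line in header_lines:
        if _HC_LINE.match(line):
            m = _VERSION.search(line)
            if m:
                return m.group(1)
    return None

def check_cohort(headers):
    """headers maps an input name to its list of header lines.

    Returns a list with one WARN message if more than one HC version
    (counting "unknown") appears in the cohort, otherwise an empty list.
    """
    by_version = defaultdict(list)
    for name, lines in headers.items():
        by_version[hc_version(lines) or "unknown"].append(name)
    if len(by_version) <= 1:
        return []
    detail = "; ".join(
        f"{version}: {', '.join(sorted(names))}"
        for version, names in sorted(by_version.items())
    )
    return [
        "WARN: input GVCFs were produced by different "
        f"HaplotypeCaller versions ({detail})"
    ]
```

The sketch only warns and never fails, matching the "additional safety layer" intent: what goes into the cohort remains the user's responsibility.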