Closed by jdidion 1 year ago
I agree.
vcfdist currently outputs detailed per-variant, per-cluster, and per-phaseblock results in a custom CSV format.
I believe the best approach to supporting stratification would be to instead report results in hap.py's Intermediate VCF File format (Supplementary Note 3 of "Best practices for benchmarking..."). That way, vcfdist could be used as a comparison engine within hap.py, just like vcfeval or xcmp. This is what I plan to work on next; it shouldn't take too long.
Sounds like a great idea.
The only downside is that hap.py most likely can't deal with partial positive variants. I may need to consider them false positives for interoperability. The majority of vcfdist's improvement came from standardizing the variant representation and enforcing phasing, so this shouldn't impact results too much, although it's certainly not ideal.
In the meantime, one option would be to run bedtools intersect on your high-confidence BED and each stratification region, then run vcfdist once per resulting BED, since vcfdist currently accepts a single BED file for region selection.
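For anyone who wants to see what that workaround computes, here is a minimal sketch of the interval intersection in plain Python. It is not bedtools itself and not part of vcfdist; the function names `read_bed`, `write_bed`, and `intersect_bed` are made up for illustration, and it assumes simple three-column BED input (chrom, start, end; 0-based, half-open). For real data, bedtools intersect will be faster and handles more edge cases.

```python
def read_bed(path):
    """Read the first three columns of a BED file into (chrom, start, end) tuples."""
    intervals = []
    with open(path) as handle:
        for line in handle:
            # Skip blank lines and common header/comment lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.split()[:3]
            intervals.append((chrom, int(start), int(end)))
    return intervals


def write_bed(path, intervals):
    """Write (chrom, start, end) tuples as a three-column BED file."""
    with open(path, "w") as handle:
        for chrom, start, end in intervals:
            handle.write(f"{chrom}\t{start}\t{end}\n")


def intersect_bed(a, b):
    """Return the overlapping portions of intervals in `a` and `b`.

    Naive O(len(a) * len(b)) per chromosome; fine as a sketch,
    too slow for genome-wide stratification files.
    """
    b_by_chrom = {}
    for chrom, start, end in b:
        b_by_chrom.setdefault(chrom, []).append((start, end))

    overlaps = []
    for chrom, a_start, a_end in a:
        for b_start, b_end in b_by_chrom.get(chrom, []):
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:  # keep only non-empty overlaps
                overlaps.append((chrom, start, end))
    return sorted(overlaps)
```

Each intersected result (high-confidence regions restricted to one stratification region) could then be written out with `write_bed` and supplied to vcfdist as its single region BED file, giving one vcfdist run per stratum.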
GIAB provides stratification BED files, and hap.py gives the option to produce stratified results based on these files. It's very useful to be able to look at benchmarking performance by type of region. It would be great if vcfdist could support this option as well.