Closed by jzook 7 years ago
Just to follow up on this, @pkrusche had suggested creating a simple human- and machine-readable file that gives information about truth sets and their locations. I'm thinking the following columns might be useful, and I'm interested in others' suggestions:
Does a tab-delimited file seem best to everyone for this?
Maybe also a version number? Otherwise looks good!
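For illustration, here is a minimal sketch of how a tab-delimited file like this could be read by a script. The column names (`genome`, `version`, `vcf_url`, `bed_url`) and the row values are placeholders assumed for the example, not the columns proposed in this thread; the `version` column reflects the version-number suggestion above.

```python
import csv
import io

# Hypothetical manifest: the column names and values below are placeholders
# chosen for illustration, not the columns proposed in this thread.
EXAMPLE_TSV = (
    "genome\tversion\tvcf_url\tbed_url\n"
    "EXAMPLE_GENOME\t1.0\thttps://example.org/calls.vcf.gz\thttps://example.org/regions.bed\n"
)


def read_manifest(handle):
    """Parse a tab-delimited manifest into a list of dicts keyed by column name."""
    reader = csv.DictReader(handle, delimiter="\t")
    return list(reader)


rows = read_manifest(io.StringIO(EXAMPLE_TSV))
for row in rows:
    # Each row is a dict, so fields are looked up by header name.
    print(row["genome"], row["version"], row["vcf_url"])
```

A header row keeps the file readable by people, and a plain tab delimiter means standard tools (`csv` in Python, `cut`, spreadsheets) can parse it without a custom reader.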
PR #21 should address this.
@marghoob - Would you be interested in adding a description of the HuRef callset you made to our new list of benchmarking calls at https://github.com/ga4gh/benchmarking-tools/tree/master/resources/high-confidence-sets?
High-confidence sets added to https://github.com/ga4gh/benchmarking-tools/tree/master/resources/high-confidence-sets
David Haussler suggested including links to datasets that can be used for benchmarking in this repo, which I think is a good idea. I suggest we have two categories of data for each genome: high-confidence VCF/BED files and raw data files. Does that make sense?
Here are the genomes I'll propose as a start: