Illumina / PlatinumGenomes

The Platinum Genomes Truthset
https://illumina.github.io/PlatinumGenomes

homozygous reference positions for FP assessment #10

Open 5mec opened 4 years ago

5mec commented 4 years ago

Hi there,

I'm looking for the data described in your Genome Research 2017 paper as: “… we identified 2,737,246,156 positions that are homozygous reference across the pedigree. These positions can be used to calculate false positive rates when assessing variant calling pipelines.”

Could you please direct me to the correct file for hg38?

From the description of the Confident Regions at https://github.com/Illumina/PlatinumGenomes/wiki/Confident-regions, I can't tell whether this is the homozygous reference data, because the first and second paragraphs on that page are confusing when read together.

Thanks for your help

helen

blmoore commented 4 years ago

Confident regions contain both homozygous reference regions and the validated variant sites (i.e. records in the truthset VCFs), so if you subtract the NA12878 and NA12877 truthset records from the confident region BED files, you'll be left with just the hom-ref regions.
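As a rough illustration of that subtraction, here is a minimal Python sketch. It assumes the confident regions have already been read as BED-style (chrom, start, end) tuples (0-based, half-open) and the truthset records as (chrom, POS, REF) tuples taken from the VCFs (1-based POS). The function name `subtract_variants` and the toy coordinates are made up for the example, and treating the REF allele span as the "variant site" is a simplifying assumption. In practice an interval tool such as `bedtools subtract` would probably be the more convenient route.

```python
from collections import defaultdict


def subtract_variants(regions, variants):
    """Remove variant spans from confident regions (illustrative helper).

    regions:  iterable of (chrom, start, end), BED-style 0-based half-open.
    variants: iterable of (chrom, pos, ref), VCF-style 1-based POS and REF allele.
    Returns a sorted list of (chrom, start, end) intervals with variants removed.
    """
    # Convert each VCF record to the 0-based half-open span covered by REF.
    spans = defaultdict(list)
    for chrom, pos, ref in variants:
        spans[chrom].append((pos - 1, pos - 1 + len(ref)))
    for chrom in spans:
        spans[chrom].sort()

    result = []
    for chrom, start, end in sorted(regions):
        cursor = start  # left edge of the part of the region not yet emitted
        for v_start, v_end in spans.get(chrom, []):
            if v_end <= cursor:
                continue  # variant lies entirely before the remaining region
            if v_start >= end:
                break  # spans are sorted, so nothing later can overlap
            if v_start > cursor:
                result.append((chrom, cursor, v_start))
            cursor = max(cursor, v_end)
        if cursor < end:
            result.append((chrom, cursor, end))
    return result


# Toy data only: one confident region, one SNV and one 3-base REF allele.
regions = [("chr1", 100, 200)]
variants = [("chr1", 151, "A"), ("chr1", 171, "ACG")]
hom_ref = subtract_variants(regions, variants)
hom_ref_bases = sum(end - start for _, start, end in hom_ref)
```

With the toy input, the 100-base region is split into three intervals around the two variant spans, and `hom_ref_bases` counts the remaining positions.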

hg38 bed files are here: