Illumina / PlatinumGenomes

The Platinum Genomes Truthset
https://illumina.github.io/PlatinumGenomes

Question about Confident region #2

Closed: AuroUTU closed this issue 6 years ago

AuroUTU commented 6 years ago

Hi, I am a master's student who is using PG, and I am confused about the confident regions. Based on the paper and the GitHub wiki, my understanding is that positions inside the confident regions are either non-variant (0|0) or homozygous variant (1|1). However, I still found some heterozygous variants (0|1) in the truth set that are located inside confident regions. So what defines the confident regions? Are they fully homozygous (only 0|0 and 1|1)? Or partly homozygous (0|0 and 1|1) plus validated heterozygous variants (0|1)? Thank you

blmoore commented 6 years ago

Confident regions cover everything we are able to confidently characterise throughout the pedigree; that is, validated variants of any genotype (heterozygous or homozygous alternative) as well as blocks of homozygous reference (0/0). Is that clear?

AuroUTU commented 6 years ago

Yes, now I understand what it is. Thank you very much. By the way, when I look at the truth-set VCF file split by chromosome, I notice that the first variant does not appear at the very beginning of the chromosome, whereas the confident regions start from the "very beginning". Does that mean the "beginning part" of the chromosome has no variants you could validate?

blmoore commented 6 years ago

Yeah, any position that is both: a) covered by confident regions and b) doesn't contain a variant record in your sample of interest, we're calling homozygous reference (and that's true everywhere including towards the ends of chromosomes).
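To make that rule concrete, here is a minimal Python sketch, not an official PG tool. It assumes the confident regions for one chromosome have been loaded as sorted, non-overlapping BED-style intervals (0-based start, exclusive end) and that the truth-set VCF for one sample has been reduced to a hypothetical dict mapping 1-based `POS` to its `GT` string. The names `in_confident` and `classify` are illustrative only. It also simplifies by keying records on `POS` alone, so multi-base records such as deletions that span several positions are not handled.

```python
import bisect
from typing import Dict, List, Tuple

# BED-style interval: 0-based start, exclusive end (assumed input format).
Interval = Tuple[int, int]


def in_confident(regions: List[Interval], pos: int) -> bool:
    """Return True if the 1-based VCF position `pos` falls inside a confident interval.

    Assumes `regions` is sorted by start and non-overlapping.
    """
    zero_based = pos - 1  # convert VCF 1-based POS to BED 0-based coordinates
    starts = [start for start, _ in regions]  # rebuilt per call; fine for a sketch
    i = bisect.bisect_right(starts, zero_based) - 1  # last interval starting at or before pos
    return i >= 0 and zero_based < regions[i][1]


def classify(regions: List[Interval], variants: Dict[int, str], pos: int) -> str:
    """Apply the rule: confident + no variant record => homozygous reference.

    `variants` is a hypothetical {POS: GT} map for one sample on one chromosome.
    """
    if not in_confident(regions, pos):
        return "outside confident regions"  # no claim is made about this position
    gt = variants.get(pos)
    if gt is None:
        return "hom-ref"  # confident and no record, so treated as 0/0
    return f"variant {gt}"  # any validated genotype, e.g. 0|1 or 1|1
```

For example, with `regions = [(0, 10_000), (20_000, 30_000)]` and `variants = {5_000: "0|1", 25_000: "1|1"}`, position 1 is classified as `hom-ref` even though no variant record precedes position 5,000, which is the situation described at the start of a chromosome.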

AuroUTU commented 6 years ago

Thank you very much :)