broadinstitute / gatk-protected

Obsolete/Legacy GATK repository -- go to https://github.com/broadinstitute/gatk instead
BSD 3-Clause "New" or "Revised" License

Blacklist PAR regions in TargetCoverageSexGenotyper #653

Closed mbabadi closed 7 years ago

mbabadi commented 7 years ago

Issue: reads cross-mapped between X and Y in the PAR regions lead to problematic sex inferences. In a cohort of 112 sex-annotated samples, a single XX -> XY misclassification was noticed, which was traced to a large number of reads aligned to the chrY PAR region.

Solution: the tool must take an interval list of "blacklisted regions" and ignore targets lying in those regions while calculating the likelihoods of the different sex genotypes.
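
A minimal sketch of the proposed filtering, written in Python for illustration only (it is not the tool's actual code). The `Interval` type, the `drop_blacklisted` helper, and the coordinates are all hypothetical; the sketch assumes 1-based inclusive coordinates and treats any overlap with a blacklisted region as grounds for exclusion.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    """Hypothetical interval type: 1-based, inclusive coordinates."""
    contig: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base on the same contig."""
    return a.contig == b.contig and a.start <= b.end and b.start <= a.end


def drop_blacklisted(targets: List[Interval], blacklist: List[Interval]) -> List[Interval]:
    """Keep only targets that overlap no blacklisted region."""
    return [t for t in targets if not any(overlaps(t, b) for b in blacklist)]


# Made-up coordinates, not real PAR boundaries.
targets = [
    Interval("Y", 100, 200),    # falls inside the blacklisted region
    Interval("Y", 5000, 5100),  # outside the blacklisted region
    Interval("X", 100, 200),    # different contig, so unaffected
]
blacklist = [Interval("Y", 1, 1000)]

kept = drop_blacklisted(targets, blacklist)
```

Only the targets in `kept` would then contribute to the sex genotype likelihoods. A quadratic scan like this is fine for a handful of blacklisted regions; a real implementation would more likely use an interval tree or sorted intervals.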

ldgauthier commented 7 years ago

Unfortunately Shaila, who's been visiting the MacArthur lab for a few months, isn't around much longer. She did a lot of chrX work in grad school and we talked a lot about variant filtering strategies for X. There is an interval list for the PAR in UCSC or somewhere similar, but she said that according to population genetics theory, it's not static, so it will be slightly different in different individuals and will change over time (probably over dozens of generations at least). I think for your purposes the published intervals will probably be fine, but I wanted to note that the PAR isn't perfectly defined.

mbabadi commented 7 years ago

Thanks @ldgauthier, interesting huh... I'm curious to understand the microscopic mechanism responsible for the drift of PARs in the offspring. Anyway, like you said, we'll be fine working with published intervals (we can also pad them a bit for good measure). Also, the PAR "blacklist" table is going to be a flexible user input.
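
The padding mentioned above could look roughly like the following Python sketch. The `pad_intervals` helper, the tuple layout `(contig, start, end)`, and the padding size are assumptions made for illustration; the thread does not say how much padding would be used. Coordinates are taken to be 1-based and inclusive, and the sketch clamps only at the start of the contig because contig lengths are not available here.

```python
from typing import List, Tuple

# Hypothetical layout: (contig, start, end), 1-based inclusive.
IntervalTuple = Tuple[str, int, int]


def pad_intervals(intervals: List[IntervalTuple], padding: int) -> List[IntervalTuple]:
    """Extend each interval by `padding` bases on both sides.

    The start is clamped to 1; the end is NOT clamped to the contig length,
    which a real implementation would need to do.
    """
    if padding < 0:
        raise ValueError("padding must be non-negative")
    return [(contig, max(1, start - padding), end + padding)
            for contig, start, end in intervals]


# Made-up coordinates, not real PAR boundaries.
published = [("X", 50, 100), ("Y", 1000, 2000)]
padded = pad_intervals(published, 100)
```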

ldgauthier commented 7 years ago

And Konrad pointed out there's a PAR3 in some people (about 2% of the population): http://link.springer.com/article/10.1007%2Fs10142-013-0323-6 Chromosome X is like the wild west. But Monkol is super excited about CNV calls on X!

mbabadi commented 7 years ago

This is addressed in https://github.com/broadinstitute/gatk-protected/pull/695 Closing.