dtaliun opened 1 year ago
Yes, this is in the works! Our current timeline has a release coming before the end of December.
Hi,
I wanted to follow up and ask whether you have any updates on chromosome X. A related question: I see that there is phased chromosome X data in gs://gcp-public-data--gnomad/resources/hgdp_1kg/phased_haplotypes/; what is the difference between phased_haplotypes and phased_haplotypes_v2?
Thanks again for the resource!
Hi @dtaliun , we are wrapping up phasing of chromosome X. I'll ping you here once it's available for download.
Regarding the difference: we phased the first release without pedigree information, and this hurt phasing/imputation performance compared to the NYGC 1KG panel. Pedigree information was incorporated in v2, which achieves better phasing/imputation than the NYGC 1KG panel, as we show in the manuscript. We also filtered out singletons in v2, so you'll notice a drop in the number of variants compared to the first release.
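To make the singleton filter concrete, here is a minimal Python sketch of dropping singletons from a set of biallelic sites. It assumes genotypes are VCF-style GT strings (e.g. `0|1`) and defines a singleton as a site with minor allele count 1; the function names and toy site IDs are hypothetical and not the gnomAD pipeline. Note that a "minor allele count >= 2" rule also drops monomorphic sites.

```python
from typing import Dict, List, Tuple


def allele_counts(genotypes: List[str]) -> Tuple[int, int]:
    """Return (alt_count, called_allele_count) for one biallelic site.

    Genotypes are VCF-style GT strings such as '0|1', '1/1', '.|.' or haploid '1'.
    Any allele other than '0' or '.' is counted as ALT.
    """
    alt = called = 0
    for gt in genotypes:
        for allele in gt.replace("/", "|").split("|"):
            if allele == ".":
                continue  # missing allele: not counted
            called += 1
            if allele != "0":
                alt += 1
    return alt, called


def drop_singletons(sites: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Keep sites whose minor allele count is >= 2.

    This drops singletons (minor allele count 1) and also
    monomorphic sites (minor allele count 0).
    """
    kept = {}
    for site_id, genotypes in sites.items():
        alt, called = allele_counts(genotypes)
        minor = min(alt, called - alt)
        if minor >= 2:
            kept[site_id] = genotypes
    return kept


if __name__ == "__main__":
    toy_sites = {
        "chr1:100:A/G": ["0|1", "0|0", "0|0"],  # minor count 1 -> dropped
        "chr1:200:C/T": ["0|1", "1|0", "0|0"],  # minor count 2 -> kept
        "chr1:300:G/A": ["0|0", "0|0", "0|0"],  # monomorphic -> dropped
    }
    print(sorted(drop_singletons(toy_sites)))  # ['chr1:200:C/T']
```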
Hi @dtaliun , the phased chromosome X files are now available for download and can be found here:
gs://gcp-public-data--gnomad/resources/hgdp_1kg/phased_haplotypes_v2/hgdp1kgp_chrX*
Hi @LindoNkambule,
Thank you very much for sharing the phased chromosome X! I appreciate all your hard work.
I started using the data, but I found a few thousand variants in the chrX non-PAR region where male samples were phased as heterozygous. Here are a few examples of heterozygous genotypes in male samples in the phased output: HG00881 is 1|0 for chrX:2855315:C/T; HG02011, NA20866, and NA20903 are 1|0 for chrX:3443944:A/G.
I believe this may be related to the bug in shapeit5 where it does not correctly set these heterozygous genotypes to missing.
Could you please investigate on your end?
Thanks again!
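One way to reproduce this kind of check is sketched below in Python: flag every heterozygous call in a male sample at a chrX position outside the pseudoautosomal regions. It assumes GRCh38 coordinates (PAR1 chrX:10,001-2,781,479, PAR2 chrX:155,701,383-156,030,895), a sample-to-sex mapping supplied by the user, and records already parsed into tuples; the function names and sample IDs (`male_a`, `female_a`, ...) are hypothetical. Both positions reported above fall outside PAR1.

```python
from typing import Dict, Iterable, List, Tuple

# GRCh38 pseudoautosomal regions on chrX (1-based, inclusive).
PAR_GRCH38 = [(10_001, 2_781_479), (155_701_383, 156_030_895)]


def in_par(pos: int) -> bool:
    """True if a chrX position lies in PAR1 or PAR2."""
    return any(start <= pos <= end for start, end in PAR_GRCH38)


def is_het(gt: str) -> bool:
    """True for a fully called diploid genotype with two different alleles."""
    alleles = gt.replace("/", "|").split("|")
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


def het_males_non_par(
    records: Iterable[Tuple[int, str, Dict[str, str]]],
    sex: Dict[str, str],
) -> List[Tuple[str, str, str]]:
    """Return (variant_id, sample, GT) for every male het call outside the PARs.

    records: (position, variant_id, {sample: GT}) per chrX site.
    sex: {sample: 'M' or 'F'} (hypothetical input format).
    """
    hits = []
    for pos, variant_id, genotypes in records:
        if in_par(pos):
            continue  # males are diploid in the PARs, so hets are expected there
        for sample, gt in genotypes.items():
            if sex.get(sample) == "M" and is_het(gt):
                hits.append((variant_id, sample, gt))
    return hits


if __name__ == "__main__":
    toy_records = [
        (2855315, "chrX:2855315:C/T", {"male_a": "1|0", "female_a": "0|1"}),
        (500000, "chrX:500000:A/G", {"male_a": "0|1"}),  # PAR1: skipped
        (3443944, "chrX:3443944:A/G", {"male_a": "1|1", "male_b": "1|0"}),
    ]
    toy_sex = {"male_a": "M", "male_b": "M", "female_a": "F"}
    for hit in het_males_non_par(toy_records, toy_sex):
        print(hit)
```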
Hi @dtaliun,
Thank you for raising this issue. I will look into it.
Hi @LindoNkambule,
I just wanted to follow up on the chromosome X phased files. Something is not right about them.
Unlike the autosomal files, they also include monomorphic variants and singletons. Moreover, the hgdp1kgp_chrX_par1.shapeit5_common.bcf and hgdp1kgp_chrX_par2.shapeit5_common.bcf files contain almost no variants with 2 <= AC <= 8180:
bcftools view -H -c2 -C8180 hgdp1kgp_chrX_par1.shapeit5_common.bcf | wc -l
# outputs 1 entry
bcftools view -H -c2 -C8180 hgdp1kgp_chrX_par2.shapeit5_common.bcf | wc -l
# outputs 1 entry
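For readers less familiar with these bcftools options: `-H` suppresses the header so `wc -l` counts variant records only, while `-c2` and `-C8180` keep sites whose allele count is at least 2 and at most 8180 (by default bcftools counts non-reference alleles). The Python sketch below mirrors that filter on per-site ALT allele counts; it assumes INFO/AC is available as a list per site, and the function names are hypothetical.

```python
from typing import List


def nref_count(alt_acs: List[int]) -> int:
    """Non-reference allele count of a site: the sum of INFO/AC over all ALT alleles."""
    return sum(alt_acs)


def count_records(sites: List[List[int]], min_ac: int = 2, max_ac: int = 8180) -> int:
    """Count sites with min_ac <= non-reference AC <= max_ac.

    Intended to mirror `bcftools view -H -c2 -C8180 file.bcf | wc -l`.
    """
    return sum(1 for acs in sites if min_ac <= nref_count(acs) <= max_ac)


if __name__ == "__main__":
    # Toy INFO/AC values: one list per site, one entry per ALT allele.
    toy_sites = [[0], [1], [2], [5000], [8185], [1, 1]]
    print(count_records(toy_sites))  # 3: [2], [5000] and the multiallelic [1, 1]
```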
Sorry to jump in, but I am also looking for the X data. Like most folks, I'm interested in the non-PAR regions of X, but what is posted contains only rare variants, whereas for the autosomes you get both rare and common variants. I'm hoping that maybe the final X dataset just hasn't been posted yet? Or perhaps shapeit5 doesn't play nicely with X. For my purposes, I'm only interested in building a panel for phasing/imputation, in which case maybe there are enough males that I can get enough "phased" haplotypes from the original VCF to make it work.
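The workaround suggested above relies on the fact that males carry a single X outside the PARs, so a clean male call is already a haplotype. A minimal Python sketch of extracting those haplotypes is below, assuming male non-PAR calls are stored either as haploid (`1`) or as homozygous diploid (`1/1`); heterozygous or missing calls are returned as `None` rather than guessed. The function names are hypothetical.

```python
from typing import List, Optional


def male_haplotype(gt: str) -> Optional[str]:
    """Return the single allele carried by a male at a non-PAR chrX site.

    Accepts haploid calls ('0', '1') or homozygous diploid encodings ('1/1', '0|0').
    Returns None for missing or heterozygous calls, which cannot be turned
    into a haplotype without phasing.
    """
    alleles = gt.replace("/", "|").split("|")
    if "." in alleles or len(set(alleles)) != 1:
        return None
    return alleles[0]


def male_haplotype_matrix(sites: List[List[str]]) -> List[List[Optional[str]]]:
    """Rows are non-PAR sites, columns are male samples; each cell is one allele or None."""
    return [[male_haplotype(gt) for gt in row] for row in sites]


if __name__ == "__main__":
    toy_sites = [["1", "0|0", "1/1"], ["1|0", ".", "0"]]
    print(male_haplotype_matrix(toy_sites))
```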
Hi everyone, the issues pointed out have been addressed. I will post an update once the files have been made public.
One caveat, cc @dtaliun: to address the issue you pointed out, we decided to code males as homozygous in the non-PAR region. However, there were still a few variants where males were phased as heterozygous (see issue). Since this was a small number (~7K), we decided to filter them out for now while we wait for the SHAPEIT5 developers to look into the issue.
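The two steps described above can be sketched in Python as follows: before phasing, recode haploid male non-PAR calls as homozygous diploid; after phasing, drop any site where a male is still heterozygous and report how many sites were removed. This assumes the input male calls are haploid GT strings and that records are already parsed into `(variant_id, {sample: GT})` tuples; it is an illustration of the idea, not the team's actual pipeline, and all names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Record = Tuple[str, Dict[str, str]]


def diploidize_male(gt: str) -> str:
    """Recode a haploid male call ('0', '1', '.') as a homozygous diploid ('0|0', '1|1', '.|.')."""
    if "|" in gt or "/" in gt:
        return gt  # already diploid: left unchanged
    return f"{gt}|{gt}"


def is_het(gt: str) -> bool:
    """True for a fully called diploid genotype with two different alleles."""
    alleles = gt.replace("/", "|").split("|")
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


def drop_sites_with_het_males(records: List[Record], males: Set[str]) -> Tuple[List[Record], int]:
    """After phasing, drop every site where any male is still heterozygous.

    Returns (kept_records, number_of_dropped_sites).
    """
    kept = [
        (vid, gts)
        for vid, gts in records
        if not any(s in males and is_het(gt) for s, gt in gts.items())
    ]
    return kept, len(records) - len(kept)


if __name__ == "__main__":
    print(diploidize_male("1"))  # 1|1
    phased = [
        ("chrX:3000000:A/G", {"m1": "1|1", "f1": "0|1"}),
        ("chrX:3100000:C/T", {"m1": "1|0", "f1": "0|0"}),  # male het -> dropped
    ]
    kept, n_dropped = drop_sites_with_het_males(phased, {"m1"})
    print([vid for vid, _ in kept], n_dropped)
```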
Hi @LindoNkambule,
Thank you very much for fixing it! And thank you again for sharing all this valuable data with us!
Thank you very much for creating these amazing resources!
Quick question: do you plan to release phased chromosome X any time soon?
Thanks again!