atgu / hgdp_tgp

Chromosome X phasing #9

Open dtaliun opened 10 months ago

dtaliun commented 10 months ago

Thank you very much for creating amazing resources!

Quick question: do you plan to release phased chromosome X any time soon?

Thanks again!

z-koenig commented 10 months ago

Yes, this is in the works! Our current timeline has a release coming before the end of December.

dtaliun commented 5 months ago

Hi, I wanted to follow up and ask if you have any updates on chromosome X. Related question: I see that there is phased chromosome X data inside gs://gcp-public-data--gnomad/resources/hgdp_1kg/phased_haplotypes/; what is the difference between phased_haplotypes and phased_haplotypes_v2? Thanks again for the resource!

LindoNkambule commented 5 months ago

Hi @dtaliun, we are wrapping up phasing of chromosome X. I'll ping you here once it's available for download.

Regarding the difference: we phased the first release without pedigree information, and this hurt phasing/imputation performance compared to the NYGC 1KG panel. Pedigree information was incorporated in v2, which achieves better phasing/imputation than the NYGC 1KG panel, as we show in the manuscript. We also filtered out singletons in v2, so you'll notice a drop in the number of variants compared to the first release.

LindoNkambule commented 4 months ago

Hi @dtaliun, the phased chromosome X files are now available for download and can be found here: gs://gcp-public-data--gnomad/resources/hgdp_1kg/phased_haplotypes_v2/hgdp1kgp_chrX*

dtaliun commented 4 months ago

Hi @LindoNkambule,

Thank you very much for sharing the phased chromosome X! I appreciate all your hard work.

I started to use the data, but I found a few thousand variants on the chrX non-PAR region where male samples were phased as heterozygous. Here are a few examples of heterozygous genotypes in male samples in the phased output:

HG00881 is 1|0 for chrX:2855315:C/T
HG02011, NA20866, and NA20903 are 1|0 for chrX:3443944:A/G

I believe this may be related to the bug in shapeit5, where shapeit5 doesn't set these het genotypes to missing correctly.
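
For anyone who wants to reproduce this kind of check, here is a minimal sketch of how heterozygous male calls in the non-PAR region could be flagged. It assumes the BCF has first been converted to uncompressed VCF text, that the set of male sample IDs comes from separate sample metadata, and that coordinates are GRCh38 (PAR1 at chrX:10,001-2,781,479 and PAR2 at chrX:155,701,383-156,030,895). The function names are hypothetical and not part of any released tooling.

```python
# GRCh38 pseudoautosomal regions on chrX (1-based, inclusive).
PAR1 = (10_001, 2_781_479)
PAR2 = (155_701_383, 156_030_895)


def in_par(pos):
    """Return True if a chrX position falls inside PAR1 or PAR2."""
    return PAR1[0] <= pos <= PAR1[1] or PAR2[0] <= pos <= PAR2[1]


def male_het_calls(vcf_lines, male_ids):
    """Yield (variant, sample, genotype) for heterozygous male calls
    at chrX non-PAR sites. male_ids is an assumed, externally supplied set."""
    samples = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names start at column 10
            continue
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        if chrom not in ("chrX", "X") or in_par(pos):
            continue
        gt_index = fields[8].split(":").index("GT")
        for sample, call in zip(samples, fields[9:]):
            if sample not in male_ids:
                continue
            gt = call.split(":")[gt_index]
            alleles = gt.replace("|", "/").split("/")
            # Diploid call with two different, non-missing alleles.
            if len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]:
                yield f"{chrom}:{pos}:{ref}/{alt}", sample, gt
```

Both example positions above lie past the end of PAR1, so a check like this would treat them as non-PAR sites.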

Could you please investigate on your end?

Thanks again!

LindoNkambule commented 3 months ago

Hi @dtaliun,

Thank you for raising this issue. I will look into it.

dtaliun commented 2 months ago

Hi @LindoNkambule,

I just wanted to follow up on the chromosome X phased files. Something is not right about them.

Unlike autosomal chromosome files, they also include monomorphic variants and singletons. Moreover, the hgdp1kgp_chrX_par1.shapeit5_common.bcf and hgdp1kgp_chrX_par2.shapeit5_common.bcf files have no variants with AC>=2:

bcftools view -H -c2 -C8180 hgdp1kgp_chrX_par1.shapeit5_common.bcf | wc -l
# outputs 1 entry
bcftools view -H -c2 -C8180 hgdp1kgp_chrX_par2.shapeit5_common.bcf | wc -l
# outputs 1 entry
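
In the commands above, `-H` suppresses the header and `-c`/`-C` set the minimum and maximum non-reference allele count. As a cross-check that does not rely on bcftools, here is a rough sketch that recomputes the allele count from the genotype columns and tallies records by class. It assumes uncompressed VCF text with GT as the first FORMAT subfield; the function names are hypothetical, and the result can differ from bcftools if bcftools takes the count from an INFO/AC tag that is out of date.

```python
from collections import Counter


def alt_allele_count(genotype_fields):
    """Count non-reference, non-missing alleles across sample columns.
    Handles haploid ('1') and diploid ('0|1', '0/1') GT strings."""
    count = 0
    for call in genotype_fields:
        gt = call.split(":")[0]  # assumes GT is the first FORMAT subfield
        for allele in gt.replace("|", "/").split("/"):
            if allele not in ("0", "."):
                count += 1
    return count


def classify_by_ac(vcf_lines, min_ac=2, max_ac=8180):
    """Tally records as monomorphic_ref (AC=0), singleton (AC=1),
    within_range (min_ac <= AC <= max_ac) or outside_range."""
    tally = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        ac = alt_allele_count(fields[9:])
        if ac == 0:
            tally["monomorphic_ref"] += 1
        elif ac == 1:
            tally["singleton"] += 1
        elif min_ac <= ac <= max_ac:
            tally["within_range"] += 1
        else:
            tally["outside_range"] += 1
    return tally
```

The `within_range` tally corresponds to what the `-c2 -C8180` filter keeps, while the first two classes are the monomorphic variants and singletons mentioned above.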

Ahhgust commented 2 weeks ago

Sorry to jump in, but I too am looking for the X data. Like most folks, I'm interested in the non-PAR regions of the X, but what is posted is just rare variants, whereas for the autosomes you get rare+common variants. I'm hoping that maybe the final X dataset just isn't posted? Or perhaps shapeit5 doesn't play nice w/ the X. For my purposes, I'm just interested in making a panel for phasing/imputation, in which case maybe there are enough males where I can get enough "phased" haplotypes from the original VCF to make it work.
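
The male-haplotype idea could be sketched roughly as follows: since males carry a single X outside the PARs, a consistent male call already gives one haplotype without any phasing. This is only an illustration, assuming chrX-only uncompressed VCF text, GRCh38 PAR boundaries, GT as the first FORMAT subfield, and an externally supplied set of male IDs; `male_haplotypes` is a hypothetical helper, not an existing tool.

```python
# GRCh38 pseudoautosomal regions on chrX (1-based, inclusive).
PAR1 = (10_001, 2_781_479)
PAR2 = (155_701_383, 156_030_895)


def male_haplotypes(vcf_lines, male_ids):
    """Map each male sample to a list of non-PAR alleles, one per site.
    Heterozygous or missing calls become None instead of being guessed."""
    columns, haplotypes = {}, {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            columns = {s: i for i, s in enumerate(fields) if i >= 9 and s in male_ids}
            haplotypes = {s: [] for s in columns}
            continue
        pos = int(fields[1])
        if PAR1[0] <= pos <= PAR1[1] or PAR2[0] <= pos <= PAR2[1]:
            continue  # males are diploid in the PARs, so skip those sites
        for sample, i in columns.items():
            gt = fields[i].split(":")[0]
            alleles = set(gt.replace("|", "/").split("/"))
            # A haploid call ('1') or a homozygous call ('1/1') gives one allele.
            allele = alleles.pop() if len(alleles) == 1 else None
            haplotypes[sample].append(None if allele == "." else allele)
    return haplotypes
```

Whether the resulting number of male haplotypes is enough for a usable reference panel is a separate question that this sketch does not address.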

LindoNkambule commented 1 week ago

Hi everyone, the issues pointed out above have been addressed. I will post an update once the files have been made public.

One caveat, @dtaliun: to address the heterozygous male genotypes you pointed out, we decided to code males as homozygous in the non-PAR region. However, there were still a few variants where males were being phased as heterozygous (see issue). Since this was a small number (~7K), we decided to filter them out in the meantime while we wait for the SHAPEIT5 developers to look into the issue.
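
As an illustration only (this is not the pipeline used for the release), the recode-then-filter step described above could look something like the sketch below. It assumes chrX-only uncompressed VCF text, GRCh38 PAR boundaries, GT as the first FORMAT subfield, and a known set of male sample IDs; `recode_males` is a hypothetical name.

```python
# GRCh38 pseudoautosomal regions on chrX (1-based, inclusive).
PAR1 = (10_001, 2_781_479)
PAR2 = (155_701_383, 156_030_895)


def recode_males(vcf_lines, male_ids):
    """Return (kept_lines, dropped_count). At non-PAR sites, male calls are
    rewritten as phased homozygous diploid calls; any site where a male
    call has two different alleles is dropped entirely."""
    kept, dropped, male_cols = [], 0, []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            if line.startswith("#CHROM"):
                header = line.split("\t")
                male_cols = [i for i, s in enumerate(header) if i >= 9 and s in male_ids]
            kept.append(line)
            continue
        fields = line.split("\t")
        pos = int(fields[1])
        in_par = PAR1[0] <= pos <= PAR1[1] or PAR2[0] <= pos <= PAR2[1]
        consistent = True
        if not in_par:
            for i in male_cols:
                gt, _, rest = fields[i].partition(":")
                alleles = set(gt.replace("/", "|").split("|"))
                if len(alleles) != 1:  # e.g. '1|0'; also half-missing calls
                    consistent = False
                    break
                allele = alleles.pop()
                fields[i] = f"{allele}|{allele}" + (":" + rest if rest else "")
        if consistent:
            kept.append("\t".join(fields))
        else:
            dropped += 1
    return kept, dropped
```

The returned `dropped` count plays the role of the ~7K figure mentioned above: the number of sites removed because at least one male call stayed heterozygous.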

dtaliun commented 1 week ago

Hi @LindoNkambule,

Thank you very much for fixing it! And thank you again for sharing all this valuable data with us!