23andMe / yhaplo

Identifying Y-chromosome haplogroups in arbitrarily large samples of sequenced or genotyped men

Yhaplo analysis on UK Biobank data #40

Open jielab opened 3 days ago

jielab commented 3 days ago

Dear David:

Please refer to my post at PLINK users group: https://groups.google.com/g/plink2-users/c/Xvt895jb48w

It seems that there is no real ChrY data in the UK Biobank genotype dataset, and therefore there is no way to run the Yhaplo program on it, correct?

I ran the following command on the UK Biobank ChrXY male data anyway: yhaplo -i chrXY-males.vcf.gz -o jie.

There was NO error message, and I got the output files listed at this link https://github.com/jielab/001/tree/master/jie.

I was expecting to get a phylogenetic tree dataset for all the male samples that have genotype data, but none of the output .awk files included such information.

So, please kindly advise how to run yhaplo on UK Biobank genetic data.

Thank you & best regards, Jie
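
A quick sanity check before running yhaplo on a file such as chrXY-males.vcf.gz is to tally the CHROM column and see which chromosome labels the VCF actually contains. The following is a minimal sketch in plain Python; the function names tally_vcf_chroms and tally_vcf_file are hypothetical helpers written for illustration and are not part of yhaplo or PLINK.

```python
import gzip
from collections import Counter


def tally_vcf_chroms(lines):
    """Count data records per CHROM value in an iterable of VCF text lines.

    Header lines (starting with '#') and blank lines are skipped.
    """
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # CHROM is the first tab-separated column of a VCF data line.
        chrom = line.split("\t", 1)[0]
        counts[chrom] += 1
    return counts


def tally_vcf_file(path):
    """Open a plain or gzip-compressed VCF (hypothetical path) and tally CHROM."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return tally_vcf_chroms(handle)
```

Calling tally_vcf_file("chrXY-males.vcf.gz") would return a Counter keyed by chromosome label. If no record carries a Y label, that would be consistent with the output files lacking per-sample haplogroup information.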

teepean commented 3 days ago

How did you originally create this "pfile chrXY"?

jielab commented 3 days ago

I originally used gfetch to download the UKB raw genotype data and then used plink2 --bed --bim --fam --make-bed to generate the ChrXY data.

Best regards, Jie

teepean commented 3 days ago

Does the original .bim file have chrY or chromosome numbered 24? Both represent chromosome Y.
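
The question above can be checked directly from the .bim file, whose first whitespace-separated column is the chromosome code. The sketch below counts variants per chromosome after mapping PLINK's numeric codes to letters (23 = X, 24 = Y, 25 = XY, 26 = MT). In PLINK's convention, code 25 / XY denotes the pseudo-autosomal region and is a different label from 24 / Y. The helper names normalize_chrom and count_bim_chroms are hypothetical, chosen for illustration only.

```python
from collections import Counter

# PLINK's numeric codes for the non-autosomal chromosomes.
PLINK_CODES = {"23": "X", "24": "Y", "25": "XY", "26": "MT"}


def normalize_chrom(label):
    """Map labels such as '24', 'Y', or 'chrY' to one canonical form ('Y')."""
    label = label.strip()
    if label.lower().startswith("chr"):
        label = label[3:]
    label = label.upper()
    return PLINK_CODES.get(label, label)


def count_bim_chroms(lines):
    """Count variants per normalized chromosome in an iterable of .bim lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        counts[normalize_chrom(fields[0])] += 1
    return counts
```

For a file on disk, something like `with open("chrXY.bim") as fh: print(count_bim_chroms(fh))` would show at a glance whether any variants are labelled Y rather than XY (the file name here is only an example).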

jielab commented 2 days ago

Please see the screenshot below.

The genotyped data has only 1357 variants, and all are labelled XY. The imputed data has 45907 variants; all except 1 are labelled XY.

image

Best regards, Jie

teepean commented 2 days ago

I don't have access to UKBB, so I am not sure how gfetch works. Could you show the command you used to download the data?

jielab commented 16 hours ago

The UKBB genotype data is NOT downloadable anymore. gfetch is simply a UKBB-provided tool for downloading big data, just like FTP.

Anyway, the full list of SNPs (including those for chromosomes X and XY) is downloadable from this link https://biobank.ndph.ox.ac.uk/showcase/ukb/auxdata/ukb_snp_bim.tar, as described on this page https://biobank.ndph.ox.ac.uk/showcase/refer.cgi?id=1963.

For your convenience, I also posted chrY.bim and chrXY.bim on my GitHub https://github.com/jielab/001/tree/master/jie.

Can you please take a look at this SNP file and kindly let me know if Yhaplo could infer phylogeny from this data?

Thanks!

Jie

teepean commented 16 hours ago

The tar archive has ukb_snp_chrY_v2.bim, which has only 691 Y SNPs. I am not sure that is enough.

jielab commented 15 hours ago

But they also have chrXY. Also, the imputed data has tens of thousands of SNPs for ChrX and ChrXY.

It would be great if you guys are interested in applying Yhaplo to UKBB data.

Best regards, Jie