chrchang / plink-ng

A comprehensive update to the PLINK association analysis toolset. Beta testing of the first new version (1.90), focused on speed and memory efficiency improvements, is finishing up. Development is now focused on building out support for multiallelic, phased, and dosage data in PLINK 2.0.
https://www.cog-genomics.org/plink/2.0/

read haploid dosages with pgenlib #231

Open 23andme-jaredo opened 1 year ago

23andme-jaredo commented 1 year ago

Is it possible to read haploid dosages with pgenlib.PgenReader?

thanks,

Jared

chrchang commented 1 year ago

As with the plink .bed format, haploid vs. diploid is not directly encoded in the .pgen. Instead, plink and plink2 divide the encoded values by two when the .bim/.pvar (and on chrX, .fam/.psam) file indicates that we're dealing with haploid data.
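To make the halving concrete, here is a minimal Python sketch. The function name `allele_count` and the `haploid` flag are hypothetical and are not part of plink or pgenlib; the assumption is that the caller has already worked out haploidy from the chromosome code in the .bim/.pvar and, on chrX, the sex column in the .fam/.psam.

```python
def allele_count(encoded, haploid):
    """Interpret a stored genotype value (0..2 on the diploid scale).

    `encoded` is the value as stored in the file, or None for a missing call.
    `haploid` is supplied by the caller; the file itself does not carry it.
    """
    if encoded is None:
        return None
    # Haploid data: the stored diploid-scale value is divided by two.
    return encoded / 2 if haploid else float(encoded)


# The same stored value reads differently depending on ploidy.
print(allele_count(2, haploid=False))
print(allele_count(2, haploid=True))
```

The point is only that ploidy is metadata applied at read time, so a reader that ignores the .pvar/.psam will always see diploid-scale values.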

23andme-jaredo commented 1 year ago

hmmm so I am a bit confused. I have imputed data converted from bcf via:

plink2 --bcf $bcf dosage=HDS --make-pfile

and I can see that the two haploid dosages per individual are stored because I can recover them via:

 plink2 --pfile plink2 --export vcf bgz vcf-dosage=HDS

so I am trying to extract those HDS values via pgenlib.

23andme-jaredo commented 1 year ago

Maybe I wasn't clear that I meant imputed haploid/phased probabilities, not hard genotypes.

chrchang commented 1 year ago

Oh, sorry, I thought you were referring to e.g. chrX/chrY/chrM.

The PgrGetDp() function in pgenlib_read.h is the simplest one that can return biallelic phased dosages.
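For anyone wrapping this in Python, the arithmetic for turning a phased dosage back into an HDS pair can be sketched as below. This is an illustration under an assumption that is not confirmed in this thread: that the reader hands back, per sample, a total ALT dosage on the 0..2 scale plus a signed difference between the two haplotype dosages (hap1 minus hap2). The function name `split_phased_dosage` is hypothetical, and no pgenlib call is used.

```python
def split_phased_dosage(total, delta, eps=1e-9):
    """Recover (hap1, hap2) ALT dosages from a sum and a signed difference.

    ASSUMPTION: total = hap1 + hap2 (0..2) and delta = hap1 - hap2 (-1..1).
    Check this against pgenlib_read.h before relying on it.
    """
    hap1 = (total + delta) / 2
    hap2 = (total - delta) / 2
    # Each haplotype dosage must be a probability-like value in [0, 1].
    for h in (hap1, hap2):
        if h < -eps or h > 1 + eps:
            raise ValueError("total/delta pair is not a valid phased dosage")
    return hap1, hap2


# A total dosage of 1.0 with most of the ALT weight on the first haplotype.
hap1, hap2 = split_phased_dosage(1.0, 0.5)
print(hap1, hap2)
```

If the assumption holds, `hap1 + hap2` reproduces the unphased dosage, which gives an easy sanity check against the values from `--export vcf vcf-dosage=HDS`.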

23andme-jaredo commented 1 year ago

Thanks! We'll try exposing that in python.