chrchang / plink-ng

A comprehensive update to the PLINK association analysis toolset. Beta testing of the first new version (1.90), focused on speed and memory efficiency improvements, is finishing up. Development is now focused on building out support for multiallelic, phased, and dosage data in PLINK 2.0.
https://www.cog-genomics.org/plink/2.0/

--score with mean imputation calculates allele frequency for unneeded alleles #237

Closed: dvg-p4 closed this issue 1 year ago

dvg-p4 commented 1 year ago

I'm currently trying to use plink2 to calculate PGSs, but I'm noticing that it takes an incredibly long time to calculate allele frequencies unless I set the no-mean-imputation flag. Since the number of variants in my score file is very small and my input pgen is very large, I presume that it must be calculating allele frequencies for every variant in my pgen file. (A glance at the source code seems to suggest this is the case, though I'm not 100% sure.)

chrchang commented 1 year ago

See https://www.cog-genomics.org/plink/2.0/input#read_freq .

dvg-p4 commented 1 year ago

So this needs to be a two-step process?

chrchang commented 1 year ago

If you're calculating many different scores on the same dataset, the two-step process makes sense.
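As a rough sketch of that two-step process: compute the allele frequencies once with --freq, then pass the resulting .afreq file to each --score run via --read-freq. The file names (big_dataset, my_score.txt, score1) and the `run` dry-run wrapper are hypothetical, and the score file is assumed to have no header line, with the variant ID in column 1, the allele in column 2, and the weight in column 3.

```sh
#!/bin/sh
# Hypothetical names; substitute your own.
PFILE=big_dataset      # prefix of big_dataset.pgen/.pvar/.psam
SCORE=my_score.txt     # assumed columns: 1 = variant ID, 2 = allele, 3 = weight

# Print each command; only execute it when RUN=1 is set in the environment.
run() {
  echo "$@"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Step 1 (once per dataset): write allele frequencies to $PFILE.afreq.
freq_step() {
  run plink2 --pfile "$PFILE" --freq --out "$PFILE"
}

# Step 2 (once per score file): reuse the saved frequencies for mean imputation.
# $1 = score file, $2 = output prefix
score_step() {
  run plink2 --pfile "$PFILE" --read-freq "$PFILE.afreq" \
    --score "$1" 1 2 3 --out "$2"
}

freq_step
score_step "$SCORE" score1
```

Run as-is, this only prints the two commands; setting RUN=1 executes them. Step 1 still scans the whole dataset, so the saving only shows up when several scores reuse the same .afreq file.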

If you aren't, an alternative hack that'll probably work is pointing --extract at the --score file.
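A minimal sketch of the --extract approach, under the same caveat that it will "probably" work: the assumption is that --extract restricts the variant set before allele frequencies are computed, and that the extra columns in the score file do not collide with real variant IDs. The file names and the `run` dry-run wrapper are hypothetical, and the score file is assumed to have no header line, with the variant ID in column 1, the allele in column 2, and the weight in column 3.

```sh
#!/bin/sh
# Hypothetical names; substitute your own.
PFILE=big_dataset      # prefix of big_dataset.pgen/.pvar/.psam
SCORE=my_score.txt     # assumed columns: 1 = variant ID, 2 = allele, 3 = weight

# Print each command; only execute it when RUN=1 is set in the environment.
run() {
  echo "$@"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Single step: use the score file itself as the --extract list, so that
# (by assumption) only the scored variants are kept for frequency estimation.
# $1 = output prefix
extract_score() {
  run plink2 --pfile "$PFILE" --extract "$SCORE" \
    --score "$SCORE" 1 2 3 --out "$1"
}

extract_score score1
```

Run as-is, this only prints the command; setting RUN=1 executes it. It is worth checking the variant count reported in the plink2 log to confirm that only the intended variants were retained.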