genepi / imputationserver

Michigan Imputation Server: A new web-based service for imputation that facilitates access to new reference panels and greatly improves user experience and productivity
https://imputationserver.sph.umich.edu/
GNU Affero General Public License v3.0

Imputation quality score (rsq_hat) #76

Open glettre opened 2 years ago

glettre commented 2 years ago

Hi,

Thanks for setting up the TOPMed imputation server.

Because of the sample-size limit (n=25K), we are imputing our large cohort in two batches and intend to merge the datasets after imputation. We would like to calculate an overall imputation quality score (rsq_hat) for the merged data, and we have been advised to use the following code:

https://github.com/statgen/Minimac4/blob/5e6f3cc91d166cd2298c296e46c9f428e6e0f3aa/src/ImputationStatistics.cpp#L50-L63
https://github.com/statgen/r2-estimator/blob/7e2162a0e9db6e2b56d0e036cfbc43b58a977ef6/src/main.cpp#L211-L223

However, when we compared the imputation rsq reported by the TOPMed imputation server with the rsq that we calculated ourselves (on the same dataset, using the code above), we observed an inflation (improvement) of the quality metric, but only for rare variants. To us, this suggests that the rsq calculation implemented in the TOPMed imputation server handles rare variants somewhat differently from the code that you shared. Is that possible?
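For concreteness, here is a minimal Python sketch of the kind of pooled calculation described above. It is not the server's or Minimac4's actual code. It assumes the estimator is the observed variance of the haplotype-level alternate-allele dosages divided by the expected variance p(1 - p), and that per-haplotype dosages (for example from an HDS field, if present in the output) are available for each batch. The function names `batch_stats` and `pooled_rsq` are hypothetical.

```python
def batch_stats(hap_dosages):
    """Sufficient statistics for one variant in one batch.

    hap_dosages: per-haplotype alternate-allele dosages in [0, 1]
    (assumed input; two values per diploid sample).
    Returns (count, sum, sum of squares).
    """
    n = len(hap_dosages)
    s = sum(hap_dosages)
    ss = sum(d * d for d in hap_dosages)
    return n, s, ss


def pooled_rsq(stats):
    """Pooled rsq estimate for one variant across batches.

    stats: iterable of (count, sum, sum of squares) tuples, one per batch.
    Assumed estimator: observed dosage variance / (p * (1 - p)).
    """
    stats = list(stats)
    n = sum(st[0] for st in stats)
    s = sum(st[1] for st in stats)
    ss = sum(st[2] for st in stats)
    if n < 2:
        return 0.0
    p = s / n                      # pooled alternate-allele frequency
    expected_var = p * (1.0 - p)   # variance if every allele were known exactly
    if expected_var <= 0.0:
        return 0.0                 # monomorphic in the pooled data
    observed_var = ss / n - p * p  # variance of the imputed dosages
    return observed_var / expected_var


# Usage: combine two batches for a single variant.
batch_a = batch_stats([0.0, 1.0, 0.0, 1.0])
batch_b = batch_stats([0.0, 1.0])
overall = pooled_rsq([batch_a, batch_b])
```

Because only the count, sum, and sum of squares are pooled, the merged value is computed from the combined allele frequency rather than by averaging the two per-batch scores, which matters most when the two batches differ in frequency for a rare variant.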

We would be curious to have your thoughts.

Best, Guillaume