Imputation quality score (rsq_hat)

Hi,

Thanks for setting up the TOPMed imputation server.

Because of the sample-size limit (n=25K), we are imputing our large cohort in two batches but intend to merge the datasets after imputation. We would like to calculate an overall imputation quality score (rsq_hat) for the merged data, and we have been advised to use the following code:

Minimac4: https://github.com/statgen/Minimac4/blob/5e6f3cc91d166cd2298c296e46c9f428e6e0f3aa/src/ImputationStatistics.cpp#L50-L63
r2-estimator: https://github.com/statgen/r2-estimator/blob/7e2162a0e9db6e2b56d0e036cfbc43b58a977ef6/src/main.cpp#L211-L223
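For reference, here is a minimal Python sketch of how we understand the pooled calculation. It assumes that the linked code estimates Rsq as the empirical variance of the haploid ALT-allele dosages divided by p(1-p), where p is the mean haploid dosage. The class `DosageSums` and its methods are hypothetical names used only for illustration, and we have not checked how the original code handles edge cases such as monomorphic sites. The key idea is that batches are combined by summing the per-variant counts, sums and sums of squares, rather than by averaging per-batch Rsq values.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class DosageSums:
    """Hypothetical per-variant accumulator of haploid ALT dosages (e.g. HDS values)."""
    n: int = 0      # number of haplotypes seen
    s: float = 0.0  # sum of haploid dosages
    ss: float = 0.0 # sum of squared haploid dosages

    def add(self, hap_dosages: Iterable[float]) -> None:
        """Accumulate dosages for one variant from one batch."""
        for d in hap_dosages:
            self.n += 1
            self.s += d
            self.ss += d * d

    def merge(self, other: "DosageSums") -> "DosageSums":
        """Pool two batches by adding their sufficient statistics."""
        return DosageSums(self.n + other.n, self.s + other.s, self.ss + other.ss)

    def rsq(self) -> float:
        """Assumed estimator: var(dosage) / (p * (1 - p)); 0 when undefined."""
        if self.n == 0:
            return 0.0
        p = self.s / self.n  # estimated ALT allele frequency
        if p <= 0.0 or p >= 1.0:
            return 0.0  # monomorphic in the imputed data: ratio undefined
        var = self.ss / self.n - p * p  # empirical (population) variance
        return var / (p * (1.0 - p))


# Usage: the same variant imputed in two batches.
batch1 = DosageSums()
batch1.add([0.0, 0.9, 0.1, 1.0])
batch2 = DosageSums()
batch2.add([0.0, 0.05, 0.8, 0.0])
pooled_rsq = batch1.merge(batch2).rsq()
```

Pooling the sums this way gives the same value as computing Rsq directly on the merged dataset, whereas a simple average of the per-batch Rsq values generally does not.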

However, when we compared the Rsq reported by the TOPMed imputation server with the Rsq we calculated on the same dataset using the code above, we noticed an inflation (improvement) of the quality metric, but only for rare variants. This suggests to us that the Rsq calculation implemented on the TOPMed imputation server treats rare variants somewhat differently from the code you shared. Is that possible?
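To make the comparison concrete, a sketch like the following summarizes the mean difference between the server Rsq and the recomputed Rsq within each MAF bin. The helper `rsq_diff_by_maf`, the record layout (MAF, server Rsq, recomputed Rsq per variant) and the bin edges are all hypothetical choices for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Hypothetical MAF bins (lower edge inclusive, upper edge exclusive).
MAF_BINS: List[Tuple[float, float]] = [
    (0.0, 0.001),
    (0.001, 0.01),
    (0.01, 0.05),
    (0.05, 1.0),  # common variants (MAF never exceeds 0.5)
]


def rsq_diff_by_maf(
    records: Iterable[Tuple[float, float, float]],
) -> Dict[Tuple[float, float], float]:
    """Mean (server_rsq - recomputed_rsq) per MAF bin; empty bins are omitted."""
    diffs: Dict[Tuple[float, float], List[float]] = defaultdict(list)
    for maf, server_rsq, recomputed_rsq in records:
        for lo, hi in MAF_BINS:
            if lo <= maf < hi:
                diffs[(lo, hi)].append(server_rsq - recomputed_rsq)
                break
    return {b: mean(v) for b, v in diffs.items()}
```

A positive mean difference that is concentrated in the lowest bins would correspond to the rare-variant pattern described above.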

We would be curious to hear your thoughts.

Best, Guillaume

genepi/imputationserver, issue #76