OpenMendel / SnpArrays.jl

Compressed storage for SNP data
https://openmendel.github.io/SnpArrays.jl/latest

GRM factor #117

Closed olivierlabayle closed 2 years ago

olivierlabayle commented 2 years ago

Hi and thanks for this very useful Julia package,

I think I understand from your implementation and the manuscript that the GRM method has a factor of 4 in the denominator. In most other papers, like this one or this one, I see a factor of only 2. Is there a reason for the difference?

Thanks, Olivier

Hua-Zhou commented 2 years ago

With a factor of 4, the GRM is an unbiased estimate of the theoretical kinship matrix Φ, whose diagonal entries are 0.5 (the kinship coefficient of an individual with itself is 0.5). With a factor of 2, the GRM is an unbiased estimate of 2Φ (a correlation matrix). So it's more or less a personal choice of the authors, depending on whether they think of the GRM as an empirical kinship matrix or as a correlation matrix.
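The two normalizations above can be compared on simulated data. The following is a minimal sketch in plain Python, not the SnpArrays.jl implementation: it assumes unrelated, non-inbred individuals in Hardy-Weinberg equilibrium, uses the true allele frequencies rather than estimated ones, and the names `simulate_genotypes` and `grm` are hypothetical helpers introduced only for illustration. The `factor` argument is the constant multiplying p(1 - p) in the per-SNP denominator.

```python
import random


def simulate_genotypes(n, m, seed=0):
    """Draw 0/1/2 allele counts for n unrelated, non-inbred individuals at m SNPs."""
    rng = random.Random(seed)
    # Assumption: allele frequencies are known and drawn uniformly from [0.1, 0.5].
    freqs = [rng.uniform(0.1, 0.5) for _ in range(m)]
    # Each genotype is the sum of two independent allele draws (Hardy-Weinberg).
    geno = [[(rng.random() < p) + (rng.random() < p) for p in freqs]
            for _ in range(n)]
    return geno, freqs


def grm(geno, freqs, factor):
    """Average over SNPs of (x_i - 2p)(x_j - 2p) / (factor * p * (1 - p))."""
    n, m = len(geno), len(freqs)
    # Centre by the expected genotype 2p and scale each SNP once.
    z = [[(row[k] - 2 * freqs[k]) / (factor * freqs[k] * (1 - freqs[k])) ** 0.5
          for k in range(m)]
         for row in geno]
    return [[sum(a * b for a, b in zip(z[i], z[j])) / m for j in range(n)]
            for i in range(n)]


geno, freqs = simulate_genotypes(n=20, m=2000)
kinship = grm(geno, freqs, factor=4)      # estimates the kinship matrix
correlation = grm(geno, freqs, factor=2)  # estimates twice the kinship matrix

n = len(geno)
mean_diag_kinship = sum(kinship[i][i] for i in range(n)) / n
mean_diag_correlation = sum(correlation[i][i] for i in range(n)) / n
mean_offdiag_kinship = (sum(kinship[i][j] for i in range(n) for j in range(i))
                        / (n * (n - 1) / 2))

print("mean diagonal, factor 4:", round(mean_diag_kinship, 3))
print("mean diagonal, factor 2:", round(mean_diag_correlation, 3))
print("mean off-diagonal, factor 4:", round(mean_offdiag_kinship, 3))
```

Under these assumptions the mean diagonal should land near 0.5 with a factor of 4 and near 1 with a factor of 2, while the off-diagonal entries for unrelated individuals average close to 0. The two matrices differ only by the constant multiple 2.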

For background on the theoretical kinship, Chapter 5 of the book Mathematical and Statistical Methods for Genetic Analysis is a good reference.

olivierlabayle commented 2 years ago

Thank you very much for that explanation, I didn't know there was a notion of kinship. May I also ask: you say that the GRM is an unbiased estimator of the correlation matrix (at least after multiplying by 2), however I have noticed that doing so yields values greater than 1, which doesn't seem consistent with the notion of a correlation. I am a bit puzzled by this, and to be fair I see exactly the same phenomenon with other software.

Hua-Zhou commented 2 years ago

The GRM is only an estimate of kinship (or twice the kinship) and is not guaranteed to satisfy the constraints of the theoretical quantity, such as correlations being bounded by 1. In fact, diagonal entries that differ significantly from the theoretical values (1/2 or 1) may indicate that the data violate certain assumptions, e.g., through the presence of inbreeding.
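The effect of inbreeding on a diagonal entry can be illustrated with a small simulation. This is a sketch under stated assumptions rather than anything taken from SnpArrays.jl: it uses the standard population-genetics result that an individual with inbreeding coefficient F has self-kinship (1 + F)/2, treats the allele frequencies as known, and models inbreeding by making the second allele a copy of the first with probability F. The function name `self_kinship_estimate` is hypothetical.

```python
import random


def self_kinship_estimate(inbreeding, m=20000, seed=1):
    """Estimate one diagonal GRM entry (factor 4) for a simulated individual."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(m):
        p = rng.uniform(0.1, 0.5)  # assumed known allele frequency
        first = rng.random() < p
        # With probability `inbreeding`, the second allele is identical by
        # descent to the first; otherwise it is an independent draw.
        second = first if rng.random() < inbreeding else (rng.random() < p)
        x = first + second  # genotype coded as 0, 1, or 2
        total += (x - 2 * p) ** 2 / (4 * p * (1 - p))
    return total / m


outbred = self_kinship_estimate(inbreeding=0.0)
inbred = self_kinship_estimate(inbreeding=0.25)

print("outbred diagonal (factor 4):", round(outbred, 3))
print("inbred diagonal (factor 4):", round(inbred, 3))
print("inbred diagonal (factor 2):", round(2 * inbred, 3))
```

With F = 0.25 the factor-4 diagonal should sit near 0.625 rather than 0.5, so the factor-2 version exceeds 1. Sampling noise alone, especially with few SNPs, can push individual entries past the theoretical bound as well.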