The effective_n function within this package is used to compute the effective sample size. This function calculates the harmonic mean of the case count and the control count. To me, this seems to be off by a factor of 2.
While https://www.nature.com/articles/nprot.2014.071 is cited as the rationale for the present formula, my impression is that the better formula comes from, e.g., the METAL paper, which defines Neff as 2x this value. Other material, such as that from the University of Helsinki or various conversations in the MTAG GitHub issues, also defines Neff with the same formula as METAL.
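For reference, the two definitions under discussion can be written side by side. This is a sketch that assumes the commonly quoted METAL-style form; the symbols H, N_cases, and N_controls are introduced here for illustration and are not taken from the package.

```latex
\[
H = \frac{2}{\frac{1}{N_{\text{cases}}} + \frac{1}{N_{\text{controls}}}}
  = \frac{2\,N_{\text{cases}}\,N_{\text{controls}}}{N_{\text{cases}} + N_{\text{controls}}},
\qquad
N_{\text{eff}} = \frac{4}{\frac{1}{N_{\text{cases}}} + \frac{1}{N_{\text{controls}}}} = 2H
\]
```

When N_cases = N_controls = N, this gives H = N and N_eff = 2N, i.e., the total sample size.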
Currently, in the balanced case (same number of cases and controls), the effective_n function returns just the case count (or just the control count, since they are equal). E.g.:
And this is the same as the harmonic mean:
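A minimal Python sketch of the comparison, not the package's own code: the helper names harmonic_mean_n and metal_style_neff are hypothetical, and the second one assumes the METAL-style definition 4 / (1/N_cases + 1/N_controls).

```python
def harmonic_mean_n(n_cases: int, n_controls: int) -> float:
    """Harmonic mean of the case and control counts.

    Algebraically equal to 2 / (1/n_cases + 1/n_controls).
    """
    return 2 * n_cases * n_controls / (n_cases + n_controls)


def metal_style_neff(n_cases: int, n_controls: int) -> float:
    """Effective N under the assumed METAL-style definition.

    Algebraically equal to 4 / (1/n_cases + 1/n_controls),
    i.e. twice the harmonic mean.
    """
    return 4 * n_cases * n_controls / (n_cases + n_controls)


# Balanced case using the counts from this issue.
n_cases = n_controls = 181_522
print(harmonic_mean_n(n_cases, n_controls))   # 181522.0
print(metal_style_neff(n_cases, n_controls))  # 363044.0
```

In the balanced case the harmonic mean equals the per-group count, while the METAL-style value equals the total number of samples.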
But in the balanced case there are 181,522 + 181,522 = 363,044 samples, i.e., 2x the harmonic mean.
Describe the feature you would like to see (required)
I think it would be good to provide additional documentation explaining the choice for the current effective_n formula, or possibly to change its behavior depending on the rationale.