MRCIEU / TwoSampleMR

R package for performing 2-sample MR using MR-Base database
https://mrcieu.github.io/TwoSampleMR

effective_n: off by factor of 2? #433

Status: Open. carbocation opened this issue 1 year ago.

carbocation commented 1 year ago

The effective_n function in this package computes the effective sample size as the harmonic mean of the case count and the control count. To me, this seems to be off by a factor of 2.

While https://www.nature.com/articles/nprot.2014.071 is cited as the rationale for the present formula, my impression is that a better formula is the one given in, e.g., the METAL paper, which defines Neff as twice this value, i.e. Neff = 4 / (1/ncase + 1/ncontrol). Other material, such as that from the University of Helsinki and several discussions in the MTAG GitHub issues, also defines Neff with the same formula as METAL.
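A minimal sketch in base R of the two formulas being compared. The helper names neff_harmonic and neff_metal are hypothetical and chosen only for illustration; they are not TwoSampleMR functions, and neff_harmonic mirrors the behavior reported above for effective_n rather than reproducing its source.

```r
# Hypothetical helpers for illustration only; not part of TwoSampleMR.

# Harmonic mean of the case and control counts
# (the behavior the issue reports for effective_n).
neff_harmonic <- function(ncase, ncontrol) {
  2 / (1 / ncase + 1 / ncontrol)
}

# METAL-style effective sample size: 4 / (1/ncase + 1/ncontrol),
# which is exactly twice the harmonic mean.
neff_metal <- function(ncase, ncontrol) {
  4 / (1 / ncase + 1 / ncontrol)
}

# Balanced example from this issue
neff_harmonic(181522, 181522)  # [1] 181522
neff_metal(181522, 181522)     # [1] 363044
```

Because both helpers use only vectorized arithmetic, they also accept vectors of case and control counts (for example, one entry per study).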

Currently, in the balanced case (equal numbers of cases and controls), the effective_n function returns the case count, which is also the control count since the two are equal. For example:

> TwoSampleMR::effective_n(ncase = 181522, ncontrol = 181522)
[1] 181522

And this is the same as the harmonic mean:

> psych::harmonic.mean(c(181522, 181522))
[1] 181522

But in the balanced case there are 181,522 + 181,522 = 363,044 samples, i.e., 2x the harmonic mean.
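Written in general form, the two definitions differ by exactly a factor of 2 for any case/control split, not only in the balanced case. Here HM is the harmonic mean and N_eff the METAL-style value; the unbalanced counts in the last line are an illustrative example, not taken from any particular study.

```latex
\mathrm{HM} = \frac{2}{1/n_{\mathrm{case}} + 1/n_{\mathrm{control}}},
\qquad
N_{\mathrm{eff}} = \frac{4}{1/n_{\mathrm{case}} + 1/n_{\mathrm{control}}} = 2\,\mathrm{HM}

% Balanced: n_case = n_control = n
\mathrm{HM} = \frac{2}{2/n} = n,
\qquad
N_{\mathrm{eff}} = \frac{4}{2/n} = 2n \quad (\text{the total sample size})

% Illustrative unbalanced case: n_case = 10{,}000, n_control = 90{,}000
\frac{1}{10{,}000} + \frac{1}{90{,}000} = \frac{1}{9{,}000}
\;\Rightarrow\;
\mathrm{HM} = 18{,}000,
\quad
N_{\mathrm{eff}} = 36{,}000 < 100{,}000
```

So under the METAL definition, Neff equals the total sample size only when cases and controls are balanced, and is smaller than the total otherwise.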

Describe the feature you would like to see (required)

I think it would be good to provide additional documentation explaining the choice of the current effective_n formula, or possibly to change its behavior, depending on the rationale.

carbocation commented 1 year ago

One more place where 2x the harmonic mean is recommended is the ldsc_users discussion board.