bulik / ldsc

LD Score Regression (LDSC)
GNU General Public License v3.0

Sample size with repeated observations #415

Open ldehoyos opened 4 months ago

ldehoyos commented 4 months ago

I want to estimate LDSC heritability and genetic correlation from a genome-wide meta-analysis that includes repeated observations, and I was wondering what value I should use for the sample size.

For example, in some studies the same individuals were included at both age 20 and age 25. Here is an example for one SNP to illustrate:

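| Study   | Phenotype | Age | N    |
|---------|-----------|-----|------|
| Study_1 | Phenotype | 20  | 1000 |
| Study_1 | Phenotype | 25  | 900  |
| Study_2 | Phenotype | 24  | 2000 |
| Study_3 | Phenotype | 24  | 3000 |
| Study_4 | Phenotype | 25  | 2000 |
| Study_5 | Phenotype | 20  | 2500 |
| Study_5 | Phenotype | 23  | 2500 |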

What sample size should I use?

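Note that in this example the difference between the number of observations and the number of individuals is not so big (about 3,400: 13,900 observations vs. 10,500 individuals), but in my real dataset the difference is substantial, e.g. N_ind = 60,000 vs. N_obs = 400,000.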
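To make the two candidate sample sizes concrete, here is a minimal Python sketch that computes them from the example table. It assumes that repeated rows for a study come from the same individuals, and that the number of unique individuals in a study can be approximated by its largest wave; the names `rows`, `n_obs`, and `n_ind` are illustrative only and are not part of LDSC. It does not say which value LDSC should be given.

```python
from collections import defaultdict

# Rows from the example table above: (study, age, N).
rows = [
    ("Study_1", 20, 1000),
    ("Study_1", 25, 900),
    ("Study_2", 24, 2000),
    ("Study_3", 24, 3000),
    ("Study_4", 25, 2000),
    ("Study_5", 20, 2500),
    ("Study_5", 23, 2500),
]

# N_obs: every row is counted, so repeated individuals are counted repeatedly.
n_obs = sum(n for _, _, n in rows)

# N_ind: ASSUMPTION - within a study, all waves sample the same people,
# so the largest wave approximates the number of unique individuals.
largest_wave = defaultdict(int)
for study, _, n in rows:
    largest_wave[study] = max(largest_wave[study], n)
n_ind = sum(largest_wave.values())

print(f"N_obs = {n_obs}, N_ind = {n_ind}, difference = {n_obs - n_ind}")
```

For the example table this gives N_obs = 13900 and N_ind = 10500, a difference of 3400.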

Thanks in advance, L

aksarkar commented 3 weeks ago

@ldehoyos I think the only correct way to do this analysis is to run a separate GWAS of the phenotype at each age, which avoids "repeated observations". Then you should compute heritability and genetic correlation for each phenotype/age combination.
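As a rough illustration of what stratifying by age would mean for the example table in the question, the sketch below groups the rows by age and sums N within each stratum. It assumes each study contributes at most one row per age, so no individual is counted twice within a stratum; the variable names are hypothetical, and the per-age totals are only what a per-age meta-analysis of these studies would contain, not an LDSC output.

```python
from collections import defaultdict

# Rows from the example table in the question: (study, age, N).
rows = [
    ("Study_1", 20, 1000),
    ("Study_1", 25, 900),
    ("Study_2", 24, 2000),
    ("Study_3", 24, 3000),
    ("Study_4", 25, 2000),
    ("Study_5", 20, 2500),
    ("Study_5", 23, 2500),
]

# Check the assumption: a study never appears twice at the same age,
# so within one age stratum each individual is observed once.
pairs = [(study, age) for study, age, _ in rows]
assert len(pairs) == len(set(pairs)), "a study is repeated within an age"

# Sum the sample size within each age stratum.
n_by_age = defaultdict(int)
for _, age, n in rows:
    n_by_age[age] += n

for age in sorted(n_by_age):
    print(f"age {age}: N = {n_by_age[age]}")
```

With the example numbers, the strata are age 20 (N = 3500), age 23 (N = 2500), age 24 (N = 5000), and age 25 (N = 2900).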