ghm17 / LOGODetect

LOGODetect is a powerful tool to identify small segments that harbor local genetic correlation between two traits/diseases.
GNU General Public License v3.0

What sample size should be used for meta-analyses? #9

Closed jdblischak closed 3 years ago

jdblischak commented 3 years ago

LOGODetect requires a single sample size for each GWAS. Could you please advise me on which sample size I should use for a meta-analysis where the sample size differs per SNP?

I looked at your Supplementary Table 12, and from the titles alone it appears that at least 2 of the studies were meta-analyses. Nagel et al. report a total sample size of 449,484, but your table says the sample size you used was 390,278. Savage et al. report a total sample size of 269,867, which matches the value in your table.

In other words, should I use the max sample size in the meta-analysis (as you did for the Savage et al. study) or is there a different procedure I should use to determine the sample size for use with LOGODetect? Thanks!

ghm17 commented 3 years ago

The sample size of 449,484 reported in Nagel et al. includes 59,206 individuals from 23andMe. The released GWAS summary stats cover only the 390,278 individuals not in 23andMe.

Currently, LOGODetect assumes a homogeneous sample size across all SNPs (this assumption could be relaxed, but that requires certain modifications; the software will be updated in the future). In practice, we recommend using the max sample size in the meta-analysis and filtering out SNPs whose sample size is much smaller (e.g. below 50% of the max sample size).
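The recommendation above can be sketched in a few lines of plain Python. This is only an illustration, not part of LOGODetect: the function name `filter_by_max_n`, the `min_fraction` parameter, and the per-SNP sample sizes are all hypothetical, and the 50% cutoff is just the example value mentioned above.

```python
from typing import Dict, List, Tuple


def filter_by_max_n(snp_n: Dict[str, int],
                    min_fraction: float = 0.5) -> Tuple[int, List[str]]:
    """Return (max N, IDs of SNPs whose N is at least min_fraction * max N).

    Hypothetical helper: the max N is the single sample size that would be
    passed to LOGODetect, and the returned IDs are the SNPs to keep.
    """
    if not snp_n:
        raise ValueError("no SNPs supplied")
    max_n = max(snp_n.values())
    threshold = min_fraction * max_n
    # Keep only SNPs whose per-SNP sample size reaches the threshold.
    kept = [snp for snp, n in snp_n.items() if n >= threshold]
    return max_n, kept


# Made-up per-SNP sample sizes for a meta-analysis (illustrative only).
example = {"rs1": 390278, "rs2": 350000, "rs3": 150000, "rs4": 200000}
max_n, kept = filter_by_max_n(example)
print(max_n, kept)
```

With these made-up values, `rs3` falls below half of the max sample size and is dropped, while the other SNPs are kept and the max sample size is the one to report.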

jdblischak commented 3 years ago

As always, thanks for the quick and informative response!

> The sample size of 449,484 reported in Nagel et al. includes 59,206 individuals from 23andMe. The released GWAS summary stats cover only the 390,278 individuals not in 23andMe.

Ok, makes sense.

> Currently, LOGODetect assumes a homogeneous sample size across all SNPs (this assumption could be relaxed, but that requires certain modifications; the software will be updated in the future).

That'd be great if LOGODetect could handle heterogeneous sample sizes across SNPs. I look forward to this new feature.

> In practice, we recommend using the max sample size in the meta-analysis and filtering out SNPs whose sample size is much smaller (e.g. below 50% of the max sample size).

Sounds good. I appreciate the advice. I will use the max sample size for now and experiment with different filters for the minimum sample size for a SNP.

Also note that munge_sumstats.py by default applies a heuristic to remove SNPs with very low sample sizes, so these SNPs already won't affect the LDSC estimation of heritability or genetic correlation.

--n-min N_MIN Minimum N (sample size). Default is (90th percentile N) / 2.
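For comparison with a max-based filter, here is a rough sketch of the threshold described in that help text, i.e. the 90th percentile of N divided by 2. It is not the LDSC source code: the helper names `percentile` and `default_n_min`, the linear-interpolation percentile, and the sample sizes are assumptions for illustration, and the divisor is taken from the help text as quoted, so the actual implementation may differ.

```python
from typing import List


def percentile(values: List[float], q: float) -> float:
    """Linearly interpolated percentile (q in [0, 1]) of a non-empty list."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def default_n_min(sample_sizes: List[float], divisor: float = 2.0) -> float:
    """Threshold as described in the help text: (90th percentile N) / divisor."""
    return percentile(sample_sizes, 0.9) / divisor


# Made-up per-SNP sample sizes (illustrative only).
ns = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100]
n_min = default_n_min(ns)
kept_ns = [n for n in ns if n >= n_min]
print(n_min, len(kept_ns))
```

Because this threshold is based on a high percentile rather than the maximum, a handful of SNPs with unusually large N will not push the cutoff up, which is one reason the two filters can retain different SNP sets.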