bulik / ldsc

LD Score Regression (LDSC)
GNU General Public License v3.0
626 stars 339 forks source link

Large Ratio values in univariate heritability estimations #31

Closed r03ert0 closed 9 years ago

r03ert0 commented 9 years ago

Hello, For a series of phenotypes I'm getting Ratio values of 17% and even 30%. The GWAS data is not genomic-controlled. However, when I tried a GC version of the same analyses, the results were not much different (I am not doing the GWAS analyses myself), and in general smaller that what I had obtained using GCTA. The number of subjects is N~13,000, European ancestry. What can explain the large Ratio values and the absence of difference between GC and non-GC heritability estimates? Thank you!

bulik commented 9 years ago

Tough to say without more information -- here are a few troubleshooting steps

  1. To rule out problems further upstream, when you run the commands in the tutorial, do you get results that match the tutorial?
  2. Could you send me the log file (or copy/paste it here?)
  3. What traits are these?
  4. Can you rule out the possibility that the GWAS data are confounded by pop strat (i.e., do you exclude related individuals + ancestry outliers + run regressions w/ PC covariates or a mixed model?)
r03ert0 commented 9 years ago

Hi Brendan,

here's more info:

  1. I get smaller but comparable h2 values than in the the tutorial, even though I'm reproducing exactly the same steps: h2(scz)=0.42 instead of 0.59, and h2(bip)=0.32 instead of 0.52, rG=0.73 instead of 0.62. I'm appending the complete log at the end
  2. Ok. I'm also appending the log for one of my phenotypes (here I'm using the ldscores for functional partition)
  3. The phenotypes are volumes of different brain regions, from the ENIGMA consortium. They all have h2 ~80% in twin studies, and from h2=45-55% using GCTA (s.e.=20%...)
  4. Yes, I would rule out population structure. It was controlled using 4 PCs, and only eur were included (mach2qtl was used at each site, and results were combined using metal).

Follow the Log file for the the tutorial, and the Log for ICV (heritability

thank you ! roberto


Log file for the tutorial

golgi:2015_02LDSC roberto$ $ldsc --rg /Users/roberto/Desktop/scz.sumstats.gz,/Users/roberto/Desktop/bip.sumstats.gz --ref-ld-chr /Applications/_Geno/LDSCORES_EUR/eur_ref_ld_chr/ --w-ld-chr /Applications/_Geno/LDSCORES_EUR/eur_w_ld_chr/ --out ~/Desktop/scz_bip


Options: --ref-ld-chr /Applications/_Geno/LDSCORES_EUR/eur_ref_ld_chr/ --out /Users/roberto/Desktop/scz_bip --rg /Users/roberto/Desktop/scz.sumstats.gz,/Users/roberto/Desktop/bip.sumstats.gz --w-ld-chr /Applications/_Geno/LDSCORES_EUR/eur_w_ld_chr/

Beginning analysis at Mon Mar 23 19:11:55 2015 Reading summary statistics from /Users/roberto/Desktop/scz.sumstats.gz ... Read summary statistics for 844500 SNPs. Reading reference panel LD Score from /Applications/_Geno/LDSCORES_EUR/eur_ref_ld_chr/[1-22] ... Read reference panel LD Scores for 1317571 SNPs. Reading regression weight LD Score from /Applications/_Geno/LDSCORES_EUR/eur_w_ld_chr/[1-22] ... Read regression weight LD Scores for 1293150 SNPs. After merging with reference panel LD, 844274 SNPs remain. After merging with regression SNP LD, 840450 SNPs remain. Computing rg for phenotype 2/2 Reading summary statistics from /Users/roberto/Desktop/bip.sumstats.gz ... Read summary statistics for 1217311 SNPs. After merging with summary statistics, 840450 SNPs remain. 803373 SNPs with valid alleles.

Heritability of phenotype 1

Total Observed scale h2: 0.4209 (0.0487) Lambda GC: 1.2038 Mean Chi^2: 1.2336 Intercept: 1.0407 (0.0124) Ratio: 0.1742 (0.0531)

Heritability of phenotype 2/2

Total Observed scale h2: 0.3286 (0.0418) Lambda GC: 1.1364 Mean Chi^2: 1.1436 Intercept: 1.0334 (0.0088) Ratio: 0.2323 (0.0614)

Genetic Covariance

Total Observed scale gencov: 0.2719 (0.0326) Mean z1*z2: 0.1226 Intercept: 0.0212 (0.007)

Genetic Correlation

Genetic Correlation: 0.7312 (0.0775) Z-score: 9.4353 P: 3.8976e-21

Summary of Genetic Correlation Results

p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se /Users/roberto/Desktop/scz.sumstats.gz /Users/roberto/Desktop/bip.sumstats.gz 0.731 0.077 9.435 3.898e-21 0.329 0.042 1.033 0.009 0.021 0.007

Analysis finished at Mon Mar 23 19:12:12 2015 Total time elapsed: 17.24s


Log for ICV (heritability + annotation)


Options: --h2 /Users/roberto/Documents/_04_Data/2015_02ENIGMA-GWAS/e2_noGC/ICV.sumstats.gz --ref-ld-chr /Applications/_Geno/LDSCORE/baseline. --out /Users/roberto/Documents/_04_Data/2015_02ENIGMA-GWAS/ICV.annot --overlap-annot --frqfile-chr /Applications/_Geno/LDSCORE/1000G.mac5eur. --w-ld-chr /Applications/_Geno/LDSCORE/weights.

Beginning analysis at Mon Mar 23 19:33:02 2015 Reading summary statistics from /Users/roberto/Documents/_04_Data/2015_02ENIGMA-GWAS/e2_noGC/ICV.sumstats.gz ... Read summary statistics for 1068341 SNPs. Reading reference panel LD Score from /Applications/_Geno/LDSCORE/baseline.[1-22] ... Read reference panel LD Scores for 1189907 SNPs. Reading regression weight LD Score from /Applications/_Geno/LDSCORE/weights.[1-22] ... Read regression weight LD Scores for 1242190 SNPs. After merging with reference panel LD, 1066566 SNPs remain. After merging with regression SNP LD, 1063925 SNPs remain. Removed 0 SNPs with chi^2 > 80 (1063925 SNPs remain) Total Observed scale h2: 0.1743 (0.0527) Categories: base_0 Coding_UCSC_0 Coding_UCSC.extend.500_0 Conserved_LindbladToh_0 Conserved_LindbladToh.extend.500_0 CTCF_Hoffman_0 CTCF_Hoffman.extend.500_0 DGF_ENCODE_0 DGF_ENCODE.extend.500_0 DHS_peaks_Trynka_0 DHS_Trynka_0 DHS_Trynka.extend.500_0 Enhancer_Andersson_0 Enhancer_Andersson.extend.500_0 Enhancer_Hoffman_0 Enhancer_Hoffman.extend.500_0 FetalDHS_Trynka_0 FetalDHS_Trynka.extend.500_0 H3K27ac_Hnisz_0 H3K27ac_Hnisz.extend.500_0 H3K27ac_PGC2_0 H3K27ac_PGC2.extend.500_0 H3K4me1_peaks_Trynka_0 H3K4me1_Trynka_0 H3K4me1_Trynka.extend.500_0 H3K4me3_peaks_Trynka_0 H3K4me3_Trynka_0 H3K4me3_Trynka.extend.500_0 H3K9ac_peaks_Trynka_0 H3K9ac_Trynka_0 H3K9ac_Trynka.extend.500_0 Intron_UCSC_0 Intron_UCSC.extend.500_0 PromoterFlanking_Hoffman_0 PromoterFlanking_Hoffman.extend.500_0 Promoter_UCSC_0 Promoter_UCSC.extend.500_0 Repressed_Hoffman_0 Repressed_Hoffman.extend.500_0 SuperEnhancer_Hnisz_0 SuperEnhancer_Hnisz.extend.500_0 TFBS_ENCODE_0 TFBS_ENCODE.extend.500_0 Transcribed_Hoffman_0 Transcribed_Hoffman.extend.500_0 TSS_Hoffman_0 TSS_Hoffman.extend.500_0 UTR_3_UCSC_0 UTR_3_UCSC.extend.500_0 UTR_5_UCSC_0 UTR_5_UCSC.extend.500_0 WeakEnhancer_Hoffman_0 WeakEnhancer_Hoffman.extend.500_0 Observed scale h2: -0.1019 -0.0477 0.1277 0.1067 -0.0816 -0.1363 0.2713 -0.0062 0.3536 0.0616 -0.1858 0.1957 -0.0266 -0.0148 -0.0211 -0.2937 -0.173 -0.1533 0.1698 -0.2379 0.2707 -0.3084 0.2828 0.0225 -0.08 -0.1075 0.2689 -0.2008 0.0978 -0.3781 0.4716 0.7258 -0.7615 -0.0388 0.0513 0.0727 -0.0733 0.0459 0.0719 0.386 -0.3124 0.5836 -0.5835 0.126 -0.2301 0.0385 -0.0016 0.0104 -0.0499 0.0109 -0.0767 -0.0318 0.0649 Observed scale h2 SE: 0.2861 0.043 0.0669 0.0492 0.1028 0.0708 0.1019 0.1824 0.2843 0.2147 0.2671 0.3462 0.0304 0.0424 0.124 0.1894 0.1599 0.2311 0.625 0.6795 0.3655 0.3928 0.1959 0.372 0.3667 0.0833 0.1554 0.1771 0.0838 0.174 0.1983 1.279 1.307 0.0358 0.055 0.1469 0.154 0.2198 0.3235 0.8178 0.8312 0.1742 0.2457 0.1571 0.2766 0.0658 0.0759 0.0403 0.0396 0.0216 0.0377 0.0559 0.0981 Proportion of SNPs: 0.0886 0.0013 0.0057 0.0023 0.0295 0.0021 0.0063 0.0122 0.048 0.0099 0.0149 0.0442 0.0004 0.0017 0.0056 0.0136 0.0075 0.0252 0.0347 0.0374 0.0239 0.0298 0.0152 0.0378 0.054 0.0037 0.0118 0.0226 0.0034 0.0112 0.0204 0.0343 0.0352 0.0007 0.003 0.0028 0.0034 0.0409 0.0637 0.0149 0.0152 0.0117 0.0304 0.0306 0.0676 0.0016 0.0031 0.001 0.0024 0.0005 0.0025 0.0019 0.0079 Proportion of h2g: -0.5843 -0.2737 0.7326 0.6122 -0.4683 -0.7816 1.5562 -0.0354 2.0285 0.3533 -1.0659 1.1224 -0.1528 -0.0849 -0.1212 -1.6849 -0.9925 -0.8794 0.9743 -1.3647 1.5526 -1.7689 1.6222 0.1292 -0.4591 -0.6166 1.5424 -1.1517 0.5608 -2.1687 2.7054 4.1634 -4.3681 -0.2223 0.2945 0.4171 -0.4207 0.2631 0.4123 2.2143 -1.792 3.3475 -3.3473 0.7229 -1.3199 0.2209 -0.009 0.0595 -0.2863 0.0625 -0.4398 -0.1824 0.3721 Enrichment: -6.5956 -210.7377 128.0999 265.1679 -15.8969 -370.2442 247.2068 -2.9016 42.285 35.6804 -71.7216 25.4009 -397.769 -50.28 -21.605 -123.557 -132.1794 -34.8281 28.115 -36.4527 65.0384 -59.4223 106.8863 3.4191 -8.5072 -166.5447 130.6022 -50.8849 163.2844 -194.1202 132.441 121.2958 -124.16 -297.8092 99.3275 151.0758 -122.9295 6.4398 6.472 148.4129 -117.8779 285.2906 -110.016 23.6224 -19.5257 136.8614 -2.9197 60.7634 -120.0043 130.0739 -178.5434 -97.6117 47.218 Coefficients: -1.7187e-08 -5.4916e-07 3.3381e-07 6.9100e-07 -4.1425e-08 -9.6481e-07 6.4419e-07 -7.5612e-09 1.1019e-07 9.2979e-08 -1.8690e-07 6.6192e-08 -1.0365e-06 -1.3102e-07 -5.6300e-08 -3.2197e-07 -3.4444e-07 -9.0758e-08 7.3264e-08 -9.4991e-08 1.6948e-07 -1.5485e-07 2.7853e-07 8.9097e-09 -2.2169e-08 -4.3400e-07 3.4033e-07 -1.3260e-07 4.2550e-07 -5.0585e-07 3.4513e-07 3.1608e-07 -3.2355e-07 -7.7605e-07 2.5884e-07 3.9369e-07 -3.2034e-07 1.6781e-08 1.6865e-08 3.8675e-07 -3.0718e-07 7.4343e-07 -2.8669e-07 6.1557e-08 -5.0882e-08 3.5664e-07 -7.6084e-09 1.5834e-07 -3.1272e-07 3.3896e-07 -4.6526e-07 -2.5436e-07 1.2304e-07 Coefficient SE: 4.8268e-08 4.9468e-07 1.7489e-07 3.1840e-07 5.2170e-08 5.0154e-07 2.4193e-07 2.2372e-07 8.8580e-08 3.2415e-07 2.6869e-07 1.1712e-07 1.1838e-06 3.7496e-07 3.3034e-07 2.0758e-07 3.1836e-07 1.3681e-07 2.6961e-07 2.7131e-07 2.2888e-07 1.9725e-07 1.9291e-07 1.4714e-07 1.0157e-07 3.3631e-07 1.9666e-07 1.1699e-07 3.6457e-07 2.3277e-07 1.4513e-07 5.5699e-07 5.5531e-07 7.1628e-07 2.7743e-07 7.9559e-07 6.7264e-07 8.0424e-08 7.5921e-08 8.1931e-07 8.1724e-07 2.2196e-07 1.2069e-07 7.6765e-08 6.1171e-08 6.0979e-07 3.6773e-07 6.1534e-07 2.4824e-07 6.7338e-07 2.2868e-07 4.4752e-07 1.8598e-07 Lambda GC: 1.0527 Mean Chi^2: 1.0576 Intercept: 1.0175 (0.008) Ratio: 0.3036 (0.138) Reading annot matrix from /Applications/_Geno/LDSCORE/baseline.[1-22] ... Results printed to /Users/roberto/Documents/_04_Data/2015_02ENIGMA-GWAS/ICV.annot.results Analysis finished at Mon Mar 23 19:36:16 2015 Total time elapsed: 3.0m:13.88s Phenotype N_SNPs N_mssng INFO MAF out-of-bounds strand-ambiguous duplicated SNPs with N SIGNED_SUMSTATS summary-statistics Mean-Chi^2 Chi^2-too-small Lambda_GC Max-Chi^2 Genome-wide-significant ICV 1171392 0 0 3342 0 0 0 9692 0.00087732 1068341 1.057 Chi^2 OK 1.053 35.854 24 h2 (SE) Intercept (SE) Ratio (SE) ICV 0.1743 (0.0527) 1.0175 (0.008) 0.3036 (0.138)

bulik commented 9 years ago

The reason why you get slightly different results than the tutorial is because you used --ref-ld- chr eur_ref_ld_chr/ instead of --ref-ld-chr eur_w_ld_chr/. You can download the eur_w_ld_chr/ files from the same place. They tend to work a little better.

For ICV, it looks like you're getting a ratio statistic of ~0.3 with SE ~0.13, so those results are consistent with true ratio = 0 plus noise. As a sanity check, you could try the following

  1. --ref-ld-chr eur_w_ld_chr/ (ratio might be a bit lower?)
  2. --ref-ld-chr eur_w_ld_chr/ --no-intercept (constrain the intercept to 1; typically decreases the SE by ~30%. Since you're constraining the intercept to a lower value, h2 will go up, and it should be more similar to GCTA)

FYI, you may not have great power to detect enrichment with partitioned LD Score regression with a mean chi^2 of 1.05. We optimized the baseline model for unbiased estimation at big N (height, schizophrenia, BMI) rather than low variance estimation at low or medium N.

r03ert0 commented 9 years ago

Using eur_w_ld_chr I got the same values as in the tutorial. For ICV, I got

Total Observed scale h2: 0.1782 (0.0454) Lambda GC: 1.0527 Mean Chi^2: 1.0572 Intercept: 1.0169 (0.0068) Ratio: 0.2961 (0.1195)

and constraining the intercept:

Total Observed scale h2: 0.2579 (0.0312) Lambda GC: 1.0527 Mean Chi^2: 1.0572 Intercept: constrained to 1

With GCTA h2 was 0.54, with s.e.=0.23, because we only had N=1765. The sample for GCTA was very homogeneous, and then ICV (intra-cranial volumes estimated from MRI) values more comparable among individuals. In ENIGMA the different samples vary a lot, and volume measurements may include a lot of different confounds. Maybe what remains is not polygenic enough for the analyses...

On Mon, Mar 23, 2015 at 8:42 PM, Brendan Bulik-Sullivan < notifications@github.com> wrote:

The reason why you get slightly different results than the tutorial is because you used --ref-ld- chr eur_ref_ld_chr/ instead of --ref-ld-chr eur_w_ld_chr/. You can download the eur_w_ld_chr/ files from the same place. They tend to work a little better.

For ICV, it looks like you're getting a ratio statistic of ~0.3 with SE ~0.13, so those results are consistent with true ratio = 0 plus noise. As a sanity check, you could try the following

  1. --ref-ld-chr eur_w_ld_chr/ (ratio might be a bit lower?)
  2. --ref-ld-chr eur_w_ld_chr/ --no-intercept (constrain the intercept to 1; typically decreases the SE by ~30%. Since you're constraining the intercept to a lower value, h2 will go up, and it should be more similar to GCTA)

FYI, you may not have great power to detect enrichment with partitioned LD Score regression with a mean chi^2 of 1.05. We optimized the baseline model for unbiased estimation at huge N (height, schizophrenia, BMI) rather than low variance estimation at low or medium N. We excluded phenotypes with h2 Z-score (h2 / se(h2)) below 6 in our partitioned h2 paper; it looks like ICV is around ~3-4.

— Reply to this email directly or view it on GitHub https://github.com/bulik/ldsc/issues/31#issuecomment-85164119.

bulik commented 9 years ago

If the cohorts included in the meta-analysis used slightly different trait definitions that have (a) approx equal h2 and (b) genetic correlations slightly less than one, then h2(meta-analysis) will tend to be a bit less than h2 in any of the cohorts. The magnitude of this effect depends on how low the genetic correlations are, but a typical number would be max a couple of % (e.g., that's an upper bound for even schizophrenia, which is not the cleanest of phenotypes to measure).

Back to ENGIMA -- 0.54 se 0.23 is not significantly different from 0.26. However, 0.54 is a really high SNP-h2. I think height holds the record with ~0.45; everything else is in the ~0.1-0.3 range. Of all the numbers you listed, the ~0.26 number from LD Score regression w/ constrained intercept seems most plausible.

r03ert0 commented 9 years ago

thank you !

In the latest GIANT paper (Wood et al) snp-h2 was 60%! There was also Gaugler's paper for ASD with snp-h2=.51±0.05 and Davies's paper for IQ with snp-h2=.51±.11... One day we'll have those nice N~10K samples in brain imaging :)

On Mon, Mar 23, 2015 at 9:32 PM, Brendan Bulik-Sullivan < notifications@github.com> wrote:

If the cohorts included in the meta-analysis used slightly different trait definitions that have (a) approx equal h2 and (b) genetic correlations slightly less than one, then h2(meta-analysis) will tend to be a bit less than h2 in any of the cohorts. The magnitude of this effect depends on how low the genetic correlations are, but a typical number would be max a couple of % (e.g., that's an upper bound for even schizophrenia, which is not the cleanest of phenotypes to measure).

Back to ENGIMA -- 0.54 se 0.23 is not significantly different from 0.26. However, 0.54 is a really high SNP-h2. I think height holds the record with ~0.45; everything else is in the ~0.1-0.3 range. Of all the numbers you listed, the ~0.26 number from LD Score regression w/ constrained intercept seems most plausible.

— Reply to this email directly or view it on GitHub https://github.com/bulik/ldsc/issues/31#issuecomment-85181516.