bulik / ldsc

LD Score Regression (LDSC)
GNU General Public License v3.0

rg was out of bounds. #255

Open montenegrina opened 3 years ago

montenegrina commented 3 years ago

Hello,

I am running LD score regression for the first time and I am completely new to these types of analyses. I have 3 GWASes (EDIC, GOKIND, UKB) in which I tested the same binary phenotype (let’s call it PDR). I also did a meta-analysis (META) of the 3 of them. I am trying to determine the genetic correlation between the binary phenotype tested in one cohort versus another. My analysis scheme looks as follows:

corr(EDIC,GOKIND)
corr(GOKIND,UKB)
corr(EDIC,UKB)
corr(EDIC,META)
corr(GOKIND,META)
corr(UKB,META)

EDIC (158 cases, 1053 controls)
GOKIND (684 cases, 679 controls)
UKB (2332 cases, 14680 controls)

First I ran munge_sumstats.py for these 3 cohorts. Two cohorts (EDIC, UKB) have an imbalance between cases and controls. I read in a discussion thread here (dated a few years back) that when there is an imbalance I should use the effective sample size, Neff = 4/(1/Ncases + 1/Ncontrols), with --samp-prev 0.5. I tried to use it but I got this:

munge_sumstats.py: error: unrecognized arguments: --samp-prev 0.5

So I specified only Neff as N for the cohorts with case/control imbalance. Does that help with the case/control imbalance issue, or should I do something else instead?
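For reference, the Neff formula quoted above can be sketched in a few lines of Python. This is only an illustration using the case/control counts given in this post; the function name `effective_n` is made up for the example and is not part of LDSC.

```python
def effective_n(n_cases: int, n_controls: int) -> float:
    """Effective sample size for a case/control GWAS: 4 / (1/Ncases + 1/Ncontrols)."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


# Case/control counts as reported in this post.
cohorts = {
    "EDIC": (158, 1053),
    "GOKIND": (684, 679),
    "UKB": (2332, 14680),
}

neff = {name: effective_n(*counts) for name, counts in cohorts.items()}

for name, value in neff.items():
    print(f"{name}: Neff = {value:.1f}")
print(f"Sum over cohorts: {sum(neff.values()):.1f}")
```

With these counts, EDIC comes out at about 549.5 (rounded to 550) and the three cohorts sum to about 9961.8 (rounded to 9962). For a balanced cohort such as GOKIND, Neff is essentially the total sample size.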

I am trying to understand the output from these correlation analyses (see below).

All cohorts except EDIC have a low mean Chi^2, and the ratio is not calculated (due to the low Chi^2). But for EDIC h2 is negative; what does that imply? The genetic correlation is not calculated, nan (nan) (h2 out of bounds), for any of the cohort combinations. A message at the end of the output indicates that perhaps an error occurred during data munging (which I did not notice) or that h2 or N is low. Perhaps this could be attributed to the low number of cases, or the low number of cases + controls? What is the advisable number of subjects for LDSC to give meaningful results?

Do these low mean Chi^2 values mean there is a high correlation between the data sets? Perhaps I am overlooking important details that were not taken care of in the analyses. I would greatly appreciate your thoughts on the results. My apologies for what is perhaps a beginner's question.

The analyses are these:

For EDIC I did this (the rest are done in the same fashion.)

python munge_sumstats.py \
  --sumstats EDIC.GWAS.txt \
  --N 550 \
  --ignore Z \
  --out edic

EDIC Metadata:
Mean chi^2 = 1.02
Lambda GC = 1.023
Max chi^2 = 24.095

GOKIND Metadata:
Mean chi^2 = 0.979
WARNING: mean chi^2 may be too small.
Lambda GC = 0.99
Max chi^2 = 22.458

UKB Metadata:
Mean chi^2 = 0.987
WARNING: mean chi^2 may be too small.
Lambda GC = 0.99
Max chi^2 = 29.435
3 Genome-wide significant SNPs (some may have been removed by filtering).

I also did a meta-analysis of those 3 cohorts and munged that, using as N the total Neff across all 3 cohorts, which is 9962.

META Metadata:
Mean chi^2 = 0.996
WARNING: mean chi^2 may be too small.
Lambda GC = 0.982
Max chi^2 = 24.553

I should mention that I didn’t get any warning messages or errors.
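To make the metadata above easier to read, here is a rough sketch of how summary values like these can be derived from per-SNP Z-scores. It assumes the usual definitions (chi^2 = Z^2, and lambda GC = median chi^2 divided by the median of a 1-df chi-square distribution, about 0.4549); whether munge_sumstats.py computes them in exactly this way, and on which filtered SNP set, is an assumption. The Z-scores are made-up toy values.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def chi2_summary(z_scores):
    """Return (mean chi^2, lambda GC, max chi^2) for a list of Z-scores."""
    chi2 = [z * z for z in z_scores]
    mean_chi2 = statistics.fmean(chi2)
    lambda_gc = statistics.median(chi2) / CHI2_1DF_MEDIAN
    return mean_chi2, lambda_gc, max(chi2)


# Hypothetical toy Z-scores, only to exercise the function.
toy_z = [0.5, -1.0, 1.5, 0.0, 2.0]
mean_chi2, lambda_gc, max_chi2 = chi2_summary(toy_z)
print(f"Mean chi^2 = {mean_chi2:.3f}")
print(f"Lambda GC = {lambda_gc:.3f}")
print(f"Max chi^2 = {max_chi2:.3f}")
```

If the Z-scores carried no signal at all, both the mean chi^2 and lambda GC would be expected to sit near 1, so values at or below 1 leave very little for the regression to work with.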

To compute the genetic correlation

between EDIC and GOKIND

python ldsc.py \
  --rg edic.sumstats.gz,gokind.sumstats.gz \
  --ref-ld-chr eur_w_ld_chr/ \
  --w-ld-chr eur_w_ld_chr/ \
  --out edic_gokind
less edic_gokind.log

Heritability of phenotype 1

Total Observed scale h2: -0.4499 (0.9321)
Lambda GC: 1.0255
Mean Chi^2: 1.0207
Intercept: 1.0261 (0.0084)
Ratio: 1.2625 (0.4054)

Heritability of phenotype 2/2

Total Observed scale h2: 0.1879 (0.3554)
Lambda GC: 0.9868
Mean Chi^2: 0.9759
Intercept: 0.9702 (0.0078)
Ratio: NA (mean chi^2 < 1)

Genetic Covariance

Total Observed scale gencov: -0.4259 (0.3342)
Mean z1*z2: -0.0028
Intercept: 0.0053 (0.0048)

Genetic Correlation

Genetic Correlation: nan (nan) (h2 out of bounds)
Z-score: nan (nan) (h2 out of bounds)
P: nan (nan) (h2 out of bounds)
WARNING: One of the h2's was out of bounds.
This usually indicates a data-munging error or that h2 or N is low.
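One quick way to see how weak these heritability estimates are is to compare each h2 to its standard error. The sketch below does this for the two estimates in the EDIC vs GOKIND output above, assuming a simple normal approximation (estimate / SE as a Z-score, and estimate ± 1.96·SE as a rough 95% interval). It is an illustration, not something LDSC prints.

```python
def z_score(estimate: float, std_error: float) -> float:
    """Wald-style Z-score: estimate divided by its standard error."""
    return estimate / std_error


# (h2, SE) pairs copied from the EDIC vs GOKIND log above.
h2_estimates = {
    "EDIC": (-0.4499, 0.9321),
    "GOKIND": (0.1879, 0.3554),
}

for name, (h2, se) in h2_estimates.items():
    z = z_score(h2, se)
    lower, upper = h2 - 1.96 * se, h2 + 1.96 * se
    print(f"{name}: h2 = {h2:+.4f}, Z = {z:+.2f}, "
          f"approx. 95% CI = [{lower:.2f}, {upper:.2f}]")
```

Both Z-scores are well below 1.96 in absolute value, and both intervals comfortably include zero, so neither estimate is distinguishable from zero at these sample sizes.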

Between EDIC and UKB

Heritability of phenotype 1

Total Observed scale h2: -0.3509 (1.0793)
Lambda GC: 1.0225
Mean Chi^2: 1.0227
Intercept: 1.0272 (0.0109)
Ratio: 1.2004 (0.4808)

Heritability of phenotype 2/2

Total Observed scale h2: 0.1806 (0.0629)
Lambda GC: 0.9957
Mean Chi^2: 0.9926
Intercept: 0.9588 (0.0094)
Ratio: NA (mean chi^2 < 1)

Genetic Covariance

Total Observed scale gencov: -0.0139 (0.1619)
Mean z1*z2: -0.0012
Intercept: -0.0004 (0.0066)

Genetic Correlation

Genetic Correlation: nan (nan) (h2 out of bounds)
Z-score: nan (nan) (h2 out of bounds)
P: nan (nan) (h2 out of bounds)
WARNING: One of the h2's was out of bounds.
This usually indicates a data-munging error or that h2 or N is low.

Between GOKIND and UKB

ERROR computing rg for phenotype 2/2, from file ukb.sumstats.gz.
Traceback (most recent call last):
  File "/projects/com_grassim/anamaria/anamaria/anamaria/herit/ldsc/ldscore/sumstats.py", line 410, in estimate_rg
    rghat = _rg(loop, args, log, M_annot, ref_ld_cnames, w_ld_cname, i)
  File "/projects/com_grassim/anamaria/anamaria/anamaria/herit/ldsc/ldscore/sumstats.py", line 539, in _rg
    intercept_gencov=intercepts[2], n_blocks=n_blocks, twostep=args.two_step)
  File "/projects/com_grassim/anamaria/anamaria/anamaria/herit/ldsc/ldscore/regressions.py", line 705, in __init__
    np.multiply(hsq1.tot_delete_values, hsq2.tot_delete_values))
FloatingPointError: invalid value encountered in sqrt

Between EDIC and META

Heritability of phenotype 1

Total Observed scale h2: -0.3509 (1.0793)
Lambda GC: 1.0225
Mean Chi^2: 1.0227
Intercept: 1.0272 (0.0109)
Ratio: 1.2004 (0.4808)

Heritability of phenotype 2/2

Total Observed scale h2: 0.0309 (0.0477)
Lambda GC: 0.9868
Mean Chi^2: 0.9939
Intercept: 0.9867 (0.0091)
Ratio: NA (mean chi^2 < 1)

Genetic Covariance

Total Observed scale gencov: 0.6872 (0.0654)
Mean z1*z2: -0.0295
Intercept: -0.0672 (0.0041)

Genetic Correlation

Genetic Correlation: nan (nan) (h2 out of bounds)
Z-score: nan (nan) (h2 out of bounds)
P: nan (nan) (h2 out of bounds)
WARNING: One of the h2's was out of bounds.
This usually indicates a data-munging error or that h2 or N is low.

Between GOKIND and META

Heritability of phenotype 1

Total Observed scale h2: -0.1594 (0.4902)
Lambda GC: 1.0225
Mean Chi^2: 1.0227
Intercept: 1.0272 (0.0109)
Ratio: 1.2004 (0.4808)

Heritability of phenotype 2/2

Total Observed scale h2: 0.0309 (0.0477)
Lambda GC: 0.9868
Mean Chi^2: 0.9939
Intercept: 0.9867 (0.0091)
Ratio: NA (mean chi^2 < 1)

Genetic Covariance

Total Observed scale gencov: 0.4631 (0.044)
Mean z1*z2: -0.0295
Intercept: -0.0672 (0.0041)

Genetic Correlation

Genetic Correlation: nan (nan) (h2 out of bounds)
Z-score: nan (nan) (h2 out of bounds)
P: nan (nan) (h2 out of bounds)
WARNING: One of the h2's was out of bounds.
This usually indicates a data-munging error or that h2 or N is low.

Between UKB and META

Heritability of phenotype 1

Total Observed scale h2: 0.1806 (0.0629)
Lambda GC: 0.9957
Mean Chi^2: 0.9926
Intercept: 0.9588 (0.0094)
Ratio: NA (mean chi^2 < 1)

Heritability of phenotype 2/2

Total Observed scale h2: 0.0309 (0.0477)
Lambda GC: 0.9868
Mean Chi^2: 0.9939
Intercept: 0.9867 (0.0091)
Ratio: NA (mean chi^2 < 1)

Genetic Covariance

Total Observed scale gencov: 0.2356 (0.0244)
Mean z1*z2: -0.0355
Intercept: -0.0846 (0.0057)

Genetic Correlation

Genetic Correlation: nan (nan) (rg out of bounds)
Z-score: nan (nan) (rg out of bounds)
P: nan (nan) (rg out of bounds)
WARNING: rg was out of bounds.
This often means that h2 is not significantly different from zero.
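The two kinds of "out of bounds" message can be illustrated with the standard definition of genetic correlation, rg = gencov / sqrt(h2_1 · h2_2). The sketch below plugs in point estimates copied from the logs above. The function `naive_rg` is a hypothetical helper that mirrors the definition only; it is not LDSC's implementation, which also uses a block jackknife for standard errors.

```python
import math


def naive_rg(gencov: float, h2_1: float, h2_2: float) -> float:
    """Plug-in genetic correlation; nan when either h2 is not positive."""
    if h2_1 <= 0 or h2_2 <= 0:
        # The square root of a non-positive product is undefined.
        return math.nan
    return gencov / math.sqrt(h2_1 * h2_2)


# EDIC vs GOKIND: h2 of EDIC is negative, so rg cannot be formed.
edic_gokind = naive_rg(-0.4259, -0.4499, 0.1879)

# UKB vs META: both h2 are positive but tiny, so the ratio explodes.
ukb_meta = naive_rg(0.2356, 0.1806, 0.0309)

print(f"EDIC vs GOKIND: rg = {edic_gokind}")
print(f"UKB vs META:    rg = {ukb_meta:.2f}")
```

For UKB vs META the plug-in value is about 3.15, far outside the [-1, 1] range a correlation can take, which is consistent with the "rg was out of bounds" warning. The traceback for GOKIND vs UKB may be the same problem showing up inside a square root, but that is a guess from the error message alone.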

Heritability analysis

for UKB

python ldsc.py \
  --h2 ukb.sumstats.gz \
  --pop-prev 0.1 \
  --samp-prev 0.5 \
  --ref-ld-chr eur_w_ld_chr/ \
  --w-ld-chr eur_w_ld_chr/ \
  --out ukb_h2

Total Liability scale h2: 0.226 (0.0646)
Lambda GC: 0.9957
Mean Chi^2: 0.9919
Intercept: 0.952 (0.0094)
Ratio: NA (mean chi^2 < 1)

for META

Total Liability scale h2: 0.0325 (0.05)
Lambda GC: 0.9868
Mean Chi^2: 0.9939
Intercept: 0.9867 (0.0091)
Ratio: NA (mean chi^2 < 1)

for EDIC

Total Liability scale h2: -0.148 (0.8651)
Lambda GC: 1.0255
Mean Chi^2: 1.0208
Intercept: 1.0224 (0.0072)
Ratio: 1.077 (0.3449)
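The step from observed-scale to liability-scale h2 can be sketched with the commonly used conversion h2_liab = h2_obs · K²(1−K)² / (P(1−P) · z²), where K is the population prevalence (--pop-prev), P the sample prevalence (--samp-prev), and z the standard normal density at the liability threshold. That LDSC applies exactly this formula is an assumption here, although it reproduces the META numbers in this post (0.0309 observed, 0.0325 liability). The function name is hypothetical.

```python
from statistics import NormalDist


def observed_to_liability(h2_obs: float, samp_prev: float, pop_prev: float) -> float:
    """Convert observed-scale h2 to the liability scale."""
    std_normal = NormalDist()
    # Liability threshold above which an individual is a case.
    threshold = std_normal.inv_cdf(1.0 - pop_prev)
    # Height of the standard normal density at that threshold.
    density = std_normal.pdf(threshold)
    factor = (pop_prev ** 2 * (1.0 - pop_prev) ** 2) / (
        samp_prev * (1.0 - samp_prev) * density ** 2
    )
    return h2_obs * factor


# META observed-scale h2 from the rg logs, with the flags used above.
meta_liab = observed_to_liability(0.0309, samp_prev=0.5, pop_prev=0.1)
print(f"META liability-scale h2 = {meta_liab:.4f}")
```

With --pop-prev 0.1 and --samp-prev 0.5 the conversion factor is only about 1.05, so the rescaling changes the estimate slightly but cannot turn a negative or noisy observed-scale h2 into a usable one.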

for GOKIND (balanced, therefore I didn’t use --samp-prev)

python ldsc.py \
  --h2 gokind.sumstats.gz \
  --ref-ld-chr eur_w_ld_chr/ \
  --w-ld-chr eur_w_ld_chr/ \
  --out gokind_h2

Total Observed scale h2: 0.1436 (0.3549)
Lambda GC: 0.9868
Mean Chi^2: 0.9755
Intercept: 0.9712 (0.0077)
Ratio: NA (mean chi^2 < 1)
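The "Ratio" line in these logs appears to follow (intercept − 1) / (mean chi^2 − 1), which is why it is reported as NA whenever the mean chi^2 is below 1. The sketch below checks that reading against numbers from this post; treating this as LDSC's exact definition is an assumption, and small differences from the logged values are expected because the log prints rounded inputs.

```python
from typing import Optional


def ldsc_ratio(intercept: float, mean_chi2: float) -> Optional[float]:
    """(intercept - 1) / (mean chi^2 - 1), or None when mean chi^2 <= 1."""
    if mean_chi2 <= 1.0:
        # No excess chi^2 to apportion, so the ratio is undefined.
        return None
    return (intercept - 1.0) / (mean_chi2 - 1.0)


# EDIC, from the EDIC vs GOKIND log (reported ratio: 1.2625).
edic_ratio = ldsc_ratio(intercept=1.0261, mean_chi2=1.0207)

# GOKIND, from the h2 run above (reported ratio: NA).
gokind_ratio = ldsc_ratio(intercept=0.9712, mean_chi2=0.9755)

print(f"EDIC ratio:   {edic_ratio:.3f}")
print(f"GOKIND ratio: {gokind_ratio}")
```

A ratio above 1, as for EDIC, means the intercept is larger than the mean chi^2 itself, which goes hand in hand with a negative slope and therefore a negative h2 estimate.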

liyulan321 commented 3 years ago

Hello, I now have a problem similar to yours. Have you solved your problem?

rhoshi commented 3 years ago

Same issue here. Have you solved yours?

anbai106 commented 3 years ago

Similar issues here -- running h2 estimate to previous AD summary stats gave very low h2 estimate and large std. I don't see any bugs in the data...

Captain-Pam commented 2 years ago

> Similar issues here -- running h2 estimate to previous AD summary stats gave very low h2 estimate and large std. I don't see any bugs in the data...

Hi, I encountered the same problem as you. How did you solve it in the end? Is it still possible to go on and calculate genetic correlations when the SDs are this large?