bulik / ldsc

LD Score Regression (LDSC)
GNU General Public License v3.0

Genetic correlation significantly out of bounds [-1,1], with relatively small SE's #189

Open yuj1r0 opened 4 years ago

yuj1r0 commented 4 years ago

Hi,

I am running ldsc on two sets of summary statistics, one for trait1 and one for a proxy of trait1, from completely non-overlapping European study samples (N1 = 900k (published meta-analysis), N2 = 180k (one sample)). Judging by the Manhattan plots, the results seem quite consistent.

After munging and merging with the hm3 (HapMap3) SNPs, both traits have a significant h2. However, the rg estimate is out of bounds (rg = 1.23, SE = 0.06, p ~ 1e-82).

I am aware that correlation estimates from ldsc can fall outside the bounds [-1,1]. You have also stated: "LD score regression is not a bounded estimator, so it can produce estimates outside of [-1,1] due to sampling variation. Unless your genetic correlation estimates are significantly outside those bounds I wouldn’t worry too much." (https://github.com/bulik/ldsc/issues/89)

I am unsure whether a result like this counts as "significantly" out of bounds, and whether it is a cause for concern. Should some kind of correction be applied?

(Btw, I also tried constraining the intercept, just to see how it behaves; this resulted in a correlation of 0.90.)

Thank you in advance!!!

best, Tuomo

PS. Here is the output. (A1 and A2 are reversed between summary statistics 1 and 2, which results in a negative correlation.)

" Heritability of phenotype 1

Total Observed scale h2: 0.0357 (0.0049)
Lambda GC: 1.207
Mean Chi^2: 1.2766
Intercept: 1.148 (0.0138)
Ratio: 0.535 (0.0499)

Heritability of phenotype 2/2

Total Observed scale h2: 0.1942 (0.0099)
Lambda GC: 1.6259
Mean Chi^2: 1.963
Intercept: 1.0834 (0.0252)
Ratio: 0.0866 (0.0262)

Genetic Covariance

Total Observed scale gencov: -0.1026 (0.0058)
Mean z1*z2: -0.4681
Intercept: -0.0642 (0.0132)

Genetic Correlation

Genetic Correlation: nan (nan) (rg out of bounds)
Z-score: nan (nan) (rg out of bounds)
P: nan (nan) (rg out of bounds)
WARNING: rg was out of bounds. This often means that h2 is not significantly different from zero.

Summary of Genetic Correlation Results

p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se
/folder/sumstats1.sumstats.gz /folder/sumstats2.sumstats.gz -1.232 0.0643 -19.1693 6.6747e-82 0.1942 0.0099 1.0834 0.0252 -0.0642 0.0132 "
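For reference, the reported rg can be reproduced from the numbers in this log, since the genetic correlation is the genetic covariance divided by the square root of the product of the two heritabilities. The sketch below also runs a simple one-sided z-test of whether |rg| exceeds 1. That test is only an assumed, rough way to quantify "significantly out of bounds": it is not something ldsc reports, and it treats the jackknife SE as the standard deviation of a normal estimate.

```python
import math

# Values copied from the ldsc log in this post.
h2_1, h2_1_se = 0.0357, 0.0049   # phenotype 1, observed-scale h2
h2_2, h2_2_se = 0.1942, 0.0099   # phenotype 2, observed-scale h2
gencov = -0.1026                 # observed-scale genetic covariance
rg_se = 0.0643                   # SE of rg from the summary table

# rg = gencov / sqrt(h2_1 * h2_2)
rg = gencov / math.sqrt(h2_1 * h2_2)

# Both h2 estimates are many SEs away from zero, so the log's
# "h2 not significantly different from zero" explanation does not apply.
h2_1_z = h2_1 / h2_1_se
h2_2_z = h2_2 / h2_2_se

# ASSUMPTION: one-sided z-test of H0: |rg| <= 1 (not part of ldsc output).
z_bound = (abs(rg) - 1.0) / rg_se
p_bound = 0.5 * math.erfc(z_bound / math.sqrt(2.0))  # upper-tail normal p

print(f"rg = {rg:.3f}")
print(f"h2 z-scores: {h2_1_z:.1f}, {h2_2_z:.1f}")
print(f"z for |rg| > 1: {z_bound:.2f}, one-sided p = {p_bound:.1e}")
```

With these inputs the recomputed rg matches the -1.232 in the summary table, and |rg| sits roughly 3.6 standard errors beyond 1, so under this rough test the estimate is hard to attribute to sampling variation alone.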

yuj1r0 commented 4 years ago

To add: for a subset (the cases) of the 180k sample from the previous analysis, I ran a GWAS of subgroup1 vs. subgroup2 (N total = 24k). When I compared those results with the N1 (= 900k) summary statistics, ldsc gave an unstable estimate of 1.3. Constraining the intercept gave a correlation of 0.9 with p ~ 1e-5 (seems stable). How should this sort of behavior be interpreted? The intercept for trait 1 here was 1.0915 and for trait 2 it was 1.009. There is no sample overlap. Is it OK to constrain the intercept here?
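For anyone trying to reproduce the constrained-intercept run, here is a minimal sketch of how such a command could be assembled. The thread does not say which intercept values were used, so the values below are assumptions: `--intercept-h2 1,1` fixes both heritability intercepts at 1 (no confounding) and `--intercept-gencov 0,0` fixes the covariance intercept at 0 (the expectation with no sample overlap). The LD score prefix, output prefix, and the `python ldsc.py` invocation are hypothetical placeholders; the helper only builds the argument list and does not run ldsc.

```python
import shlex

def build_constrained_rg_command(sumstats1, sumstats2, ld_prefix, out_prefix):
    """Build (but do not run) an ldsc --rg command with fixed intercepts."""
    return [
        "python", "ldsc.py",               # hypothetical way of invoking ldsc
        "--rg", f"{sumstats1},{sumstats2}",
        "--ref-ld-chr", ld_prefix,
        "--w-ld-chr", ld_prefix,
        "--intercept-h2", "1,1",           # ASSUMED: h2 intercepts fixed at 1
        "--intercept-gencov", "0,0",       # ASSUMED: no sample overlap
        "--out", out_prefix,
    ]

cmd = build_constrained_rg_command(
    "/folder/sumstats1.sumstats.gz",
    "/folder/sumstats2.sumstats.gz",
    "eur_w_ld_chr/",                       # hypothetical LD score directory
    "trait1_vs_proxy_constrained",         # hypothetical output prefix
)
print(shlex.join(cmd))
```

Note that the free-intercept estimates reported in this thread (for example 1.148 and 1.0834 in the first run) are above 1, so fixing the h2 intercepts at 1 is a modelling choice that changes the h2 and rg estimates, not a neutral default.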

rhoshi commented 3 years ago

I have the same issue here. Have you solved your problem?

as224 commented 8 months ago

We ran into the same issue. Did you find a solution?