bvilhjal / ldpred

MIT License

LD-score estimated heritabilities are not robust. #72

babak-ra closed 4 years ago

babak-ra commented 5 years ago

LDpred uses an h2 estimated with the LD-score regression method for inference, and this estimate plays an important role. I have been trying different values of --N for a CAD meta-analysis, and the h2 I get varies substantially:

LD-score estimated genome-wide heritability: 0.0382  # N:  184304 (the correct sample size)
LD-score estimated genome-wide heritability: 0.0752  # N:  210000
LD-score estimated genome-wide heritability: 0.0929  # N:  225000
LD-score estimated genome-wide heritability: 0.1176  # N:  250000
LD-score estimated genome-wide heritability: 0.2290  # N:  500000
LD-score estimated genome-wide heritability: 0.2661  # N:  750000
LD-score estimated genome-wide heritability: 0.2846  # N: 1000000

Is this expected?

babak-ra commented 5 years ago

I should add that with everything else being equal (set of SNPs, ref panel, etc.), LD-score regression reports the following:

Total Observed scale h2: 0.0682 (0.0044) # N:   184,304 (the correct sample size)
Total Observed scale h2: 0.0126 (0.0008) # N: 1,000,000
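
These two values are what one would expect if the LD-score regression estimate is (approximately) inversely proportional to the N supplied while the chi-square statistics stay fixed. A quick check under that assumption; `rescale_h2` is a hypothetical helper for illustration, not part of ldsc:

```python
def rescale_h2(h2, n_old, n_new):
    """Rescale an estimate assuming it is proportional to 1/N
    (chi-square statistics and LD scores held fixed)."""
    return h2 * n_old / n_new

# Values reported above for N = 184,304; rescale them to N = 1,000,000.
predicted_h2 = rescale_h2(0.0682, 184_304, 1_000_000)
predicted_se = rescale_h2(0.0044, 184_304, 1_000_000)

print(round(predicted_h2, 4), round(predicted_se, 4))  # 0.0126 0.0008
```

Both the point estimate and the standard error reported for N = 1,000,000 match this rescaling to four decimals.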

marielohcs commented 5 years ago

Actually, I think I might have a similar problem (I reported the issue in another thread). I ran two similar analyses, and the one with the bigger sample size had heritability=Inf, so the subsequent steps (ldpred) won't run, although the coord step was fine.

bvilhjal commented 5 years ago

Thanks for your comments on this. I apologize for the slow reply on this issue, but I was on holidays in April.

In the LD score regression equation, the sample size is a parameter, so the heritability estimate depends directly on the N supplied. Currently, the effect estimates are derived using the sample sizes defined at the coordination step (either read from the file or parsed), but these are then replaced by the sample size given here for the h2 estimate. I believe it would be better not to do that, so I will leave this issue open until that's done.
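
To make the dependence on N concrete, here is a minimal sketch. It assumes (this is an illustration, not a copy of LDpred's code) that the estimate has the intercept-1 form `h2 = (mean(chi2) - 1) / (N * mean_ld_score / M)`, with `chi2 = N * beta^2` computed from standardized effects that were already fixed at the coordination step. Under those assumptions the estimate simplifies to `h2 = A - B / N` for constants `A` and `B`, which can be checked against the values reported earlier in this thread:

```python
# (N passed via --N, LD-score estimated h2) pairs reported earlier in this thread
reported = [
    (184_304, 0.0382),
    (210_000, 0.0752),
    (225_000, 0.0929),
    (250_000, 0.1176),
    (500_000, 0.2290),
    (750_000, 0.2661),
    (1_000_000, 0.2846),
]

def fit_a_b(first, last):
    """Solve h2 = A - B / N exactly through two (N, h2) points."""
    (n1, h1), (n2, h2) = first, last
    b = (h2 - h1) / (1.0 / n1 - 1.0 / n2)
    a = h2 + b / n2
    return a, b

# Fit the curve through the smallest and largest N only.
a, b = fit_a_b(reported[0], reported[-1])

# Compare the curve with the five points that were not used in the fit.
max_abs_error = 0.0
for n, h2_reported in reported[1:-1]:
    h2_predicted = a - b / n
    max_abs_error = max(max_abs_error, abs(h2_predicted - h2_reported))
    print(f"N={n:>9,d}  reported={h2_reported:.4f}  predicted={h2_predicted:.4f}")
print(f"max abs error: {max_abs_error:.4f}")
```

If the assumed form is right, the estimate rises with N and levels off at `A` as N grows, because the effect sizes no longer change with the N given at this step.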

babak-ra commented 5 years ago

Thanks so much for your response. I am still not sure why LD-score regression's heritability estimate is significantly higher than LDpred's.

bvilhjal commented 5 years ago

Hi,

It’s probably because LDpred calculates the LD scores itself and forces 1 as the intercept in the regression, etc.
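
As a rough illustration of the intercept point only, the sketch below compares a least-squares slope with a free intercept against one with the intercept fixed at 1, on synthetic noise-free data. All numbers (`N`, `M`, `true_h2`, `true_intercept`, the LD scores) are made up, and neither function is taken from LDpred or ldsc:

```python
def slope_free_intercept(x, y):
    """Ordinary least-squares slope with an estimated intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

def slope_fixed_intercept(x, y, intercept=1.0):
    """Least-squares slope when the intercept is held at a fixed value."""
    sxy = sum(xi * (yi - intercept) for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / sxx

# Synthetic setup: E[chi2_j] = N * h2 * l_j / M + intercept
N, M = 2_000, 10_000
true_h2, true_intercept = 0.05, 1.1
ld_scores = [1.0 + 0.01 * j for j in range(M)]
chi2 = [N * true_h2 * l / M + true_intercept for l in ld_scores]

# Convert each slope back to a heritability estimate: h2 = slope * M / N
h2_free = slope_free_intercept(ld_scores, chi2) * M / N
h2_fixed = slope_fixed_intercept(ld_scores, chi2, intercept=1.0) * M / N

print(f"free intercept:  h2 = {h2_free:.4f}")
print(f"intercept = 1:   h2 = {h2_fixed:.4f}")
```

Here the free-intercept fit recovers `true_h2`, while fixing the intercept at 1 pushes the excess into the slope. The direction of the difference depends on whether the true intercept is above or below 1, so this shows only that the two approaches need not agree.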

Best, Bjarni

bvilhjal commented 4 years ago

This should be fixed now: by default, LDpred uses the sample size available in the summary statistics.