bulik / ldsc

LD Score Regression (LDSC)
GNU General Public License v3.0

why are the genetic correlation out of the bound of [-1, 1] and have very large se? #89

Open chenyan53535 opened 7 years ago

chenyan53535 commented 7 years ago

Hi, I am trying to use your software to calculate the genetic correlations between four phenotypes. All of the correlations have a very large SE (about equal to rg), and I do not know why. Some of the correlations are also outside [-1, 1], which seems strange. The command I used is as follows:

/software/biosoft/software/centos6/limixinstall/bin/python ldsc.py --ref-ld-chr eur_w_ld_chr/ --out ebc_hc_sc_ec --rg ebc1.sumstats.gz,hc1.sumstats.gz,sc1.sumstats.gz,ec1.sumstats.gz --w-ld-chr eur_w_ld_chr/

Could you please help me solve this problem?

rkwalters commented 7 years ago

Hello, LD score regression is not a bounded estimator, so it can produce estimates outside of [-1,1] due to sampling variation. Unless your genetic correlation estimates are significantly outside those bounds I wouldn’t worry too much.
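To illustrate the point about sampling variation: rg is formed as the estimated genetic covariance divided by the square root of the product of the two h2 estimates, so noise in any of the three can push the ratio past 1 in absolute value. The following is a minimal simulation sketch, not ldsc code. It assumes independent normal noise on each component (a simplification; the real estimates are correlated), and the function names and all numeric inputs are made up for illustration.

```python
import math
import random


def simulate_rg(true_rg, h2_1, h2_2, se_h2, se_gcov, n_rep=10000, seed=0):
    """Simulate rg = gcov / sqrt(h2_1 * h2_2) from noisy component estimates.

    Assumption: independent normal noise on each component (illustrative only).
    """
    rng = random.Random(seed)
    true_gcov = true_rg * math.sqrt(h2_1 * h2_2)
    estimates = []
    for _ in range(n_rep):
        gcov_hat = rng.gauss(true_gcov, se_gcov)
        h2_1_hat = rng.gauss(h2_1, se_h2)
        h2_2_hat = rng.gauss(h2_2, se_h2)
        if h2_1_hat <= 0 or h2_2_hat <= 0:
            continue  # the ratio is undefined for non-positive h2 estimates
        estimates.append(gcov_hat / math.sqrt(h2_1_hat * h2_2_hat))
    return estimates


def fraction_out_of_bounds(estimates):
    """Share of simulated rg estimates that fall outside [-1, 1]."""
    return sum(abs(r) > 1 for r in estimates) / len(estimates)


# Hypothetical inputs: same true rg, noisy versus precise component estimates.
noisy = simulate_rg(true_rg=0.8, h2_1=0.1, h2_2=0.1, se_h2=0.05, se_gcov=0.05)
precise = simulate_rg(true_rg=0.8, h2_1=0.1, h2_2=0.1, se_h2=0.001, se_gcov=0.001)
print(fraction_out_of_bounds(noisy), fraction_out_of_bounds(precise))
```

With the true rg held at 0.8 in both runs, only the noisy configuration should produce estimates outside the bounds, which is the behaviour described above.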

SEs on the genetic correlation are loosely a function of the sample sizes and heritabilities of the GWASs. If the sample size is small and/or the heritability is low (or if there’s an issue in munge_sumstats.py leaving you with far fewer than 1 million markers), you’ll have large SEs. Unless it’s a munging issue, there’s not really much to be done to solve this; it’s just an indication that you don’t have much power to estimate genetic correlation with your current datasets.
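One quick way to rule out the munging issue is to count how many markers actually carry a test statistic in each .sumstats.gz file. The sketch below is an illustration under assumptions, not part of ldsc: it assumes the munged file is gzipped and tab-delimited with a header containing a `Z` column, and the function name is made up here.

```python
import csv
import gzip


def count_usable_markers(path, z_column="Z"):
    """Count rows with a non-missing Z value in a gzipped, tab-delimited file.

    Assumption: the file has a header row that includes `z_column`.
    """
    n_usable = 0
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            value = (row.get(z_column) or "").strip()
            # Treat empty cells and NA/NaN placeholders as missing.
            if value and value.upper() not in {"NA", "NAN"}:
                n_usable += 1
    return n_usable
```

For example, `count_usable_markers("ebc1.sumstats.gz")` would give the count for the first file in the command above; a value far below 1 million would point to a munging problem rather than a power problem.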

Cheers, Raymond

lijinxi90 commented 6 years ago

Hi Raymond,

I calculated the genetic correlation between two traits:

/ldsc.py --rg trait1.sumstats.gz,trait2.sumstats.gz --ref-ld-chr --w-ld-chr --out

The results are as follows:

p1 | p2 | rg | se | z | p | h2_obs | h2_obs_se | h2_int | h2_int_se | gcov_int | gcov_int_se
trait1 | trait2 | 1.7018 | 3.0184 | 0.5638 | 0.5729 | 0.2794 | 0.1609 | 1.0234 | 0.0069 | 0.3576 | 0.0053

rg is 1.7018 and gcov_int is 0.3576, both very large. I used the entire overlapping sample to calculate the genetic correlation, since we collected both traits in the same individuals. Can I still estimate the genetic correlation between them, and how should I deal with an estimate outside the bounds [-1, 1]? Thanks very much!

Best, Jinxi

rkwalters commented 6 years ago

Hi Jinxi,

As mentioned before, the LD score regression estimate of rg isn't bounded, so it generally isn't concerning unless it's significantly outside [-1,1]. In your case, the SE is extremely large (3.02!) so the message here is simply that the result is unstable.
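As a rough check of what "significantly outside [-1,1]" means here, one can compare the distance from the nearest bound to the SE. This sketch uses only the rg and se values from the results table and assumes the estimate is approximately normal; the helper name is made up for illustration.

```python
import math


def upper_tail_p(z):
    """One-sided upper-tail p-value for a standard normal z-score."""
    return 0.5 * math.erfc(z / math.sqrt(2))


rg, se = 1.7018, 3.0184  # values reported in the results table

# Test of rg = 0 (two-sided), which is what the z and p columns report.
z_zero = rg / se
p_zero = 2 * upper_tail_p(abs(z_zero))

# Test of whether rg exceeds the nearest bound, rg = 1 (one-sided).
z_bound = (rg - 1.0) / se
p_above_one = upper_tail_p(z_bound)

print(round(z_zero, 4), round(p_zero, 4), round(z_bound, 4), round(p_above_one, 4))
```

The first pair should agree with the z and p columns of the table to rounding (about 0.5638 and 0.5729). The estimate sits well under one SE above 1, so it is nowhere near significantly out of bounds.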

Generally the heritability results for each trait are a good indicator for whether it's going to be possible to get stable rg estimates involving that trait. In your case, the results here indicate that the h2 results for trait2 aren't significant and are fairly noisy (observed scale h2=.28, se=.16). Genetic correlation results aren't likely to be stable enough to be meaningful when the heritability result for either trait isn't significant.
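The same kind of check can be applied to the h2 estimate before attempting rg at all. A minimal sketch, using the h2_obs and h2_obs_se values from the results table; the function names are hypothetical and the 1.96 default (nominal two-sided 5% level) is an assumption here, not an ldsc rule, so a stricter cutoff may be more appropriate in practice.

```python
def h2_z_score(h2, se):
    """Z-score of a heritability estimate against h2 = 0."""
    return h2 / se


def is_h2_significant(h2, se, z_threshold=1.96):
    """True if the h2 z-score exceeds the chosen threshold (assumed cutoff)."""
    return h2_z_score(h2, se) > z_threshold


# h2_obs and h2_obs_se for trait2, as reported in the thread.
trait2_z = h2_z_score(0.2794, 0.1609)
print(round(trait2_z, 2), is_h2_significant(0.2794, 0.1609))
```

Here the z-score is below 2, which matches the description of the trait2 h2 as noisy and not significant.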

As for next steps, it could be useful to evaluate why the ldsc h2 results are unstable. If they are due to low sample size then there's probably not much you can do, but if they are due to loci with extremely large effect sizes, more stable estimates can sometimes be achieved by excluding those loci from the summary statistics (with the caution that the interpretation now involves that exclusion).

Since you have both traits measured in the same individuals, methods using that individual-level data may also provide more stable estimates than ldsc's use of summary statistics. Specifically, you might look into bivariate GREML in GCTA http://cnsgenomics.com/software/gcta/#BivariateGREMLanalysis.

Cheers, Raymond
