cuelee / pleio


environmental correlation #3

Open sevi2018 opened 4 years ago

sevi2018 commented 4 years ago

Hi, sorry for so many questions, but I am really interested in your method. I wanted to double-check with you the intercept you refer to in the paper and make sure my interpretation is correct. Below is the output of the genetic correlation and heritability analysis I ran with LDSC. [EXAMPLE.docx](https://github.com/hanlab-SNU/pleio/files/5145727/EXAMPLE.docx)

sevi2018 commented 4 years ago

Can you confirm that the intercept is the one I have highlighted? If I apply that formula using the highlighted intercept, I get values ? + or - 1. My understanding is that the environmental correlation, like the genetic correlation, takes values in (0 - 1). Please let me know if I have explained the issue clearly.

cuelee commented 4 years ago

I think you did not fully understand how to estimate the environmental correlation described in the manuscript. In the manuscript (https://www.biorxiv.org/content/10.1101/2020.06.16.155879v2), we proposed two methods for estimating the environmental correlation between two traits. The two methods give similar, but not identical, estimates, and the estimates are in the range [-1, 1].

Please note that the two methods use different intercept estimates: 1. the intercept of the genetic covariance estimate, which you highlighted in the MS Word document; 2. the intercept of the single-trait heritability estimate.

Method 1. Using the LDSC --rg flag (recommended)

In this case, the environmental correlation is the intercept of the Genetic Covariance estimate reported by LDSC with the --rg flag (the value highlighted in the Word document).

Method 2. Using the LDSC --h2 flag

This is a multi-step process that gives a correlation estimate similar to that of the first method (see the Supplementary Note for details: https://www.biorxiv.org/content/10.1101/2020.06.16.155879v2.supplementary-material):

Step 1: Combine the summary statistics of the two traits into a weighted sum of z-scores, using the inverse square root of each sample size (1 / sqrt(N_A) and 1 / sqrt(N_B)) as weights.

Step 2: Run LDSC with the --h2 flag on the summary statistics generated in Step 1 and take the single-trait heritability intercept from the output.

If we refer to this intercept as alpha_sum, the environmental correlation is (alpha_sum - 1) * (N_A + N_B) / (2 * sqrt(N_A) * sqrt(N_B)).
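The two steps of Method 2 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the pleio implementation: the function names `combine_z` and `env_corr_from_h2_intercept` are hypothetical, and dividing the weighted sum by sqrt(w_A^2 + w_B^2) is an assumption (the comment above only says "weighted sum of z-scores"), chosen because it gives the combined z-score unit variance when the two traits are uncorrelated, which is consistent with the (alpha_sum - 1) formula. The sample sizes in the usage lines are made up.

```python
import math


def combine_z(z_a, z_b, n_a, n_b):
    """Step 1 (sketch): weighted sum of two z-scores for one SNP.

    Weights are 1/sqrt(N_A) and 1/sqrt(N_B), as described in the thread.
    The normalisation by sqrt(w_a^2 + w_b^2) is an ASSUMPTION so that the
    combined statistic has variance 1 when the traits are uncorrelated.
    """
    w_a = 1.0 / math.sqrt(n_a)
    w_b = 1.0 / math.sqrt(n_b)
    return (w_a * z_a + w_b * z_b) / math.sqrt(w_a ** 2 + w_b ** 2)


def env_corr_from_h2_intercept(alpha_sum, n_a, n_b):
    """Step 2 (sketch): convert the LDSC --h2 intercept of the combined
    summary statistics into an environmental correlation estimate:

        (alpha_sum - 1) * (N_A + N_B) / (2 * sqrt(N_A) * sqrt(N_B))
    """
    scale = (n_a + n_b) / (2.0 * math.sqrt(n_a) * math.sqrt(n_b))
    return (alpha_sum - 1.0) * scale


# Hypothetical sample sizes; with N_A == N_B the scale factor is 1.
print(env_corr_from_h2_intercept(1.1247, 1000, 1000))
# With unequal sample sizes the scale factor is larger than 1.
print(env_corr_from_h2_intercept(1.1, 100, 400))
```

Running LDSC itself on the combined statistics is left out here, since it is done on the command line with the --h2 flag.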

Example: Applying method 1, you will get an LDSC log file similar to the following:

*********************************************************************
* LD Score Regression (LDSC)
* Version 1.0.1
* (C) 2014-2019 Brendan Bulik-Sullivan and Hilary Finucane
* Broad Institute of MIT and Harvard / MIT Department of Mathematics
* GNU General Public License v3
*********************************************************************
Call:
./ldsc.py \
--ref-ld-chr /data01/cuelee/project/reg/simulation/delpy/ldsc-master/eur_w_ld_chr/ \
--out /data01/cuelee/project/reg/simulation/04_realdata_analysis/data_cvd/__sg__/6150_1.gwas.imputed_v3.both_sexes_HEARTFAIL.gwas.imputed_v3.both_sexes \
--rg /data01/cuelee/project/reg/simulation/04_realdata_analysis/data_cvd/__icor__/6150_1.gwas.imputed_v3.both_sexes.sumstats.gz,/data01/cuelee/project/reg/simulation/04_realdata_analysis/data_cvd/__icor__/HEARTFAIL.gwas.imputed_v3.both_sexes.sumstats.gz \
--pop-prev 0.02353662830457021,0.003905066576780008 \
--samp-prev 0.02353662830457021,0.003905066576780008 \
--w-ld-chr /data01/cuelee/project/reg/simulation/delpy/ldsc-master/eur_w_ld_chr/

Beginning analysis at Wed Apr 22 09:40:15 2020
Reading summary statistics from /data01/cuelee/project/reg/simulation/04_realdata_analysis/data_cvd/__icor__/6150_1.gwas.imputed_v3.both_sexes.sumstats.gz ...
Read summary statistics for 1182128 SNPs.
Reading reference panel LD Score from /data01/cuelee/project/reg/simulation/delpy/ldsc-master/eur_w_ld_chr/[1-22] ...
Read reference panel LD Scores for 1290028 SNPs.
Removing partitioned LD Scores with zero variance.
Reading regression weight LD Score from /data01/cuelee/project/reg/simulation/delpy/ldsc-master/eur_w_ld_chr/[1-22] ...
Read regression weight LD Scores for 1290028 SNPs.
After merging with reference panel LD, 1161774 SNPs remain.
After merging with regression SNP LD, 1161774 SNPs remain.
Computing rg for phenotype 2/2
Reading summary statistics from /data01/cuelee/project/reg/simulation/04_realdata_analysis/data_cvd/__icor__/HEARTFAIL.gwas.imputed_v3.both_sexes.sumstats.gz ...
Read summary statistics for 1217311 SNPs.
After merging with summary statistics, 1161774 SNPs remain.
1161774 SNPs with valid alleles.

Heritability of phenotype 1
---------------------------
Total Liability scale h2: 0.1411 (0.0123)
Lambda GC: 1.1009
Mean Chi^2: 1.1325
Intercept: 0.9993 (0.0074)
Ratio < 0 (usually indicates GC correction).

Heritability of phenotype 2/2
-----------------------------
Total Liability scale h2: 0.1633 (0.0353)
Lambda GC: 1.0434
Mean Chi^2: 1.0411
Intercept: 1. (0.0064)
Ratio: 0.0004 (0.1554)

Genetic Covariance
------------------
Total Liability scale gencov: 0.0367 (0.0147)
Mean z1*z2: 0.1408
Intercept: 0.124 (0.0049)

Genetic Correlation
-------------------
Genetic Correlation: 0.242 (0.0964)
Z-score: 2.5105
P: 0.0121

Summary of Genetic Correlation Results
p1                                                                                                                             p2     rg      se       z       p  h2_liab  h2_liab_se  h2_int  h2_int_se  gcov_int  gcov_int_se
/data01/cuelee/project/reg/simulation/04_realdata_analysis/data_cvd/__icor__/6150_1.gwas.imputed_v3.both_sexes.sumstats.gz  /data01/cuelee/project/reg/simulation/04_realdata_analysis/data_cvd/__icor__/HEARTFAIL.gwas.imputed_v3.both_sexes.sumstats.gz  0.242  0.0964  2.5105  0.0121   0.1633      0.0353     1.0     0.0064     0.124       0.0049

Analysis finished at Wed Apr 22 09:41:06 2020
Total time elapsed: 51.42s
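For Method 1, the value of interest is the Intercept line under the Genetic Covariance heading in the log above. A minimal sketch for pulling it out of a saved log is shown below. The helper name `gencov_intercept` is hypothetical (it is not part of LDSC), and the code assumes the log has the same layout as this example, with the first "Intercept:" line after the "Genetic Covariance" heading being the one wanted.

```python
import re


def gencov_intercept(log_text):
    """Return (intercept, standard_error) from the 'Genetic Covariance'
    section of an LDSC --rg log. Assumes the layout shown in this thread."""
    parts = log_text.split("Genetic Covariance", 1)
    if len(parts) < 2:
        raise ValueError("no 'Genetic Covariance' section found")
    # First 'Intercept: value (se)' line after the section heading.
    match = re.search(r"Intercept:\s*(\S+)\s*\(([^)]+)\)", parts[1])
    if match is None:
        raise ValueError("no intercept line found in the section")
    return float(match.group(1)), float(match.group(2))


# Excerpt copied from the log in this comment.
excerpt = """\
Heritability of phenotype 2/2
-----------------------------
Intercept: 1. (0.0064)

Genetic Covariance
------------------
Total Liability scale gencov: 0.0367 (0.0147)
Mean z1*z2: 0.1408
Intercept: 0.124 (0.0049)
"""

intercept, se = gencov_intercept(excerpt)
print(intercept, se)
```

Splitting on the section heading first keeps the single-trait heritability intercepts, which appear earlier in the log, from being picked up by mistake.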

Applying method 2, you will get an LDSC log file similar to the following:

*********************************************************************
* LD Score Regression (LDSC)
* Version 1.0.1
* (C) 2014-2019 Brendan Bulik-Sullivan and Hilary Finucane
* Broad Institute of MIT and Harvard / MIT Department of Mathematics
* GNU General Public License v3
*********************************************************************
Call:
./ldsc.py \
--h2 /data01/cuelee/project/reg/simulation/04_realdata_analysis/data_cvd/__re__/6150_1.gwas.imputed_v3.both_sexes_HEARTFAIL.gwas.imputed_v3.both_sexes.sumstats.gz \
--ref-ld-chr /data01/cuelee/project/reg/simulation/delpy/ldsc-master/eur_w_ld_chr/ \
--out /data01/cuelee/project/reg/simulation/04_realdata_analysis/data_cvd/__re__/6150_1.gwas.imputed_v3.both_sexes_HEARTFAIL.gwas.imputed_v3.both_sexes \
--w-ld-chr /data01/cuelee/project/reg/simulation/delpy/ldsc-master/eur_w_ld_chr/

Beginning analysis at Fri Jun  5 09:48:12 2020
Reading summary statistics from /data01/cuelee/project/reg/simulation/04_realdata_analysis/data_cvd/__re__/6150_1.gwas.imputed_v3.both_sexes_HEARTFAIL.gwas.imputed_v3.both_sexes.sumstats.gz ...
Read summary statistics for 1182128 SNPs.
Reading reference panel LD Score from /data01/cuelee/project/reg/simulation/delpy/ldsc-master/eur_w_ld_chr/[1-22] ...
Read reference panel LD Scores for 1290028 SNPs.
Removing partitioned LD Scores with zero variance.
Reading regression weight LD Score from /data01/cuelee/project/reg/simulation/delpy/ldsc-master/eur_w_ld_chr/[1-22] ...
Read regression weight LD Scores for 1290028 SNPs.
After merging with reference panel LD, 1161774 SNPs remain.
After merging with regression SNP LD, 1161774 SNPs remain.
Using two-step estimator with cutoff at 30.
Total Observed scale h2: 0.0073 (0.0008)
Lambda GC: 1.2143
Mean Chi^2: 1.2276
Intercept: 1.1247 (0.0077)
Ratio: 0.5481 (0.0338)
Analysis finished at Fri Jun  5 09:48:36 2020
Total time elapsed: 24.35s

In this example, the two traits have the same sample size (N_A = N_B), so the factor (N_A + N_B) / (2 * sqrt(N_A) * sqrt(N_B)) in the Method 2 formula equals 1. For Method 1, the environmental correlation estimate is 0.124 (from Intercept: 0.124 (0.0049)). For Method 2, the environmental correlation estimate is 1.1247 - 1 = 0.1247, which is approximately 0.124 (from Intercept: 1.1247 (0.0077)).

I hope this answer helps. Please let me know if you have more questions.

sevi2018 commented 4 years ago

Thank you. Now it is clear.

On Mon, Aug 31, 2020 at 2:30 AM Cue Hyunkyu Lee notifications@github.com wrote:
