bulik / ldsc

LD Score Regression (LDSC)
GNU General Public License v3.0
624 stars 339 forks source link

High genetic correlation, but large se and not significant, h2 is weirdly low too #328

Open anbai106 opened 2 years ago

anbai106 commented 2 years ago

Hi,

Thanks for this wonderful tool!

I am running a genetic correlation between brain volume (C256_55) and AD. I obtained very low h2 estimates for this brain volume (I ran GWAS with) and also AD (downloaded from GWAS Catalog). I also ran GCTA to estimate h2 for the brain volume, and I obtained: 0.135862 +- 0.026911.

Here is the output from ldsc:

*********************************************************************
* LD Score Regression (LDSC)
* Version 1.0.1
* (C) 2014-2019 Brendan Bulik-Sullivan and Hilary Finucane
* Broad Institute of MIT and Harvard / MIT Department of Mathematics
* GNU General Public License v3
*********************************************************************
Call: 
./ldsc.py \
--ref-ld-chr /gpfs/fs001/cbica/home/wenju/Project/ldsc/pre-computed_Eu_GWAS/eur_w_ld_chr/ \
--out C256_55_ldsc_gc_AD \
--rg C256_55_ldsc.sumstats.gz,AD.sumstats.gz \
--w-ld-chr /gpfs/fs001/cbica/home/wenju/Project/ldsc/pre-computed_Eu_GWAS/eur_w_ld_chr/ 

Beginning analysis at Thu Nov  4 09:40:46 2021
Reading summary statistics from C256_55_ldsc.sumstats.gz ...
Read summary statistics for 1086617 SNPs.
Reading reference panel LD Score from /gpfs/fs001/cbica/home/wenju/Project/ldsc/pre-computed_Eu_GWAS/eur_w_ld_chr/[1-22] ... (ldscore_fromlist)
Read reference panel LD Scores for 1290028 SNPs.
Removing partitioned LD Scores with zero variance.
Reading regression weight LD Score from /gpfs/fs001/cbica/home/wenju/Project/ldsc/pre-computed_Eu_GWAS/eur_w_ld_chr/[1-22] ... (ldscore_fromlist)
Read regression weight LD Scores for 1290028 SNPs.
After merging with reference panel LD, 1078922 SNPs remain.
After merging with regression SNP LD, 1078922 SNPs remain.
Computing rg for phenotype 2/2
Reading summary statistics from AD.sumstats.gz ...
Read summary statistics for 1217311 SNPs.
After merging with summary statistics, 1078922 SNPs remain.
1049983 SNPs with valid alleles.

Heritability of phenotype 1
---------------------------
Total Observed scale h2: 0.0144 (0.0261)
Lambda GC: 1.0195
Mean Chi^2: 1.0168
Intercept: 1.0114 (0.0065)
Ratio: 0.6816 (0.388)

Heritability of phenotype 2/2
-----------------------------
Total Observed scale h2: 0.0405 (0.0228)
Lambda GC: 1.0926
Mean Chi^2: 1.1171
Intercept: 1.0709 (0.0345)
Ratio: 0.605 (0.2944)

Genetic Covariance
------------------
Total Observed scale gencov: 0.0233 (0.0111)
Mean z1*z2: 0.0058
Intercept: -0.0092 (0.0049)

Genetic Correlation
-------------------
Genetic Correlation: 0.9656 (1.1632)
Z-score: 0.8301
P: 0.4065

Summary of Genetic Correlation Results
p1              p2      rg      se       z       p  h2_obs  h2_obs_se  h2_int  h2_int_se  gcov_int  gcov_int_se
C256_55_ldsc.sumstats.gz  AD.sumstats.gz  0.9656  1.1632  0.8301  0.4065  0.0405     0.0228  1.0709     0.0345   -0.0092       0.0049

Analysis finished at Thu Nov  4 09:41:01 2021
Total time elapsed: 14.37s

This is the output from the munge_stat step:

*********************************************************************
* LD Score Regression (LDSC)
* Version 1.0.1
* (C) 2014-2019 Brendan Bulik-Sullivan and Hilary Finucane
* Broad Institute of MIT and Harvard / MIT Department of Mathematics
* GNU General Public License v3
*********************************************************************
Call: 
./munge_sumstats.py \
--out /cbica/home/wenju/Dataset/UKBB/UKBB_genetic_preprocess/S4_GWAS/British_ancestry/C256_55_ldsc \
--merge-alleles /gpfs/fs001/cbica/home/wenju/Project/ldsc/pre-computed_Eu_GWAS/w_hm3.snplist \
--chunksize 50000 \
--sumstats /cbica/home/wenju/Dataset/UKBB/UKBB_genetic_preprocess/S4_GWAS/British_ancestry/C256_55_gwas_plink_ldsc.tsv 

Interpreting column names as follows:
N:  Sample size
A1: Allele 1, interpreted as ref allele for signed sumstat.
P:  p-Value
BETA:   [linear/logistic] regression coefficient (0 --> no effect; above 0 --> A1 is trait/risk increasing)
A2: Allele 2, interpreted as non-ref allele for signed sumstat.
SNP:    Variant ID (e.g., rs number)

Reading list of SNPs for allele merge from /gpfs/fs001/cbica/home/wenju/Project/ldsc/pre-computed_Eu_GWAS/w_hm3.snplist
Read 1217311 SNPs for allele merge.
Reading sumstats from /cbica/home/wenju/Dataset/UKBB/UKBB_genetic_preprocess/S4_GWAS/British_ancestry/C256_55_gwas_plink_ldsc.tsv into memory 50000 SNPs at a time.
Read 8430655 SNPs from --sumstats file.
Removed 7344000 SNPs not in --merge-alleles.
Removed 35 SNPs with missing values.
Removed 0 SNPs with INFO <= 0.9.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with out-of-bounds p-values.
Removed 1 variants that were not SNPs or were strand-ambiguous.
1086619 SNPs remain.
Removed 0 SNPs with duplicated rs numbers (1086619 SNPs remain).
Removed 0 SNPs with N < 11993.3333333 (1086619 SNPs remain).
Median value of BETA was 1.46891e-05, which seems sensible.
Removed 2 SNPs whose alleles did not match --merge-alleles (1086617 SNPs remain).
Writing summary statistics for 1217311 SNPs (1086617 with nonmissing beta) to /cbica/home/wenju/Dataset/UKBB/UKBB_genetic_preprocess/S4_GWAS/British_ancestry/C256_55_ldsc.sumstats.gz.

Metadata:
Mean chi^2 = 1.017
WARNING: mean chi^2 may be too small.
Lambda GC = 1.018
Max chi^2 = 22.606
0 Genome-wide significant SNPs (some may have been removed by filtering).

Conversion finished at Wed Nov  3 23:45:02 2021
Total time elapsed: 2.0m:2.15s

More information for the AD GWAS summary, I downloaded it from this here: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST002001-GCST003000/GCST002245/

The log for the mung_stat:

*********************************************************************
* LD Score Regression (LDSC)
* Version 1.0.1
* (C) 2014-2019 Brendan Bulik-Sullivan and Hilary Finucane
* Broad Institute of MIT and Harvard / MIT Department of Mathematics
* GNU General Public License v3
*********************************************************************
Call: 
./munge_sumstats.py \
--out AD \
--merge-alleles /home/hao/Software/ldsc/pre-computed_Eu_GWAS/w_hm3.snplist \
--chunksize 500000 \
--N 55134.0 \
--sumstats IGAP_stage_1_ldsc.txt 

Interpreting column names as follows:
A1: Allele 1, interpreted as ref allele for signed sumstat.
P:  p-Value
Beta:   [linear/logistic] regression coefficient (0 --> no effect; above 0 --> A1 is trait/risk increasing)
A2: Allele 2, interpreted as non-ref allele for signed sumstat.
SNP:    Variant ID (e.g., rs number)

Reading list of SNPs for allele merge from /home/hao/Software/ldsc/pre-computed_Eu_GWAS/w_hm3.snplist
Read 1217311 SNPs for allele merge.
Reading sumstats from IGAP_stage_1_ldsc.txt into memory 500000 SNPs at a time.
WARNING: 2 SNPs had P outside of (0,1]. The P column may be mislabeled.
Read 7055881 SNPs from --sumstats file.
Removed 5905651 SNPs not in --merge-alleles.
Removed 0 SNPs with missing values.
Removed 0 SNPs with INFO <= 0.9.
Removed 0 SNPs with MAF <= 0.01.
Removed 2 SNPs with out-of-bounds p-values.
Removed 6 variants that were not SNPs or were strand-ambiguous.
1150222 SNPs remain.
Removed 0 SNPs with duplicated rs numbers (1150222 SNPs remain).
Using N = 55134.0
Median value of Beta was 0.0011, which seems sensible.
Removed 22 SNPs whose alleles did not match --merge-alleles (1150200 SNPs remain).
Writing summary statistics for 1217311 SNPs (1150200 with nonmissing beta) to AD.sumstats.gz.

Metadata:
Mean chi^2 = 1.114
Lambda GC = 1.093
Max chi^2 = 565.218
165 Genome-wide significant SNPs (some may have been removed by filtering).

Conversion finished at Thu Sep 16 18:59:31 2021
Total time elapsed: 1.0m:5.6s

Then I estimated the h2 for AD trait:

*********************************************************************
* LD Score Regression (LDSC)
* Version 1.0.1
* (C) 2014-2019 Brendan Bulik-Sullivan and Hilary Finucane
* Broad Institute of MIT and Harvard / MIT Department of Mathematics
* GNU General Public License v3
*********************************************************************
Call: 
./ldsc.py \
--h2 AD.sumstats.gz \
--ref-ld-chr /home/hao/Software/ldsc/pre-computed_Eu_GWAS/eur_w_ld_chr/ \
--out AD_h2 \
--w-ld-chr /home/hao/Software/ldsc/pre-computed_Eu_GWAS/eur_w_ld_chr/ 

Beginning analysis at Fri Sep 17 13:38:57 2021
Reading summary statistics from AD.sumstats.gz ...
Read summary statistics for 1150200 SNPs.
Reading reference panel LD Score from /home/hao/Software/ldsc/pre-computed_Eu_GWAS/eur_w_ld_chr/[1-22] ... (ldscore_fromlist)
Read reference panel LD Scores for 1290028 SNPs.
Removing partitioned LD Scores with zero variance.
Reading regression weight LD Score from /home/hao/Software/ldsc/pre-computed_Eu_GWAS/eur_w_ld_chr/[1-22] ... (ldscore_fromlist)
Read regression weight LD Scores for 1290028 SNPs.
After merging with reference panel LD, 1145016 SNPs remain.
After merging with regression SNP LD, 1145016 SNPs remain.
Using two-step estimator with cutoff at 30.
Total Observed scale h2: 0.0676 (0.0126)
Lambda GC: 1.0926
Mean Chi^2: 1.1144
Intercept: 1.0416 (0.0076)
Ratio: 0.3638 (0.0661)
Analysis finished at Fri Sep 17 13:39:06 2021
Total time elapsed: 8.79s

Any insights into these results? My sample size is large enough: N > 18K

anbai106 commented 2 years ago

I am wondering if there are people maintaining this wonder software :)

lcpilling commented 2 years ago

I've been encountering this same issue recently (large rG values but wide SEs). The 'problem' is that if either or both traits have low heritability then the rG estimate is inherently unstable. In your case the H2 estimate for brain volume is not significantly different from zero (h2=0.0144, se=0.026, therefore p=0.58). It would be helpful if LDSC flagged this in the output more explicitly. In the munge step you get a warning that mean chi^2 may be too small but easy to miss. Does seem surprising it would not be significantly different to zero (quantitative trait in 18,000 people?). Not SNPs p<5*10-8 though?