bulik / ldsc

LD Score Regression (LDSC)
GNU General Public License v3.0

Prop._h2 is negative #438

Open dqq0404 opened 2 weeks ago

dqq0404 commented 2 weeks ago

Hi, when I ran partitioned heritability with 1000G_Phase3_baselineLD_v2.2_ldscores.tgz, Prop._h2 was negative and highly significant. Is this a bug?

aksarkar commented 2 weeks ago

@dqq0404 It would be helpful to have the complete output of ldsc. Without it, one cannot say much.

Did you run with --overlap-annot? This is required for the baseline LD model even though this is not explicitly written in the documentation anywhere, since the baseline annotations overlap.

dqq0404 commented 2 weeks ago

Hi, this is my input:

    ./ldsc.py \
      --h2 ...sumstats.gz \
      --ref-ld-chr baseline_v2.2/baselineLD. \
      --out ...baseline \
      --overlap-annot \
      --frqfile-chr .../1000G_EUR_Phase3_plink/1000G.EUR.QC. \
      --w-ld-chr 1000G_Phase3_weights_hm3_no_MHC/1000G_Phase3_weights_hm3_no_MHC/weights.hm3_noMHC.

And this is my output, which seems wrong:

    Category                          Prop._SNPs          Prop._h2         Prop._h2_std_error  Enrichment         Enrichment_std_error  Enrichment_p
    MAF_Adj_Predicted_Allele_AgeL2_0  3.39243741662e-06   -0.387852505281  0.0452457387571     -114328.566057     13337.2360933         8.9865840571286534e-14
    MAF_Adj_LLD_AFRL2_0               0.00279647423847    -0.291467372182  0.033175498992      -104.226732423     11.8633308098         8.7228613575755434e-14
    MAF_Adj_ASMCL2_0                  -2.34852700521e-14  -0.470331648975  0.03990867818       2.00266655623e+13  -1.69930676085e+12    4.5256915227493249e-24

aksarkar commented 2 weeks ago

@dqq0404 The "proportion of h^2" explained by a continuous annotation is not a sensible quantity by definition.

For continuous annotations, you can only draw conclusions from the estimated coefficient.
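To make this concrete, here is a small sketch using the three rows from the output posted above. In that output, Enrichment is simply Prop._h2 divided by Prop._SNPs. As far as I can tell (this is an assumption about how the column is computed), Prop._SNPs for a continuous annotation is the sum of the annotation values divided by the number of SNPs, so it can be near zero or negative, and the ratio stops being interpretable. The dictionary name `rows` and the helper `enrichment` are illustrative only.

```python
# Values copied from the output above: (Prop._SNPs, Prop._h2, reported Enrichment)
rows = {
    "MAF_Adj_Predicted_Allele_AgeL2_0": (3.39243741662e-06, -0.387852505281, -114328.566057),
    "MAF_Adj_LLD_AFRL2_0": (0.00279647423847, -0.291467372182, -104.226732423),
    "MAF_Adj_ASMCL2_0": (-2.34852700521e-14, -0.470331648975, 2.00266655623e+13),
}


def enrichment(prop_snps, prop_h2):
    """Enrichment as the ratio of the two proportions (illustrative helper)."""
    return prop_h2 / prop_snps


for name, (prop_snps, prop_h2, reported) in rows.items():
    # A tiny or negative denominator makes the ratio explode or flip sign.
    print(name, enrichment(prop_snps, prop_h2), reported)
```

The recomputed ratios match the reported Enrichment column, which suggests the extreme values come from the near-zero denominators rather than from a bug.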

dqq0404 commented 2 weeks ago

Thanks for your reply. Can I ask how to calculate the estimated coefficient?

dqq0404 commented 2 weeks ago

Hi, I found that there is a --print-coefficients parameter to get the coefficient. This question may be simple, but I would still like to know what it means when the coefficient is positive or negative. Could you please explain it? Thanks!

aksarkar commented 2 weeks ago

The interpretation of the coefficient is the amount that the per-SNP heritability increases when the annotation increases by one standard deviation.

Refer to Gazal et al. 2017 for more details.

dqq0404 commented 2 weeks ago

Hi, after reading this paper, I found that it uses Tau_star.coefficient rather than Tau.coefficient to compare across annotations and across traits. I also found a formula to calculate Tau_star.coefficient: https://github.com/bulik/ldsc/issues/270#issue-801688557 But I do not understand what the symbols in this formula mean. How do I calculate Tau_star.coefficient from the following columns? Prop._SNPs Prop._h2 Prop._h2_std_error Enrichment Enrichment_std_error Enrichment_p Coefficient Coefficient_std_error Coefficient_z-score

aksarkar commented 2 weeks ago

As stated in the methods section of Gazal et al. 2017:

M_{h_g^2} is the number of SNPs that were analyzed. You can get this from the printed output of ldsc.

h_g^2 is the estimated heritability. You can also get this from the printed output of ldsc.

sd_c is the standard deviation of the annotation. You need to read the .annot file and compute the standard deviation of the relevant column.

\hat{tau} is the column Coefficient in the output.
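Putting these four quantities together, the standardized coefficient in Gazal et al. 2017 is \tau^* = M_{h_g^2} \cdot sd_c \cdot \hat{tau} / h_g^2. A minimal sketch follows; the function name `tau_star` and all numbers in the example call are hypothetical placeholders, not values from this thread. Scaling the standard error by the same factor is an assumption that treats M_{h_g^2}, sd_c and h_g^2 as fixed constants.

```python
def tau_star(tau_hat, tau_se, sd_c, m_snps, h2_g):
    """Standardize a coefficient and its standard error.

    tau_hat, tau_se : Coefficient and Coefficient_std_error columns
    sd_c            : standard deviation of the annotation column
    m_snps          : M_{h_g^2}, from the printed ldsc output
    h2_g            : estimated heritability, from the printed ldsc output
    """
    scale = m_snps * sd_c / h2_g
    # Assumption: the scale factor is treated as a constant, so the
    # standard error is multiplied by the same factor.
    return tau_hat * scale, tau_se * scale


# Hypothetical inputs, for illustration only.
ts, ts_se = tau_star(tau_hat=-1.2e-8, tau_se=3.0e-9,
                     sd_c=0.9, m_snps=5000000, h2_g=0.25)
print(ts, ts_se, ts / ts_se)
```

Because both the estimate and its standard error are multiplied by the same positive factor, the z-score of the standardized coefficient equals Coefficient_z-score.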

dqq0404 commented 1 week ago

If I understand correctly, there are two options:

  1. Combine the v2.2 .annot files for all 22 chromosomes and calculate the standard deviation for each category, using all SNPs (about 10 million) in the 22 .annot files; or
  2. Combine the v2.2 .annot files and the weights.hm3_noMHC files for all 22 chromosomes, then merge them with my summary statistics. That would leave about 1 million SNPs, and I would use these SNPs to calculate the standard deviation for each category.

Which one should I choose?

aksarkar commented 1 week ago

You should choose (2). The effect size to be standardized only describes the SNPs that were used in the regression, that is, those SNPs present in both --w-ld and --ref-ld.
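A rough sketch of option (2), using only the standard library. It assumes a tab-delimited .annot layout with a SNP column followed by annotation columns, and a hypothetical set `regression_snps` that you would build yourself by intersecting your summary statistics with the weights files. The toy data and the column name `MAF_Adj_ASMC` are illustrative.

```python
import csv
import io
import statistics


def annotation_sd(annot_file, regression_snps, column):
    """Standard deviation of one .annot column over the regression SNPs only."""
    reader = csv.DictReader(annot_file, delimiter="\t")
    values = [float(row[column]) for row in reader
              if row["SNP"] in regression_snps]
    # Population SD is an assumption; with about 1 million SNPs the
    # difference from the sample SD is negligible.
    return statistics.pstdev(values)


# Toy stand-in for the concatenated .annot files (hypothetical values).
toy_annot = "\n".join([
    "\t".join(["CHR", "BP", "SNP", "CM", "MAF_Adj_ASMC"]),
    "\t".join(["1", "100", "rs1", "0.0", "-1.0"]),
    "\t".join(["1", "200", "rs2", "0.0", "0.0"]),
    "\t".join(["1", "300", "rs3", "0.0", "1.0"]),
    "\t".join(["1", "400", "rs4", "0.0", "5.0"]),
])

regression_snps = {"rs1", "rs2", "rs3"}        # option (2)
all_snps = {"rs1", "rs2", "rs3", "rs4"}        # option (1)

sd_regression = annotation_sd(io.StringIO(toy_annot), regression_snps, "MAF_Adj_ASMC")
sd_all = annotation_sd(io.StringIO(toy_annot), all_snps, "MAF_Adj_ASMC")
print(sd_regression, sd_all)
```

The two standard deviations differ in the toy data, which is why the choice of SNP set matters for the standardized coefficient.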

dqq0404 commented 1 week ago

Thank you for your patient answers; they resolved my confusion. I have two more questions:

1. Should I use the Coefficient_z-score, rather than Enrichment_p, to test the significance of Tau_star.coefficient? The two give different results; how should I interpret this?

2. How is the chi^2 of a SNP calculated when doing partitioned heritability? I ask because I see that SNPs are removed when chi^2 > 80.

aksarkar commented 1 week ago

  1. Correct, you need to use the z-score of the coefficient to draw statistical conclusions. The reason they are different is that heritability enrichment does not account for the contribution of other annotations, whereas the coefficient does.

  2. The chi^2 statistic is the square of the z-score. https://en.wikipedia.org/wiki/Chi-squared_distribution#Definitions
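As a sketch of point 2: squaring each SNP's z-score gives its chi^2, and SNPs above the cutoff are dropped. The cutoff of 80 is taken from the question above; the function name `filter_by_chi2`, the strict inequality, and the toy z-scores are assumptions for illustration, not a copy of the ldsc source.

```python
def chi2_from_z(z):
    """Chi^2 statistic (1 degree of freedom) is the squared z-score."""
    return z * z


def filter_by_chi2(z_by_snp, chisq_max=80.0):
    """Keep only SNPs whose chi^2 is below the cutoff."""
    return {snp: z for snp, z in z_by_snp.items()
            if chi2_from_z(z) < chisq_max}


# Toy z-scores (hypothetical): 9.5**2 = 90.25 exceeds 80, 8.9**2 = 79.21 does not.
z_scores = {"rs1": 1.5, "rs2": -9.5, "rs3": 8.9}
kept = filter_by_chi2(z_scores)
print(sorted(kept))
```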

dqq0404 commented 1 week ago

If I use the Coefficient_z-score to test the significance of Tau_star.coefficient, should I also use a Bonferroni threshold to control false positives (such as 0.05/96)?
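For reference, the arithmetic behind such a threshold can be sketched as below. This only shows how a z-score converts to a two-sided p-value and how it compares with 0.05/96; whether Bonferroni over 96 annotations is the appropriate correction is the open question here, and the example z-scores are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


alpha = 0.05
n_tests = 96                      # number of annotations, from the question
threshold = alpha / n_tests       # Bonferroni-adjusted significance level

for z in (3.0, 4.0):              # hypothetical Coefficient_z-score values
    p = two_sided_p(z)
    print(z, p, p < threshold)
```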