bulik / ldsc

LD Score Regression (LDSC)
GNU General Public License v3.0

Is enrichment the same when using LD scores with different combinations of categories? #452

Open Jesson-mark opened 2 months ago

Jesson-mark commented 2 months ago

Hi, thanks for the great software!

I have a new question about enrichment results when using different combinations of LD scores. Since the LD scores of each category are computed independently (see the earlier issue), they can be 'combined' or 'pasted' together. But if I run partitioned heritability twice with overlapping sets of categories, will the enrichment statistics of the shared categories be the same? For example, suppose the first run uses LD scores for two categories (base and Coding_UCSC.bed) and the second run uses three (those two plus Human_Promoter_Villar_ExAC). Will the enrichment statistics for Coding_UCSC.bed be the same in both runs of ldsc.py? If not, why? Do you suggest running ldsc.py with as many categories as possible? Specifically, if I'm interested in only one category, is it better to run ldsc.py with all categories in the baselineLD model plus the category of interest, rather than with only the base category and the category of interest?

Best regards, Jie Wang

aksarkar commented 2 months ago

@Jesson-mark No, they will not be the same because ldsc is a multiple regression model, and each coefficient gives the increase in expected chi^2 statistic when increasing the LD score of that annotation, holding the LD score of each other annotation fixed. Clearly, this depends on what other annotations went into the model.

You should always run your annotation of interest with the entire baseline model, not just the base category.
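To illustrate the point about multiple regression, here is a minimal sketch with simulated data. It is not ldsc itself: it uses plain least squares in place of ldsc's weighted regression, and all LD scores, coefficients, and variable names (`base`, `coding`, `promoter`, `true_tau`) are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_snps = 5000

# Hypothetical LD scores for three annotations. The coding and promoter
# scores are correlated, as overlapping annotations often are.
base = rng.gamma(2.0, 50.0, n_snps)
coding = 0.05 * base + rng.gamma(2.0, 2.0, n_snps)
promoter = 0.5 * coding + rng.gamma(2.0, 1.0, n_snps)

# Simulated chi^2 statistics under assumed per-annotation coefficients
# (intercept 1, Gaussian noise; a simplification of the real model).
true_tau = np.array([0.01, 0.2, 0.3])
ld_matrix = np.column_stack([base, coding, promoter])
chi2 = 1.0 + ld_matrix @ true_tau + rng.normal(0.0, 1.0, n_snps)


def fit(*ld_scores):
    """Ordinary least squares of chi^2 on an intercept plus the given LD scores."""
    design = np.column_stack([np.ones(n_snps), *ld_scores])
    coef, *_ = np.linalg.lstsq(design, chi2, rcond=None)
    return coef


two = fit(base, coding)              # base + coding only
three = fit(base, coding, promoter)  # base + coding + promoter

# Index 2 is the coding coefficient in both fits (index 0 is the intercept).
print("coding coefficient, 2-annotation model:", two[2])
print("coding coefficient, 3-annotation model:", three[2])
```

In the two-annotation fit, the coding coefficient absorbs part of the signal that belongs to the omitted, correlated promoter annotation, so it differs from the three-annotation fit. The same logic is why the annotation of interest should be conditioned on the full baseline model.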

Jesson-mark commented 2 months ago

Thanks for your detailed explanation! I'll keep all annotations in the full baseline model!

I have another question about the relative enrichment of two categories where one is a subset of the other. For example, I have two sets of SNPs: set1 has 10,000 SNPs and set2 has 1,000 SNPs (a subset of set1). I want to evaluate whether the SNPs in set2 are enriched for heritability relative to set1. Since set2 contains 10% of the SNPs in set1, what proportion of the heritability explained by set1 is explained by set2? Is there an enrichment? I wonder whether S-LDSC can test this kind of enrichment.

Best regards, Jie Wang
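The arithmetic behind this question can be sketched as follows. S-LDSC defines enrichment as the proportion of heritability divided by the proportion of SNPs; applying that ratio to set2 within set1 is an assumption made here for illustration, not an ldsc output, and the heritability values (`h2_set1`, `h2_set2`) are hypothetical placeholders rather than estimates.

```python
def enrichment(prop_h2, prop_snps):
    """Enrichment: proportion of heritability / proportion of SNPs."""
    return prop_h2 / prop_snps


n_set1, n_set2 = 10_000, 1_000
prop_snps = n_set2 / n_set1  # set2 holds 10% of the SNPs in set1

# Hypothetical heritability attributed to each set (NOT real estimates).
h2_set1 = 0.20
h2_set2 = 0.05

# Share of set1's heritability that falls in set2.
prop_h2 = h2_set2 / h2_set1

relative_enrichment = enrichment(prop_h2, prop_snps)
print("relative enrichment of set2 within set1:", relative_enrichment)
```

A value above 1 would mean set2 carries more heritability than its share of set1's SNPs would suggest. Whether the difference is significant would still need standard errors, which this sketch does not compute.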

Jesson-mark commented 2 months ago

Hi, I have another question, this time about the choice of SNPs.

I have a set of 1,000 SNPs of interest, and only 200 of them are in the hm3 SNP list. If I want to evaluate the enrichment of all 1,000 SNPs, do I need to calculate LD scores for all genome-wide SNPs (not only hm3 SNPs)? Or, if I use only hm3 SNPs to perform the enrichment analysis (so that only 200 of my SNPs of interest are included), would the result still represent all 1,000 SNPs?

I'm quite confused about this. Looking forward to your reply!

Thanks! Jie Wang