YingkaiSun opened 1 month ago
Thank you for your response! I would like to seek further clarification on the interpretation of regression coefficients when control annotations are included in the S-LDSC model.
Here is a snippet from the .ldcts file Multi_tissue_gene_expr.ldcts:

| V1 | V2 |
|---|---|
| Adipose_Subcutaneous | Multi_tissue_gene_expr_1000Gv3_ldscores/GTEx.1.,Multi_tissue_gene_expr_1000Gv3_ldscores/GTEx.control. |
| Adipose_Visceral_(Omentum) | Multi_tissue_gene_expr_1000Gv3_ldscores/GTEx.2.,Multi_tissue_gene_expr_1000Gv3_ldscores/GTEx.control. |
| Adrenal_Gland | Multi_tissue_gene_expr_1000Gv3_ldscores/GTEx.3.,Multi_tissue_gene_expr_1000Gv3_ldscores/GTEx.control. |
| Artery_Aorta | Multi_tissue_gene_expr_1000Gv3_ldscores/GTEx.4.,Multi_tissue_gene_expr_1000Gv3_ldscores/GTEx.control. |
| Artery_Coronary | Multi_tissue_gene_expr_1000Gv3_ldscores/GTEx.5.,Multi_tissue_gene_expr_1000Gv3_ldscores/GTEx.control. |
| ... | ... |
I’m curious about how to interpret the regression coefficients for both the target and control annotations when the control annotation is used as a covariate in the regression model, especially when this control annotation is shared across multiple tissues.
For instance, in the Multi_tissue_gene_expr.ldcts file, multiple tissue annotations share the same control: Multi_tissue_gene_expr_1000Gv3_ldscores/GTEx.control. Does this mean that the regression coefficient for the target annotation (where annotation = 1 vs. 0) is independent of whether the control annotation is equal to 1 or 0? If so, what exactly does the control annotation represent in this context, and how should we interpret its coefficient when included in the model?
Thank you very much for your time and assistance.
Best regards, Sun Yingkai
Recall that ldsc fits a multiple linear regression of chi^2 statistics onto the LD scores partitioned by each annotation.
This means that the regression coefficient for the target annotation is not independent of the control annotations.
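For reference, here is the model in the notation of Finucane et al. 2015: $\chi^2_j$ is the association statistic for regression SNP $j$, $N$ is the GWAS sample size, $\ell(j, C)$ is the LD score of SNP $j$ with respect to annotation $C$ (built from the annotation values $a_C(k)$ and the LD $r_{jk}$ between SNPs $j$ and $k$), $\tau_C$ is the coefficient of annotation $C$, and $a$ captures confounding. The sum runs over every annotation in the model (baseline, target and control alike), which is why the estimate of $\tau_C$ for the target depends on what else is included.

$$
\mathbb{E}\left[\chi^2_j\right] = N \sum_{C} \tau_C \, \ell(j, C) + N a + 1,
\qquad
\ell(j, C) = \sum_{k} a_C(k) \, r_{jk}^2
$$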
In the context of looking for cell-type specific annotations, the control annotations (baseline model) are meant to represent non-specific sources of heritability enrichment.
The reason to include them is to provide evidence that an enrichment for a cell-type specific annotation is indeed driven by something cell-type specific, and not something non-specific (for example, promoter-associated histone modifications).
This does seem to require using the "standard" baseline model described in Finucane et al. 2015 (and later papers) in addition to any further control annotations.
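A toy illustration of why the control matters, as a minimal sketch under loudly stated assumptions: the LD scores below are random numbers rather than real ones, the target set is nested inside the control set, and heritability comes only from the baseline and a non-specific all-genes control. The fit is plain unweighted least squares with an intercept, not ldsc's weighted regression with block-jackknife standard errors, and `fit_tau` is a hypothetical helper, not part of ldsc.

```python
import numpy as np


def fit_tau(chi2, ld_scores, n):
    """Unweighted OLS of chi^2 on LD score columns plus an intercept.

    The fitted slopes equal N * tau, so they are divided by N before
    returning one tau per LD score column (the intercept is dropped).
    """
    design = np.column_stack([np.ones(len(chi2))] + list(ld_scores))
    coef, *_ = np.linalg.lstsq(design, chi2, rcond=None)
    return coef[1:] / n


rng = np.random.default_rng(0)
m, n = 20_000, 100_000  # toy numbers of regression SNPs and GWAS samples

# Toy LD score components (random, not real LD): LD to SNPs near target genes,
# near other genes, and non-genic SNPs. Target is nested in control, control in baseline.
t = rng.uniform(0, 20, m)
e = rng.uniform(0, 20, m)
o = rng.uniform(0, 20, m)
l_base, l_ctrl, l_target = t + e + o, t + e, t

# Toy truth: only the baseline and the all-genes control carry heritability;
# the target adds nothing on top (true tau_target = 0). No sampling noise.
tau_base, tau_ctrl = 1e-8, 5e-8
chi2 = 1 + n * (tau_base * l_base + tau_ctrl * l_ctrl)

without_ctrl = fit_tau(chi2, [l_base, l_target], n)
with_ctrl = fit_tau(chi2, [l_base, l_ctrl, l_target], n)
print("tau_target without control:", without_ctrl[1])
print("tau_target with control:   ", with_ctrl[2])
```

In this particular setup the first number should come out clearly positive (roughly half of `tau_ctrl`, an omitted-variable effect from the target LD score being correlated with the missing control LD score), while the second is zero up to floating-point error because the toy χ² has no noise.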
Thank you so much for your detailed response to my previous inquiry! However, I realize that my previous question may not have been as clear as it could have been, so here is a more detailed description.
In the context of cell-type-specific analyses, as demonstrated in the wiki's demo code, the --ref-ld-chr-cts flag is used to specify a .ldcts file that lists both a target and a control annotation for each cell or tissue type. Below is the demo code provided:
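```shell
# Name of the .ldcts file (without extension) discussed in this thread
cts_name=Multi_tissue_gene_expr
ldsc.py \
    --h2-cts UKBB_BMI.sumstats.gz \
    --ref-ld-chr 1000G_EUR_Phase3_baseline/baseline. \
    --out BMI_${cts_name} \
    --ref-ld-chr-cts $cts_name.ldcts \
    --w-ld-chr weights_hm3_no_hla/weights.
```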
In this code, the --ref-ld-chr flag specifies the use of the baseline model (1000G_EUR_Phase3_baseline/baseline.), which, as you mentioned, captures broad non-specific sources of heritability enrichment. However, the --ref-ld-chr-cts flag simultaneously specifies a .ldcts file, which includes both target and control annotations. Here is an example of what such a .ldcts file might look like:
| V1 | V2 |
|---|---|
| Adipose_Subcutaneous | GTEx.1.,GTEx.control. |
| Adipose_Visceral_(Omentum) | GTEx.2.,GTEx.control. |
| Adrenal_Gland | GTEx.3.,GTEx.control. |
| Artery_Aorta | GTEx.4.,GTEx.control. |
| Artery_Coronary | GTEx.5.,GTEx.control. |
| ... | ... |
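To make the structure concrete, here is a small sketch of how such a file could be read, assuming (as the tables suggest) that each line holds a name and, separated by whitespace, a comma-separated list of LD score prefixes in the --ref-ld-chr style. The helpers `parse_ldcts`, `model_for` and `chr_file` are hypothetical and are not ldsc's own code, and the ordering of prefixes in the fitted list is illustrative only. Following the ldsc wiki, the coefficient reported for each line is the one for the first prefix on that line.

```python
def parse_ldcts(text):
    """Parse .ldcts text: one '<name> <prefix>,<prefix>,...' entry per line."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, prefixes = line.split()
        entries[name] = prefixes.split(",")
    return entries


def model_for(entry_prefixes, baseline_prefix):
    """Return the LD score prefixes fitted jointly for one .ldcts line
    (that line's prefixes plus the --ref-ld-chr baseline) and the prefix
    whose coefficient is reported (the first one on the line)."""
    return entry_prefixes + [baseline_prefix], entry_prefixes[0]


def chr_file(prefix, chrom):
    # Per-chromosome LD score file named by a --ref-ld-chr style prefix.
    return f"{prefix}{chrom}.l2.ldscore.gz"


ldcts_text = (
    "Adipose_Subcutaneous\tGTEx.1.,GTEx.control.\n"
    "Adrenal_Gland\tGTEx.3.,GTEx.control.\n"
)
entries = parse_ldcts(ldcts_text)
fitted, reported = model_for(entries["Adipose_Subcutaneous"],
                             "1000G_EUR_Phase3_baseline/baseline.")
print(fitted)
print(reported, [chr_file(reported, c) for c in (1, 2)])
```

Note that the baseline prefix stands for a whole file of baseline annotations, so each line's regression contains many more columns than the three prefixes listed.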
When I read the corresponding files in R, here is what I found. For the target annotation:

```r
> fread('/syk12961/reference/ldsc/LDSCORE-SEG/Multi_tissue_gene_expr_1000Gv3_ldscores/GTEx.1.1.annot.gz')
ANNOT
<int>
1: 0
2: 0
3: 0
4: 0
5: 0
...
779350: 0
779351: 0
779352: 0
779353: 0
779354: 0
```

Counts of each ANNOT value:

```r
ANNOT n
<int> <int>
1: 0 604825
2: 1 174529
```
For the control annotation:

```r
> fread('/syk12961/reference/ldsc/LDSCORE-SEG/Multi_tissue_gene_expr_1000Gv3_ldscores/GTEx.control.1.annot.gz')
All_Genes
<int>
1: 1
2: 1
3: 1
4: 1
5: 1
...
779350: 1
779351: 1
779352: 1
779353: 1
779354: 1
```

Counts of each All_Genes value:

```r
All_Genes n
<int> <int>
1: 0 82864
2: 1 696490
```
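A quick arithmetic check on the two tabulations above, using only the counts printed there (chromosome 1, since both files end in `.1.annot.gz`): both files describe the same set of SNPs, and the all-genes control is far broader than the tissue-specific target. This is just a sketch over the printed numbers, not something ldsc computes for you.

```python
# Counts copied from the two R tabulations above (chromosome 1).
counts = {
    "GTEx.1 (target)": {0: 604825, 1: 174529},
    "GTEx.control (all genes)": {0: 82864, 1: 696490},
}

totals = {}
fractions = {}
for name, c in counts.items():
    totals[name] = c[0] + c[1]
    fractions[name] = c[1] / totals[name]
    print(f"{name}: {c[1]} of {totals[name]} SNPs annotated ({fractions[name]:.1%})")

# Both annotation files should describe the same set of SNPs.
assert len(set(totals.values())) == 1
```

This gives roughly 22% of SNPs for the target and 89% for the control, consistent with the control being an all-genes set.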
Given this setup, my question is:
I hope this explanation clarifies my questions. I would be very grateful for any further insights you could provide. Thank you once again for your time and response!
As stated in the wiki:

> Each line has two sets of LD scores to include: one is the set of LD scores corresponding to the specifically expressed genes in the cell type, while the second one is a "control" gene set of all genes. The result that will be reported will be the regression coefficient for the first set of LD scores in the list.
So, the answers to the questions are still as I gave above.
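The interpretation of the coefficient for the control annotation is the same as for any other annotation: it measures that annotation's contribution to per-SNP heritability, conditional on all the other annotations in the model.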
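Concretely, under the S-LDSC model of Finucane et al. 2015 each annotation contributes additively to the per-SNP heritability, where $\beta_j$ is the effect size of SNP $j$ and $a_C(j)$ is the value of annotation $C$ at that SNP. For two SNPs $j$ and $j'$ that share every baseline and control value and differ only in the target annotation, the difference is exactly the target coefficient; the same reading applies to $\tau_{\text{control}}$ for SNPs that differ only in the control annotation.

$$
\operatorname{Var}(\beta_j) = \sum_{C} a_C(j) \, \tau_C
$$

$$
\operatorname{Var}(\beta_j) - \operatorname{Var}(\beta_{j'}) = \tau_{\text{target}}
\quad \text{if } a_{\text{target}}(j) = 1,\; a_{\text{target}}(j') = 0,\; a_C(j) = a_C(j') \text{ for every other } C
$$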
Thanks! After reviewing the wiki carefully, I think I may have figured it out. The control annotates all gene-related SNPs, which can be considered an extra adjustment on top of the baseline model that makes the results comparable across cell types or tissues. It functions in the same way as the baseline model. Is that right?
Hi! I am currently using the --ref-ld-chr-cts option to perform S-LDSC analysis, and I have a few questions:
Thank you very much for your time and assistance. I appreciate the incredible tool you've developed, and I would be grateful for any guidance on these questions.
Best regards, Sun Yingkai