Closed Al-Murphy closed 3 years ago
Hi Alan,
I am happy to help. First of all, it is important to remember that setting the reference_level
does not change the fit, but it can make it easier to interpret the results. In your case, you will probably want to set your control condition as the reference_level
so that the treatment will be relative to that.
The second important point is to make sure that you understand what the coefficients of your model mean, so that you can choose the appropriate contrast. I wrote a small package CoefExplainer that visualizes the coefficients for a linear model.
Thirdly, from what you describe, I assume that you have a data.frame with
glm_gp(..., design = ~ cell_type + cell_type:condition - 1)
, because now the interaction coefficients (e.g., "cell_typeTCell:conditionTreated"
) directly correspond to the difference between mean expression level in treated and control cell for each cell type.If you furthermore have done the experiments with appropriate replicates, you should also form a pseudobulk with the pseudobulk_by
argument in test_de()
. If you don't form a pseudobulk, it is important to keep in mind that your p-values will be inflated, because cells from the same sample are not actually independent.
Best, Constantin
Example with CoefExplainer
:
set.seed(1)
library(CoefExplainer)
data <- data.frame(cell_type = c(rep("TCell", 4), rep("BCell", 4), rep("Makrophage", 4)),
condition = c(rep(c("treated", "treated", "control", "control"), 3)),
y = rnorm(12))
# This is what 'reference_level =...' does internally
data$condition <- relevel(as.factor(data$condition), "control")
res <- CoefExplainer(data, y ~ cell_type + cell_type:condition - 1)
plotModel(res)
Created on 2021-01-28 by the reprex package (v0.3.0)
Hey Constantin,
Thanks for getting back so quickly and for the very useful information.
Your assumptions are correct about my data and I agree with your approach. So, just to confirm, my formula would be
glm_gp(..., design = ~ cell_type + cell_type:condition - 1, reference_level = "Control")
Where "Control" is a level from the condition column.
I plan to use test_de()
to derive the DEGs and will be utilising the pseudobulk_by
per patient:
pseudobulk_by = paste0(condition, "-", patient_id)
One final thing is what is the best practice for the contrast
parameter in test_de()
to get all DEGs for each cell type?
Thanks again, Alan.
glm_gp(..., design = ~ cell_type + cell_type:condition - 1, reference_level = "Control")
Looks good. However, if you have a paired experimental design (control and treatment in the same patient), you can include that in the formula as well: glm_gp(..., design = ~ cell_type + patient_id + cell_type:condition - 1, reference_level = "Control")
I plan to use test_de() to derive the DEGs and will be utilising the pseudobulk_by per patient: pseudobulk_by = paste0(condition, "-", patient_id)
Looks good :)
One final thing is what is the best practice for the contrast parameter in test_de() to get all DEGs for each cell type?
The contrast parameter specifies coefficient or combination of coefficients, for which you test if it is significantly different from 0. The contrast always depends on the original design formula. With the design formula discussed above, you can directly specify the coefficient. You will need to call test_de()
once for each coefficient that you test:
res1 <- test_de(fit, contrast = `cell_typeBCell:conditiontreated`)
res2 <- test_de(fit, contrast = `cell_typeTCell:conditiontreated`)
res3 <- test_de(fit, contrast = `cell_typeMakrophage:conditiontreated`)
...
Hi,
We want to test for differential expression in all cell types in a single-cell dataset for a case vs control experiment and were hoping to use glmGamPoi to do so. What would you say best practise is for setting the value of reference_level?
Thanks!