const-ae / glmGamPoi

Fit Gamma-Poisson Generalized Linear Models Reliably
105 stars 14 forks source link

Differential Expression case vs control across all cell types #16

Closed Al-Murphy closed 3 years ago

Al-Murphy commented 3 years ago

Hi,

We want to test for differential expression in all cell types in a single-cell dataset for a case vs control experiment and were hoping to use glmGamPoi to do so. What would you say best practise is for setting the value of reference_level?

Thanks!

const-ae commented 3 years ago

Hi Alan,

I am happy to help. First of all, it is important to remember that setting the reference_level does not change the fit, but it can make it easier to interpret the results. In your case, you will probably want to set your control condition as the reference_level so that the treatment will be relative to that.

The second important point is to make sure that you understand what the coefficients of your model mean, so that you can choose the appropriate contrast. I wrote a small package CoefExplainer that visualizes the coefficients for a linear model.

Thirdly, from what you describe, I assume that you have a data.frame with

If you furthermore have done the experiments with appropriate replicates, you should also form a pseudobulk with the pseudobulk_by argument in test_de(). If you don't form a pseudobulk, it is important to keep in mind that your p-values will be inflated, because cells from the same sample are not actually independent.

Best, Constantin

const-ae commented 3 years ago

Example with CoefExplainer:

set.seed(1)
library(CoefExplainer)
data <- data.frame(cell_type = c(rep("TCell", 4), rep("BCell", 4), rep("Makrophage", 4)),
                   condition = c(rep(c("treated", "treated", "control", "control"), 3)),
                   y = rnorm(12))

# This is what 'reference_level =...' does internally
data$condition <- relevel(as.factor(data$condition), "control")

res <- CoefExplainer(data, y ~ cell_type + cell_type:condition - 1)
plotModel(res)

image

Created on 2021-01-28 by the reprex package (v0.3.0)

Al-Murphy commented 3 years ago

Hey Constantin,

Thanks for getting back so quickly and for the very useful information.

Your assumptions are correct about my data and I agree with your approach. So, just to confirm, my formula would be

glm_gp(..., design = ~ cell_type + cell_type:condition - 1, reference_level = "Control")

Where "Control" is a level from the condition column.

I plan to use test_de() to derive the DEGs and will be utilising the pseudobulk_by per patient:

pseudobulk_by = paste0(condition, "-", patient_id)

One final thing is what is the best practice for the contrast parameter in test_de() to get all DEGs for each cell type?

Thanks again, Alan.

const-ae commented 3 years ago

glm_gp(..., design = ~ cell_type + cell_type:condition - 1, reference_level = "Control")

Looks good. However, if you have a paired experimental design (control and treatment in the same patient), you can include that in the formula as well: glm_gp(..., design = ~ cell_type + patient_id + cell_type:condition - 1, reference_level = "Control")

I plan to use test_de() to derive the DEGs and will be utilising the pseudobulk_by per patient: pseudobulk_by = paste0(condition, "-", patient_id)

Looks good :)

One final thing is what is the best practice for the contrast parameter in test_de() to get all DEGs for each cell type?

The contrast parameter specifies coefficient or combination of coefficients, for which you test if it is significantly different from 0. The contrast always depends on the original design formula. With the design formula discussed above, you can directly specify the coefficient. You will need to call test_de() once for each coefficient that you test:

res1 <- test_de(fit, contrast = `cell_typeBCell:conditiontreated`)
res2 <- test_de(fit, contrast = `cell_typeTCell:conditiontreated`)
res3 <- test_de(fit, contrast = `cell_typeMakrophage:conditiontreated`)
...