MarioniLab / scran

Clone of the Bioconductor repository for the scran package.
https://bioconductor.org/packages/devel/bioc/html/scran.html
39 stars 23 forks source link

DEG results only contain downregulated gene #110

Closed littleju777 closed 1 year ago

littleju777 commented 1 year ago

Hi all,

I would like to compute, for each cell type("Cluster" column), which genes are differentially expressed between gZscan20 and gSramble("perturbed_gene" column) pseudosamples.

Here is my code:

se <- aggregateAcrossCells(tp03.sce, ids = colData(tp03.sce)[, c("Cluster","orig.ident")])
se <- se[, se$ncells >= 10]
se$perturbed_gene <- factor(se$perturbed_gene, levels = c("gZscan20", "gScramble"))
de.results <- pseudoBulkDGE(se, label = se$Cluster, condition = se$perturbed_gene, design = ~ 0 + perturbed_gene, coef="perturbed_genegScramble")
is.de <- decideTestsPerLabel(de.results, threshold=0.05)
summarizeTestsPerLabel(is.de)

it comes to only have down regugulated genes -1 0 NA Cellcycle 2682 0 29603 Eff 2547 0 29738 Exh 2313 0 29972 Int 3918 135 28232 Pro 4698 0 27587

Then I change the coef to another group:

se <- aggregateAcrossCells(tp03.sce, ids = colData(tp03.sce)[, c("Cluster","orig.ident")])
se <- se[, se$ncells >= 10]
se$perturbed_gene <- factor(se$perturbed_gene, levels = c("gZscan20", "gScramble"))
de.results <- pseudoBulkDGE(se, label = se$Cluster, condition = se$perturbed_gene, design = ~ 0 + perturbed_gene, coef="perturbed_genegZscan20")
is.de <- decideTestsPerLabel(de.results, threshold=0.05)
summarizeTestsPerLabel(is.de)

the results remain the same: -1 0 NA Cellcycle 2682 0 29603 Eff 2547 0 29738 Exh 2313 0 29972 Int 3918 135 28232 Pro 4698 0 27587

Then I change it to :

se <- aggregateAcrossCells(tp03.sce, ids = colData(tp03.sce)[, c("Cluster","orig.ident")])
se <- se[, se$ncells >= 10]
se$perturbed_gene <- factor(se$perturbed_gene, levels = c("gZscan20", "gScramble"))
de.results <- pseudoBulkDGE(se, label = se$Cluster, condition = se$perturbed_gene, design = ~ 0 + perturbed_gene, contrast = c(1, -1))
is.de <- decideTestsPerLabel(de.results, threshold=0.05)
summarizeTestsPerLabel(is.de)

the results change to this: -1 0 1 NA Cellcycle 0 2681 1 29603 Eff 1 2546 0 29738 Exh 0 2313 0 29972 Int 3 4045 5 28232 Pro 1 4697 0 27587

I'm a bit confused about it. Basically, I have 3 questions:

  1. is my code right? why there are always negative genes whatever my coef is? If my code is wrong, could you help me fix it?
  2. Does pseudoBulkDGE set up a threshold on logFC too?
  3. Do you think there is something wrong with my datasets? How could I generate DEGs that make more sense? By cleaning my samples?

Thank you so much!!!

Ming

LTLA commented 1 year ago

Wrong coefficient. With your design matrix, the perturbed_genegScramble coefficient represents the average log-expression in the scramble group. So, if you set it as coef=, you're testing the null hypothesis that the average log-expression for scramble is... equal to zero, which is not a very interesting hypothesis.

The null hypothesis that you want is, "the scramble and knockdown groups are the same". You can achieve this by providing a contrast= IIRC, or you can just use a design matrix like ~ perturbed_gene where the non-intercept coefficient directly represents the difference in log-expression (i.e., the log-fold change) between groups.

littleju777 commented 1 year ago

Thank you so much for the help!! I tried your two suggestions, it works perfectly. Thank you!

Best, Ming