Closed argschwind closed 4 years ago
What do you get if you center and scale your nGenes covariate?
Thanks for the quick response. Scaling the nGenes covariate does the trick, and the log fold changes now look as expected.
# load required packages
library(MAST)
library(ggplot2)
library(data.table)
# center and scale nGenes
cd <- colData(sca)
cd$nGenes <- as.numeric(scale(cd$nGenes))
colData(sca) <- cd
# histogram of centered and scaled gene counts
hist(cd$nGenes, main = "nGenes centered and scaled", xlab = "nGenes")
# fit model
zlm_fit <- zlm(~pert+nGenes, sca)
# perform likelihood ratio test for the perturbation coefficient
summary_zlm_fit <- summary(zlm_fit, doLRT = "pert1")
summary_dt <- summary_zlm_fit$datatable
# extract p-values and logFC for each gene
pvals <- summary_dt[contrast == "pert1" & component == "H", .(primerid, `Pr(>Chisq)`)]
lfc <- summary_dt[contrast == "pert1" & component == "logFC", .(primerid, coef, ci.hi, ci.lo)]
# assemble output
output <- as.data.frame(merge(lfc, pvals, by = "primerid"))
colnames(output) <- c("gene", "logFC", "ci_high", "ci_low", "pvalue")
output <- output[order(output$pvalue), ]
# volcano plot for perturbation effect
qplot(x = logFC, y = -log10(pvalue), data = output, main = "perturbation") +
  theme_bw()
Best, Andreas
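One possible explanation for why centering helps, offered as an assumption rather than something stated in this thread: in a hurdle model the expected expression is the product of the detection probability (logistic part) and the mean expression when detected (continuous part), so the difference between perturbed and control cells depends on where the other covariates are held. If the reference point is nGenes = 0, which lies far outside the observed range for an uncentered covariate, the predicted detection probability can be close to zero in both groups and the difference shrinks. The base-R sketch below illustrates this with made-up coefficients; none of the values come from MAST output.

```r
# Toy hurdle-model calculation; every coefficient below is hypothetical.
# Expected expression = P(detected) * mean expression when detected.
expected_expr <- function(pert, n_genes, coefs) {
  eta_d <- coefs$d_int + coefs$d_pert * pert + coefs$d_ngenes * n_genes  # logistic part
  mu_c  <- coefs$c_int + coefs$c_pert * pert + coefs$c_ngenes * n_genes  # continuous part
  plogis(eta_d) * mu_c
}

# Difference in expected expression (perturbed minus control) at a given nGenes value
effect_at <- function(n_genes, coefs) {
  expected_expr(1, n_genes, coefs) - expected_expr(0, n_genes, coefs)
}

# Coefficients on the raw nGenes scale (assuming a typical cell detects ~4000 genes)
raw_coefs <- list(d_int = -8, d_pert = -1.5, d_ngenes = 0.002,
                  c_int = 1,  c_pert = -0.5, c_ngenes = 0.0002)

effect_at_zero <- effect_at(0, raw_coefs)     # reference at nGenes = 0 (extrapolation)
effect_at_mean <- effect_at(4000, raw_coefs)  # reference at a typical cell

print(c(at_zero = effect_at_zero, at_mean = effect_at_mean))
```

With these toy numbers the same coefficients give a much smaller effect at nGenes = 0 than at a typical cell. Centering nGenes moves the zero point of the covariate to the average cell, which would make that the reference.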
Hi,
I'm analyzing some single-cell RNA-seq data from a pooled CRISPR screen. Using the known target genes of well-understood CRISPR interference perturbations, we compared different DE methods and came to the conclusion that MAST with an nGenes covariate (ngeneson in your tutorial) performs best based on an AUPRC analysis.
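An AUPRC comparison of this kind could be computed along the following lines. This is a minimal base-R sketch, not the code used for the benchmark: the function name `auprc`, the toy p-values, and the use of average precision as the area estimate are all assumptions.

```r
# Average precision as an AUPRC estimate (hypothetical helper, not part of MAST).
# pvalue:    p-values for each gene, smaller = stronger evidence
# is_target: TRUE for genes known to be targeted by the perturbation
auprc <- function(pvalue, is_target) {
  ord <- order(pvalue)                     # rank genes from most to least significant
  hits <- is_target[ord]
  precision <- cumsum(hits) / seq_along(hits)
  # average the precision at the rank of each true target
  sum(precision[hits]) / sum(hits)
}

# Toy example: two known targets among five genes
pvals   <- c(geneA = 1e-6, geneB = 0.2, geneC = 1e-3, geneD = 0.7, geneE = 0.04)
targets <- c(TRUE, FALSE, TRUE, FALSE, FALSE)
print(auprc(pvals, targets))
```

A method that ranks all known targets ahead of every other gene scores 1. Tied p-values are ranked in input order here, which a real benchmark may want to handle more carefully.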
However, when extracting the effect sizes for the individual coefficients, mainly the CRISPR perturbation effect, I discovered that the most significant genes tend to have very low log fold changes in expression. This is only the case when fitting a model with the nGenes covariate. When fitting a model with only the perturbation effect, significant genes also show larger log fold changes, as I would expect. Furthermore, I noticed that in many cases nGenes is highly significant while showing only relatively small log fold changes.
I attached an example, which can be run by downloading an sca object from Google Drive.
Fitting the model with the nGenes covariate and creating a volcano plot of p-value vs. logFC for the perturbation:
Creating the same plot for the nGenes covariate:
When fitting a model with perturbation as the only variable, the volcano plot looks fine:
Note: In this example none of the genes would pass an FDR threshold of 0.05, but the same effects are also seen for highly significant cases.
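For reference, the FDR filter mentioned in the note can be sketched in base R. The thread does not say which correction was used, so the Benjamini-Hochberg method here is an assumption, and the p-values are made up.

```r
# Toy p-values (hypothetical); in the example above these would be output$pvalue
pvalue <- c(0.0004, 0.003, 0.02, 0.04, 0.3, 0.8)

# Benjamini-Hochberg adjusted p-values (assumed correction method)
fdr <- p.adjust(pvalue, method = "BH")

# flag genes passing an FDR threshold of 0.05
significant <- fdr < 0.05
print(data.frame(pvalue = pvalue, fdr = fdr, significant = significant))
```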
My R session: