bimberlabinternal / CellMembrane

An R package with wrappers and pipelines for single cell RNA-seq analysis
10 stars 3 forks source link

Pseudobulk analysis #228

Closed daramirez closed 7 months ago

GWMcElfresh commented 7 months ago

Hi Daniel, thanks for the PR.

I think two successful approaches we've taken with pseudobulking downstream of https://github.com/bimberlabinternal/CellMembrane/pull/167 (where we seek to filter large swaths of experimental variable contrasts to isolate comparisons of interest using "regular DE" methods that are computationally efficient) are:

  1. "The Bar Plot" to show gross sample-level differences via number of differentially expressed genes.
  2. "The Differentially Expressed Heatmap" which is a heatmap of the log fold changes of several different groups in a variable of interest (e.g. vaccine) against a control group as prospective functional modules.

There are a few sub-goals we can split to convert these currently manual pipelines into less-manual versions that can be deployed in other areas of the lab.

Goals:

  1. I'll broadly tidy up the Pseudobulking.R script, now that we have a better idea of the key functions.
  2. We'll implement a DESeq2 as an "engine" that we can use for pseudobulking. Grossly speaking, DESeq2 offers some additional filtering to set up edgeR as an "outlier inclusive" versus DESeq2's "outlier exclusive" approach. I am unsure if DESeq2 offers quasipoisson model fitting and likelihood ratio testing. a. We should also support more generic formulae here. For "vaccine over time" considerations, an interaction term is probably a good idea.
  3. We'll create a heatmap function that accepts a metadata field as "the variable of interest" and compares controls (either a single value or a vector of groups) and parses the results of FilterPseudobulkContrasts() to populate the matrix.
  4. I'll formalize the functions to produce the top half of the bar plot (below). I'm not overly optimistic about automating the bottom half of it, but maybe we'll get a better idea as we get some of these functions settled.
  5. Testing for DESeq2 DE, testing for heatmap creation, testing for bar plot creation.
Screenshot 2024-03-14 at 12 47 43 PM
GWMcElfresh commented 7 months ago

Further thoughts on DESeq2 implementation:

We will need to support "low n pseudobulking" where 'low n is less than 4 to use FitRegularizedClassificationGlm() for feature selection anyway, so this offers a natural pipeline bifurcation. If you have enough samples that gene expression being dependent on a single subject is sufficient for outlier detection (e.g. each subject is suspected to respond approximately the same as the rest of the subjects), we can opt for DESeq2, and use edgeR for situations where single-subject variance is important/useful.

GWMcElfresh commented 7 months ago

Off topic, but I added a basic CellMembrane vignette into this PR as well that covers a lot of the essentials.

GWMcElfresh commented 7 months ago

Unsure why the devel branch failed, but TODO:

GWMcElfresh commented 7 months ago

looks like the vignette builds now. This is possibly related to https://github.com/satijalab/seurat-object/issues/191, and by selecting n = 2 for the number of fake subjects it was creating a couple of single-cell seurat objects during RunFilteredContrasts()

GWMcElfresh commented 7 months ago

as long as everything passes, this should be good to go. (sorry for the 1,400 line PR)

edit: what's failing is this:

count_matrix <- SeuratObject::GetAssayData(seuratObj, assay = assayName, layer = 'counts')
count_matrix <- count_matrix[geneSpace, ]
rownames(count_matrix) <- geneSpace

with attempt to set 'rownames' on an object with no dimensions

where geneSpace is a vector of genes from parsing all of the DEG tables from RunFilteredContrasts(). It seems like SeuratObject:GetAssayData(...) is failing to access the count matrix on devel?

GWMcElfresh commented 7 months ago

There ended up only being one gene passed into geneSpace on R 4.4/ Bioc 3.19 due to some difference in either how the data are generated or edgeR's behavior. Unrelated to Seurat (hooray!)