Closed daramirez closed 7 months ago
Further thoughts on DESeq2 implementation:
Both edgeR and DESeq2 use techniques downstream of Gordon Smyth's eBayes() "variance smoothing" methods to tamp down gene-wise variation across the samples (https://pubmed.ncbi.nlm.nih.gov/16646809/).
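For reference, the core of that variance smoothing can be sketched in a few lines of base R. This is only an illustration of the moderated-variance formula, where each gene's sample variance is pulled toward a shared prior value in proportion to the prior degrees of freedom. The helper name shrink_variance() and the values of d0 and s0_2 are assumptions made up for this sketch; limma estimates the prior from the data, and edgeR and DESeq2 apply the same idea to negative binomial dispersions rather than this exact formula.

```r
# Hypothetical helper (not part of limma, edgeR, or DESeq2):
# shrink per-gene sample variances toward a prior variance.
#   s2   : per-gene sample variances
#   d    : residual degrees of freedom behind each s2
#   s0_2 : prior variance (assumed here; normally estimated from all genes)
#   d0   : prior degrees of freedom (assumed here; larger = stronger shrinkage)
shrink_variance <- function(s2, d, s0_2, d0) {
  (d0 * s0_2 + d * s2) / (d0 + d)
}

set.seed(1)
expr <- matrix(rnorm(5 * 4), nrow = 5)  # toy data: 5 genes x 4 samples
s2 <- apply(expr, 1, var)               # raw gene-wise variances
d <- ncol(expr) - 1                     # df for a one-group design

s2_post <- shrink_variance(s2, d = d, s0_2 = 1, d0 = 4)
print(cbind(raw = s2, moderated = s2_post))
```

Each moderated value is a weighted average of the raw variance and the prior, so it always lies between the two; with few samples (small d) the prior dominates.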
The big(gest) differences between the two pipelines are:
We will need to support "low n pseudobulking" (where "low n" is less than 4) to use FitRegularizedClassificationGlm() for feature selection anyway, so this offers a natural pipeline bifurcation. If you have enough samples that flagging gene expression driven by a single subject is sufficient for outlier detection (i.e. each subject is expected to respond approximately the same as the rest of the subjects), we can opt for DESeq2, and use edgeR for situations where single-subject variance is important/useful.
Off topic, but I added a basic CellMembrane vignette into this PR as well that covers a lot of the essentials.
Unsure why the devel branch failed, but TODO:
make.names()'d in the design matrix, but within FilterContrastColumns() the user's logicList is compared to the column names of the design matrix.
Looks like the vignette builds now. This is possibly related to https://github.com/satijalab/seurat-object/issues/191, and by selecting n = 2 for the number of fake subjects it was creating a couple of single-cell Seurat objects during RunFilteredContrasts().
As long as everything passes, this should be good to go. (Sorry for the 1,400 line PR.)
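On the make.names() point in the TODO above, here is a minimal sketch of how that kind of mismatch can arise. The column names and the user_fields vector are made up for illustration and do not reflect the actual structure of logicList or the internals of FilterContrastColumns(); the only real call is base R's make.names().

```r
# Hypothetical raw names as a user might write them in their filtering logic.
user_fields <- c("Tissue-Spleen", "Timepoint 4")

# Assumption for this sketch: the design matrix columns were sanitized.
design_cols <- make.names(user_fields)
print(design_cols)

# Comparing the raw user-supplied names against sanitized columns finds nothing...
raw_match <- user_fields %in% design_cols

# ...whereas applying the same sanitization to both sides lines them up.
sanitized_match <- make.names(user_fields) %in% design_cols

print(raw_match)
print(sanitized_match)
```

Sanitizing both sides with the same function before comparing is one way to keep the check consistent; whether that belongs inside FilterContrastColumns() or at the point where logicList is parsed is a design choice for the PR.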
edit: what's failing is this:
count_matrix <- SeuratObject::GetAssayData(seuratObj, assay = assayName, layer = 'counts')
count_matrix <- count_matrix[geneSpace, ]
rownames(count_matrix) <- geneSpace
with attempt to set 'rownames' on an object with no dimensions
where geneSpace is a vector of genes from parsing all of the DEG tables from RunFilteredContrasts(). It seems like SeuratObject::GetAssayData(...) is failing to access the count matrix on devel?
There ended up only being one gene passed into geneSpace on R 4.4 / Bioc 3.19 due to some difference in either how the data are generated or edgeR's behavior. Unrelated to Seurat (hooray!)
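For anyone hitting the same error, the single-gene case can be reproduced without Seurat. The sketch below uses a small base R matrix as a stand-in for the count matrix (an assumption; the real object returned by GetAssayData() is typically a sparse matrix) and shows that subsetting one row drops the dimensions unless drop = FALSE is passed.

```r
# Toy stand-in for the count matrix: 3 genes x 2 samples.
count_matrix <- matrix(
  1:6, nrow = 3,
  dimnames = list(c("GeneA", "GeneB", "GeneC"), c("s1", "s2"))
)
geneSpace <- "GeneB"  # only one gene survived, as in the failing build

# Default subsetting collapses the single row to a plain vector.
dropped <- count_matrix[geneSpace, ]
print(dim(dropped))  # NULL

# Setting rownames on that vector reproduces the reported error.
err_msg <- tryCatch(
  {
    rownames(dropped) <- geneSpace
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(err_msg)

# drop = FALSE keeps a 1 x n matrix, so the rownames assignment works.
kept <- count_matrix[geneSpace, , drop = FALSE]
rownames(kept) <- geneSpace
print(dim(kept))
```

Passing drop = FALSE in the subsetting step would make the code robust to a one-gene geneSpace, though an upstream check for an unexpectedly short gene list may also be worth adding.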
Hi Daniel, thanks for the PR.
I think two successful approaches we've taken with pseudobulking downstream of https://github.com/bimberlabinternal/CellMembrane/pull/167 (where we seek to filter large swaths of experimental variable contrasts to isolate comparisons of interest using "regular DE" methods that are computationally efficient) are:
There are a few sub-goals we can split to convert these currently manual pipelines into less-manual versions that can be deployed in other areas of the lab.
Goals:
FilterPseudobulkContrasts() to populate the matrix.