Closed by yuhanH 1 year ago
Thanks for the reproducible example. This was a memory bug triggered by the combination of reordered rows and writing out a dense matrix transformation element-by-element (with the as("dgCMatrix") call). It should be fixed in the main branch now -- please reopen and let me know if you have any further issues.
I also have a performance suggestion for the code it looks like you're trying to implement:
mat.scale <- (mat - feature.mean)/feature.sd creates a dense matrix, though one that can still use a specialized mat-vec product operation that takes time proportional to the number of non-zero entries.
With min_scalar(mat.scale, 10), a potentially sparse transform (min) is applied after the matrix is already dense, which eliminates the specialized mat-vec product and forces min to operate over the full matrix rather than just the non-zero entries.
There are min_by_row and min_by_col operations for this exact scenario, so that the min operation can be applied on the sparse matrix before it becomes dense.
Usage example for replicating Seurat's default normalization:
library(BPCells)
library(Seurat)
library(dplyr)
library(SeuratData)
data('pbmc3k')
pbmc3k <- UpdateSeuratObject(pbmc3k)
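pbmc3k <- FindVariableFeatures(pbmc3k) %>% ScaleData() %>% RunPCA()
# Load the normalized data as an in-memory BPCells matrix, then keep only the variable features
mat <- write_matrix_memory(pbmc3k[['RNA']]@data)
mat <- mat[VariableFeatures(pbmc3k),]
# Requesting 'variance' also returns the 'mean' row, which the next line relies on
feature_stat <- matrix_stats(matrix = mat, row_stats = c('variance'))
feature.mean <- feature_stat$row_stats['mean',]
feature.sd <- sqrt(feature_stat$row_stats['variance',])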
# Derive a per-row cap from the inequality (X - mean)/sd <= 10
mat.capped <- min_by_row(mat, 10*feature.sd + feature.mean)
mat.scale <- (mat.capped - feature.mean)/feature.sd
ans <- pbmc3k@assays$RNA@scale.data[VariableFeatures(pbmc3k),]
all.equal(
ans,
as.matrix(mat.scale)
)
# TRUE
loadings <- pbmc3k[['pca']]@feature.loadings[VariableFeatures(pbmc3k),]
all.equal(
t(mat.scale) %*% loadings,
t(ans) %*% loadings
)
# TRUE
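To see why capping before scaling gives the same result as scaling and then applying min with 10, here is a minimal base R sketch on toy dense data. It makes no BPCells calls; the matrix x and the injected outlier are made up for illustration, and the equivalence assumes every row has a non-zero standard deviation.

```r
# Toy dense data; base R only, no BPCells calls.
set.seed(1)
x <- matrix(rpois(800, lambda = 2), nrow = 4)  # 4 features x 200 cells
x[1, 1] <- 1000                                # one outlier that scales to more than 10

feature.mean <- rowMeans(x)
feature.sd <- apply(x, 1, sd)

# Order 1: scale first, then cap the scaled values at 10.
scaled_then_capped <- pmin((x - feature.mean) / feature.sd, 10)

# Order 2: cap each row at 10*sd + mean first, then scale.
# The length-4 vectors recycle down the columns, so row i uses its own mean and sd.
row_cap <- 10 * feature.sd + feature.mean
capped_then_scaled <- (pmin(x, row_cap) - feature.mean) / feature.sd

all.equal(scaled_then_capped, capped_then_scaled)
```

Both orders agree because (X - mean)/sd <= 10 is the same condition as X <= 10*sd + mean whenever sd is positive. The second order is the one that lets the min run while the matrix is still sparse.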
Makes sense. Thank you!
Two minor issues. First, min_by_row doesn't appear to be exported, so to call it I have to use BPCells:::min_by_row. Second, when I print the object in R after running min_by_row, it shows an error message, even though this has no effect on other operations on the object.
mat.capped <- min_by_row(mat, 10*feature.sd + feature.mean)
mat.capped
Error in (function (cl, name, valueClass) :
assignment of an object of class “numeric” is not valid for @‘row_params’ in an object of class “TransformMinByRow”; is(value, "matrix") is not TRUE
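The message has the shape R produces when a slot declared as a matrix is assigned a plain numeric vector. A minimal sketch of that mechanism, using a hypothetical class Demo rather than BPCells' actual TransformMinByRow definition (which I have not inspected):

```r
library(methods)

# Hypothetical class for illustration only; not BPCells' real definition.
setClass("Demo", representation(row_params = "matrix"))

obj <- new("Demo", row_params = matrix(0, nrow = 1, ncol = 3))

# Assigning a bare numeric vector fails the slot's class check.
msg <- tryCatch(
  {
    obj@row_params <- c(1, 2, 3)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Wrapping the vector in a one-row matrix satisfies the slot type.
obj@row_params <- matrix(c(1, 2, 3), nrow = 1)
dim(obj@row_params)
```

If the real cause is similar, it would explain why the error only shows up in one code path (here, printing) while other operations on the object keep working.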
There is a bug in BPCells after several operations on an IterableMatrix. After a row subset, row centering, and row standardization, the matrix cannot be converted to a sparse matrix (it crashes the R session), although matrix multiplication still works. With an additional min_scalar step, matrix multiplication doesn't work either. Here is a reproducible example: