FelixTheStudent / cellpypes

Cell type pipes for R
GNU General Public License v3.0

precompute function #10

Closed FelixTheStudent closed 3 years ago

FelixTheStudent commented 3 years ago

Problem: For large data sets (the MS data from the Schirmer group, for example), the interactive commands (rule + plot_last) are prohibitively slow.

Solution: Precomputing the totalUMI S (total UMI counts per cell) would speed things up for the MS data set, since raw is stored in gene-wise HDF5Array format (so cell-wise computations take a long time).

I found it unnecessary to precompute K when using an NxN neighbor graph (SNN > .1), since that uses matrix multiplication, which is incredibly fast. I still have to test an Nxk neighbor graph (i.e. a matrix with the indices of the kNN for each cell), which might be slower.

Both cases could be solved with something like this:

obj <- list(raw, neighbors, embedding)
obj <- precompute(obj)  # computes S
obj <- precompute(obj, genes=c("PLP1", "AQP1"))  # precomputes genes
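
To make the proposal concrete, here is a minimal sketch of what such a function might look like. It is purely hypothetical: precompute is not an existing cellpypes function, and the slot names totalUMI and precomputed, the genes x cells orientation of raw, and the toy counts are all assumptions made for illustration.

```r
# Hypothetical sketch: precompute() does not exist in cellpypes.
# Assumes obj$raw is a genes x cells count matrix with gene names as rownames.
precompute <- function(obj, genes = NULL) {
  # Total UMI per cell (S), computed once and cached on the object.
  if (is.null(obj$totalUMI)) {
    obj$totalUMI <- colSums(obj$raw)
  }
  # Cache for per-gene count vectors (assumed slot name).
  if (is.null(obj$precomputed)) {
    obj$precomputed <- list()
  }
  # Store the raw counts of each requested gene as a plain numeric vector.
  for (g in genes) {
    obj$precomputed[[g]] <- as.numeric(obj$raw[g, ])
  }
  obj
}

# Toy data (made up): 3 genes x 4 cells, filled column by column.
raw <- matrix(c(1, 0, 2,
                0, 3, 1,
                4, 0, 0,
                1, 1, 1),
              nrow = 3,
              dimnames = list(c("PLP1", "AQP1", "GFAP"), paste0("cell", 1:4)))
obj <- list(raw = raw, neighbors = NULL, embedding = NULL)
obj <- precompute(obj)                             # caches S
obj <- precompute(obj, genes = c("PLP1", "AQP1"))  # caches two genes
```

Caching on the object would mean the slow cell-wise pass over a gene-wise HDF5Array happens only once, rather than on every rule + plot_last call.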
FelixTheStudent commented 3 years ago

I will close this issue because a separate precompute function would be overkill at this point, for the following reason:

Here is some quick code (in part taken from evaluate_rule) which I used to show that pooling with Nxk neighbors is instant (on my laptop):

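library(dataMS)
# Counts of one gene across all cells
x <- ms_raw[, "PLP1"]

# NxN neighbor graph: threshold the SNN matrix, then pool by matrix multiplication
NxN_neighbors <- as(ms_snn, "dgCMatrix") > .1
as.numeric( as(NxN_neighbors, "dgCMatrix") %*% x )

# Nxk neighbor graph: look up the counts of each cell's kNN, then sum per cell
Nxk_neighbors <- ms_nn$idx
rowSums(matrix(data=x[c(Nxk_neighbors)], ncol=ncol(Nxk_neighbors)))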
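
Since dataMS is not needed to see the idea, the same two pooling approaches can be tried on toy data. This is only a sketch: the counts and neighbor indices below are made up, and the sparse adjacency matrix is built directly from the index matrix (rather than from an SNN graph) so that both approaches pool over the same neighbors.

```r
library(Matrix)

# Toy data (made up): 5 cells, each with k = 2 neighbors.
x <- c(2, 0, 1, 5, 3)   # counts of one gene per cell
idx <- matrix(c(1, 2,
                2, 3,
                3, 4,
                4, 5,
                5, 1), ncol = 2, byrow = TRUE)   # N x k index matrix

# Nxk pooling: look up neighbor counts, reshape to N x k, sum across each row.
pooled_nxk <- rowSums(matrix(x[c(idx)], ncol = ncol(idx)))

# NxN pooling: build a sparse adjacency matrix from the same indices,
# then pool with a single matrix-vector product.
adj <- sparseMatrix(i = rep(seq_len(nrow(idx)), times = ncol(idx)),
                    j = c(idx), x = 1,
                    dims = c(length(x), length(x)))
pooled_nxn <- as.numeric(adj %*% x)

# Both approaches give the same pooled counts per cell.
all.equal(pooled_nxk, pooled_nxn)
```

The Nxk version only does vector indexing and a rowSums over an N x k matrix, which fits the observation that it runs instantly.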
FelixTheStudent commented 3 years ago

I am closing this issue for the reasons above. Cheers!