precompute function - Githubissues

FelixTheStudent commented 3 years ago

Problem: For large data sets (the MS data from the Schirmer group, for example), the interactive commands (rule + plot_last) are prohibitively slow.

Solution: Precomputing the totalUMI S (total UMI counts per cell) would speed things up for the MS data set, since raw is stored in gene-wise HDF5Array format (so cell-wise computations take a long time).

I found it unnecessary to precompute K when using an NxN neighbor graph (SNN > .1), since that uses matrix multiplication, which is incredibly fast. I still have to test an Nxk neighbor graph (i.e. a matrix with the kNN indices for each cell), which might be slower.

Both cases could be solved with something like this:

obj <- list(raw, neighbors, embedding)
obj <- precompute(obj)  # computes S
obj <- precompute(obj, genes=c("PLP1", "AQP1"))  # precomputes genes

FelixTheStudent commented 3 years ago

I will close this issue because a separate precompute function is overkill at this point, for the following reasons:

Computing totalUMI is the computational bottleneck, not NxN or Nxk pooling.
I will create a slot called totalUMI or totals in the object, and evaluate_rule will then look for it. That is more intuitive to the user than a precompute function, and in the vignettes I can simply write that precomputing totals can speed up the interactive pypes (especially if your raw UMI data is in gene-wise HDF5Array format, which I recommend for large data sets).
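The lookup described above could be sketched roughly as follows. This is only an illustration, assuming the object is a plain list whose raw element is a cells x genes count matrix (as in ms_raw[, "PLP1"] below) and whose totals element is optional; the helper name get_totals is hypothetical and not part of cellpypes.

```r
# Hypothetical helper (not a cellpypes function): reuse precomputed totals
# if the object carries them, otherwise compute them from the raw counts.
# Assumes obj$raw is a cells x genes count matrix.
get_totals <- function(obj) {
  if (!is.null(obj$totals)) {
    return(obj$totals)      # precomputed: no pass over the raw data needed
  }
  rowSums(obj$raw)          # fallback: total UMI count per cell
}

# Toy data: 3 cells x 2 genes
raw <- matrix(c(1, 0, 2,
                3, 1, 0),
              nrow = 3, dimnames = list(NULL, c("PLP1", "AQP1")))
obj <- list(raw = raw)

obj$totals <- rowSums(obj$raw)  # precompute once, before interactive use
get_totals(obj)
```

With totals stored in the object, repeated interactive calls would skip the expensive cell-wise pass over a gene-wise HDF5Array.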

Here is some quick code (in part taken from evaluate_rule) which I used to show that pooling with Nxk neighbors is instant (on my laptop):

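library(Matrix)  # for the "dgCMatrix" class and sparse matrix products
library(dataMS)
x <- ms_raw[, "PLP1"]  # raw counts of one gene

# NxN neighbor graph: pool via sparse matrix multiplication
NxN_neighbors <- as(ms_snn, "dgCMatrix") > .1
as.numeric( as(NxN_neighbors, "dgCMatrix") %*% x )

# Nxk neighbor graph: look up the kNN indices of each cell, then sum per row
Nxk_neighbors <- ms_nn$idx
rowSums(matrix(data=x[c(Nxk_neighbors)], ncol=ncol(Nxk_neighbors)))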
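For readers without the dataMS package, here is a small self-contained sketch of the same two pooling strategies on made-up data. The toy vector and index matrix are invented for illustration, and the assumption that each cell lists itself as its own first neighbor may not match ms_nn$idx.

```r
library(Matrix)

x <- c(5, 0, 2, 1)  # toy counts of one gene in 4 cells (made-up values)

# Nxk index matrix: row i holds the indices of the k = 2 neighbors of cell i
# (assumption for this toy: each cell is its own first neighbor)
Nxk <- matrix(c(1, 2,
                2, 1,
                3, 4,
                4, 3), ncol = 2, byrow = TRUE)

# Equivalent NxN sparse adjacency matrix built from the same indices;
# c(Nxk) is column-major, so the row indices repeat once per column
NxN <- sparseMatrix(i = rep(seq_len(nrow(Nxk)), times = ncol(Nxk)),
                    j = c(Nxk), x = 1, dims = c(4, 4))

pooled_NxN <- as.numeric(NxN %*% x)                          # matrix product
pooled_Nxk <- rowSums(matrix(x[c(Nxk)], ncol = ncol(Nxk)))   # index + rowSums

all.equal(pooled_NxN, pooled_Nxk)
```

Both routes sum the gene's counts over each cell's neighbors, so they should agree whenever the NxN matrix encodes the same neighbors as the index matrix.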

FelixTheStudent commented 3 years ago

I am closing this issue for the reasons above. Cheers!

FelixTheStudent / cellpypes

precompute function #10