FelixTheStudent closed this issue 3 years ago.
I will close this issue because a separate precompute function would be overkill at this point, for the following reason: users can add precomputed totals as `totalUMI` or `totals` to the object, and `evaluate_rule` then searches for them. That's more intuitive to the user than a precompute function, and in the vignettes I can simply write that precomputing totals can speed up the interactive pipes (especially if your raw UMI data is in a gene-wise `HDF5Array`, which I recommend for large data sets).

Here is some quick code (in part taken from `evaluate_rule`) which I used to show that pooling with Nxk neighbors is instant (on my laptop):
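```r
library(Matrix)   # as(..., "dgCMatrix") and sparse %*%
library(dataMS)
x <- ms_raw[, "PLP1"]   # UMI counts of one gene, one value per cell

# NxN neighbor graph: keep SNN edges with weight > .1, pool via matrix multiplication
NxN_neighbors <- as(ms_snn, "dgCMatrix") > .1
as.numeric( as(NxN_neighbors, "dgCMatrix") %*% x )

# Nxk neighbor graph: row i holds the indices of cell i's k nearest neighbors
Nxk_neighbors <- ms_nn$idx
rowSums(matrix(data=x[c(Nxk_neighbors)], ncol=ncol(Nxk_neighbors)))
```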
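To see that the two neighbor representations in the snippet above pool counts identically, here is a small self-contained sketch on toy data. It assumes cells are in rows and that each cell counts itself among its neighbors; `x`, `nn_idx` and `adj` are illustrative names, not objects from `dataMS`.

```r
library(Matrix)

set.seed(1)
n_cells <- 6
k <- 3

# Toy UMI counts for one gene, one value per cell
x <- as.numeric(rpois(n_cells, lambda = 5))

# Nxk index matrix: row i lists cell i and the next k - 1 cells (wrapping around)
nn_idx <- t(sapply(seq_len(n_cells),
                   function(i) ((i - 1 + 0:(k - 1)) %% n_cells) + 1))

# Pooling with the Nxk index matrix: look up neighbor counts, then sum per row
pooled_nxk <- rowSums(matrix(x[c(nn_idx)], ncol = ncol(nn_idx)))

# The same neighborhoods as a sparse NxN matrix (entry [i, j] = 1 if j is a neighbor of i);
# c(nn_idx) is column-major, so the row indices repeat 1..n_cells once per column
adj <- sparseMatrix(i = rep(seq_len(n_cells), times = k),
                    j = c(nn_idx),
                    x = 1,
                    dims = c(n_cells, n_cells))

# Pooling with the NxN matrix: one sparse matrix-vector product
pooled_nxn <- as.numeric(adj %*% x)

all.equal(pooled_nxk, pooled_nxn)   # TRUE
```

Both forms only touch the counts of the one gene being pooled; on a gene-wise `HDF5Array` the costly part is computing per-cell quantities across all genes, which is what precomputed totals avoid.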
I'm closing this issue for the reasons above. Cheers!
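A minimal sketch of the precompute-once idea, assuming a plain in-memory cells-by-genes count matrix and a `data.frame` standing in for the object's per-cell data. Only the column name `totalUMI` comes from the comment; `meta`, `pool`, `pooled_total` and the normalized fraction are hypothetical, for illustration. The per-cell totals (a pass over all genes) are computed once, so an interactive query only needs the counts of the gene it looks at.

```r
set.seed(2)
# Toy cells x genes UMI matrix (rows = cells, as in ms_raw[, "PLP1"])
raw <- matrix(rpois(5 * 4, lambda = 3), nrow = 5,
              dimnames = list(paste0("cell", 1:5),
                              c("PLP1", "MBP", "GFAP", "SNAP25")))

# kNN indices (N x k): each cell is its own first neighbor, the next cell its second
nn_idx <- cbind(1:5, c(2, 3, 4, 5, 1))

# Hypothetical pooling helper: sum a per-cell vector over each cell's neighbors
pool <- function(v, idx) rowSums(matrix(v[c(idx)], ncol = ncol(idx)))

# Precompute once: on a gene-wise HDF5Array, rowSums() over all genes is the slow step
meta <- data.frame(cell = rownames(raw), totalUMI = rowSums(raw))
meta$pooled_total <- pool(meta$totalUMI, nn_idx)

# Interactive query: touches a single gene column plus the cached totals
pooled_plp1 <- pool(raw[, "PLP1"], nn_idx)
fraction_plp1 <- pooled_plp1 / meta$pooled_total
```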
Problem: For large data sets (the MS data from the Schirmer group, for example), the interactive commands (`rule` + `plot_last`) are prohibitively slow.
Solution: Precomputing the totalUMIs would speed things up for the MS data set, since `raw` is in gene-wise `HDF5Array` format (so cell-wise computations take long).

I found it unnecessary to precompute K when using an NxN neighbor graph (SNN > .1), since that uses matrix multiplication, which is incredibly fast. I still have to test an Nxk neighbor graph (i.e. a matrix with the kNN indices for each cell), which might be slower.
Both cases could be solved with something like this: