gjhunt / hspe

hybrid-scale proportions estimation
GNU General Public License v3.0

Support for sparse matrices? #1

Open jaclynbeck-sage opened 1 year ago

jaclynbeck-sage commented 1 year ago

Hello! I am using this package a lot, but I am having some difficulty as I add single-cell data sets as references. I have multiple large data sets that are typically kept as sparse matrices, but both HSPE and dtangle force them into dense matrices. The dense version of this data takes up > 30 GB of memory. The inner functions of HSPE and dtangle then make a few copies of these matrices (e.g., in the first lines of find_markers), which makes these packages unusable on these data sets even on machines with tons of memory.

It would be awesome if both HSPE and dtangle imported the "Matrix" package and allowed things to be sparse!

For HSPE, I was able to run the entire hspe() function by passing in sparse matrix Y (which was references + test combined), pure_samples, no reference, method = "ratio", and changing the following code in find_markers:

avg_exp_fn <- function(x) {
  colMeans(2^(Y[x, , drop = FALSE]))/gamma
}

to

avg_exp_fn <- function(x) {
  Matrix::colMeans(2^(Y[x, , drop = FALSE]))/gamma
}

I haven't tested dtangle or other sets of parameters, but it looks like just importing the Matrix package might be enough to allow sparse support. Thank you for writing these packages!

gjhunt commented 1 year ago

Happy to consider including this. Do you have a minimum working example, or would you be able to write up a little simulation of some data to test this? If so, I can write some test cases and look at how easy it is to implement.
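
A small simulation along these lines might look like the sketch below. It is only an illustration, not code from hspe or dtangle: the toy matrix, the value of `gamma`, and the `rows` index are all made-up assumptions. It checks that the `Matrix::colMeans` version of the `avg_exp_fn` body gives the same numbers as the dense base-R version.

```r
# Toy simulation; all names and sizes here are illustrative assumptions.
set.seed(1)

# 6 samples (rows) x 50 genes (columns), about 10% non-zero, stored sparsely.
Y <- Matrix::rsparsematrix(
  nrow = 6, ncol = 50, density = 0.1,
  rand.x = function(n) runif(n, min = 1, max = 10)
)
gamma <- 1    # placeholder scaling factor
rows <- 1:3   # hypothetical indices of one cell type's pure samples

# Dense reference computation with base R.
dense_avg <- colMeans(2^(as.matrix(Y)[rows, , drop = FALSE])) / gamma

# Same computation, keeping Y sparse until the row subset is taken.
sparse_avg <- Matrix::colMeans(2^(Y[rows, , drop = FALSE])) / gamma

# The two versions should agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(unname(dense_avg), unname(sparse_avg))))
```

Note that 2^0 = 1, so the exponentiated subset is no longer sparse. Only the selected rows are densified at that point, though, rather than the whole matrix.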