FertigLab / CoGAPS

Bayesian MCMC matrix factorization algorithm
https://www.bioconductor.org/packages/release/bioc/html/CoGAPS.html
BSD 3-Clause "New" or "Revised" License

enable sparse matrix support when input is object #82

Closed: dimalvovs closed this issue 7 months ago

dimalvovs commented 7 months ago

https://github.com/FertigLab/CoGAPS/blob/241867be5e8a96444637c7e3436c78c85430783a/R/HelperFunctions.R#L342

To reproduce:

> library("CoGAPS")
> 
> # passing the sparse Matrix object directly fails
> mtx <- Matrix::readMM("inst/extdata/GIST.mtx")
> class(mtx)
[1] "dgTMatrix"
attr(,"package")
[1] "Matrix"
> res <- CoGAPS(mtx)
Error in convertDataToMatrix(data) : unsupported data type
> 
> # passing the same data as a file path works
> res2 <- CoGAPS("inst/extdata/GIST.mtx", messages = FALSE, nIterations=100)

This is CoGAPS version 3.22.0 
Running Standard CoGAPS on inst/extdata/GIST.mtx (1363 genes and 9 samples)

dimalvovs commented 7 months ago

A workaround is to convert the object with as.matrix() and pass the result to CoGAPS with sparseOptimization=TRUE. The C++ code still converts the matrix to a sparse representation, and the final results are the same. The downside is higher RAM consumption, because the dense matrix is much larger (here the coercion allocates 1.3 GiB). Below is a check that the results are equal; interestingly, the sparse -> dense -> sparse route ran faster than reading the sparse matrix from file (46.0 s vs. 52.7 s):

> library("CoGAPS")
> 
> path <- 'normalized.mtx'
> 
> message('dense with conversion')
dense with conversion
> tictoc::tic()
> mat <- Matrix::readMM(path)
> mat <- as.matrix(mat)
Warning message:
In asMethod(object) :
  sparse->dense coercion: allocating vector of size 1.3 GiB
> res1 <- CoGAPS(mat, nIterations=100, messages=FALSE, sparseOptimization=TRUE, nPatterns=2, seed=5)

This is CoGAPS version 3.22.0 
Running Standard CoGAPS on mat (36601 genes and 4898 samples)
> tictoc::toc()
46.003 sec elapsed

> message('sparse from file')
sparse from file
> tictoc::tic()
> res2 <- CoGAPS(path, nIterations=100, messages=FALSE, sparseOptimization=TRUE, nPatterns=2, seed=5)

This is CoGAPS version 3.22.0 
Running Standard CoGAPS on normalized.mtx (36601 genes and 4898 samples)
> tictoc::toc()
52.744 sec elapsed

> all.equal(res1, res2)
[1] "Attributes: < Component “metadata”: Component “totalRunningTime”: Mean relative difference: 0.03333333 >"
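The as.matrix() workaround can be wrapped in a small helper so that Matrix-package objects are densified before they reach CoGAPS(). This is a minimal sketch, not a fix inside CoGAPS: the name `densifyForCoGAPS` is hypothetical, and it assumes that every `sparseMatrix` subclass (including the `dgTMatrix` returned by `Matrix::readMM()`) can be handled by the same dense conversion. It carries the same RAM cost described in the comment above.

```r
library(Matrix)

# Hypothetical helper (not part of CoGAPS): convert any Matrix-package sparse
# object to a base dense matrix, which CoGAPS() accepts; leave other inputs
# (dense matrices, file paths, ...) unchanged.
densifyForCoGAPS <- function(data) {
  if (methods::is(data, "sparseMatrix")) {
    # The dense copy needs nrow * ncol * 8 bytes of RAM
    return(as.matrix(data))
  }
  data
}

# Small sparse example standing in for Matrix::readMM() output
m <- Matrix::sparseMatrix(i = c(1, 3), j = c(2, 1), x = c(5, 7), dims = c(3, 2))
d <- densifyForCoGAPS(m)
print(d)
```

With the helper, the call from the thread would become `CoGAPS(densifyForCoGAPS(Matrix::readMM(path)), sparseOptimization = TRUE, nPatterns = 2, seed = 5)`, still relying on CoGAPS to rebuild the sparse representation internally.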