FertigLab / CoGAPS

Bayesian MCMC matrix factorization algorithm
https://www.bioconductor.org/packages/release/bioc/html/CoGAPS.html
BSD 3-Clause "New" or "Revised" License

Too long runtime for CoGAPS R #96

Closed LiuCanidk closed 7 months ago

LiuCanidk commented 7 months ago

I tried to run CoGAPS on a relatively large single-cell dataset (35,412 cells × 50,000+ genes).

Even the minimum running time for me (nPatterns, i.e., k=5, 6, 7, 8, 9, 10) was unacceptable, with "sparseOptimization=True, nSets=20". As shown below, k=5 alone needs 2600+ h, more than 100 days! And k=11 needs even more, ~4000 h!

image

Did I miss something that would speed up parallelization? Or would pyCoGAPS be much faster? (I noticed that in the Nature Protocols manuscript, pyCoGAPS shows only a slight improvement in speed.)

Any suggestions would be greatly appreciated!

dimalvovs commented 7 months ago

Hi @LiuCanidk, the output shown above corresponds to a standard CoGAPS run. In a distributed run using nSets, it should report the distributed parameters before the run, something like this:

-- Distributed CoGAPS Parameters -- 
nSets          6 
cut            5 
minNS          3 
maxNS          9 
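
For reference, a minimal sketch of how a distributed run is usually set up in the R package, assuming a recent Bioconductor release of CoGAPS. It uses the small GIST example matrix that ships with the package so it finishes quickly; the values of nPatterns, nSets, and nIterations here are illustrative placeholders, not recommendations for the dataset in this issue. The point it shows is that nSets is applied through setDistributedParams() and that the distributed argument is passed to CoGAPS(), which is when the "Distributed CoGAPS Parameters" header gets printed.

```r
library(CoGAPS)

# Small example matrix bundled with the package (genes x samples).
data(GIST)

# Start from default parameters and set the number of patterns first,
# since the distributed defaults (e.g. cut) are derived from nPatterns.
params <- CogapsParams()
params <- setParam(params, "nPatterns", 3)
params <- setParam(params, "nIterations", 1000)  # kept short for illustration

# nSets is a distributed parameter and is set here, not via setParam().
params <- setDistributedParams(params, nSets = 3)

# "genome-wide" splits the genes into nSets subsets; "single-cell" splits
# the cells instead. Without the distributed argument, CoGAPS performs a
# standard (non-distributed) run.
result <- CoGAPS(GIST.matrix, params, distributed = "genome-wide")
```

For a cells-by-genes single-cell matrix like the one described above, distributed = "single-cell" with the intended nPatterns and nSets would be the analogous call; whether the input needs transposing depends on how the data are stored.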

dimalvovs commented 7 months ago

Closing as solved, please reopen if needed.