TheoreticalEcology / s-jSDM

Scalable joint species distribution modeling
https://cran.r-project.org/web/packages/sjSDM/index.html
GNU General Public License v3.0

Memory problems for importance() with large covariances #64

Closed: florianhartig closed this issue 3 years ago

florianhartig commented 3 years ago

Question from a user (redacted for conciseness and privacy):

... we have been working on analyzing an absolutely enormous XXX dataset with s-jSDM.

Good news: given enough processors and memory, s-jSDM does handle datasets with tens of thousands of species pretty well.

However, I have run into a subsequent memory problem when attempting to parse the importance from the model output. I’ve looked at the code for the function and I’m pretty sure it stems from the matrix multiplication expression involving the species covariance matrix (unsurprising, given its size).

So I was wondering: have either of you run any tests on resource requirements for the importance function to see how they scale with the number of species?
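For a rough sense of why the association matrix dominates memory here, a back-of-the-envelope calculation (my own illustration, not from the thread): a dense S x S matrix costs S^2 elements, so memory grows quadratically with the number of species, and the per-element byte size (precision) sets the constant factor.

```python
# Back-of-the-envelope memory cost of a dense S x S species
# association matrix at different floating-point precisions.
# Illustrative only -- the real importance() computation also
# allocates temporaries on top of the matrix itself.
def assoc_matrix_gib(n_species, bytes_per_element=8):
    """Memory for a dense n_species x n_species matrix, in GiB."""
    return n_species**2 * bytes_per_element / 2**30

for s in (1_000, 10_000, 30_000):
    print(f"{s:>6} species: "
          f"float64 {assoc_matrix_gib(s, 8):7.2f} GiB, "
          f"float32 {assoc_matrix_gib(s, 4):7.2f} GiB")
```

At 30,000 species the double-precision matrix alone is already around 6.7 GiB, before any intermediate products are formed.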

MaximilianPi commented 3 years ago

The problem occurs in the importance function because that is the only place where the association matrix is actually calculated; together with the rowSums call, this can blow up the memory. Possible solutions:

a) move the importance function to torch -> use single precision (32-bit) or even half precision (16-bit) + internal parallelization

b) https://cran.r-project.org/web/packages/bigmemory/index.html + bigalgebra
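To illustrate the idea behind option (a), here is a minimal sketch of computing the row sums of the association matrix in single precision and in chunks, so the full S x S matrix is never materialized at once. It is shown with NumPy for portability; the same pattern carries over to torch, where half precision and GPU parallelization are also available. The factor matrix `sigma` is a hypothetical stand-in for the model's S x D sigma factors, with associations given by `sigma @ sigma.T`; the function name is invented for this sketch.

```python
import numpy as np

def assoc_row_sums(sigma, chunk=2048, dtype=np.float32):
    """Row sums of the association matrix sigma @ sigma.T,
    computed chunk-wise in reduced precision.

    Only a (chunk x S) slice of the association matrix exists
    in memory at any time, instead of the full (S x S) matrix.
    """
    sigma = sigma.astype(dtype)
    n = sigma.shape[0]
    out = np.empty(n, dtype=dtype)
    for start in range(0, n, chunk):
        # Build one horizontal slice of the association matrix...
        block = sigma[start:start + chunk] @ sigma.T
        # ...reduce it immediately, then let it be freed.
        out[start:start + chunk] = block.sum(axis=1)
    return out
```

Halving the element size (float64 -> float32) halves peak memory, and chunking turns the quadratic allocation into a linear one at the cost of a loop; the trade-off is reduced numerical precision in the accumulated sums.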

MaximilianPi commented 3 years ago

Solved in #65.