TheoreticalEcology / s-jSDM

Scalable joint species distribution modeling
https://cran.r-project.org/web/packages/sjSDM/index.html
GNU General Public License v3.0

Improve memory efficiency of hyper-parameter sampling #84

Closed florianhartig closed 2 years ago

florianhartig commented 2 years ago

See https://github.com/TheoreticalEcology/s-jSDM/issues/83#issuecomment-998709141

florianhartig commented 2 years ago

Are you sure the problem is expand.grid(lambda_coef, lambda_cov, lambda_spatial, alpha_coef, alpha_cov, alpha_spatial)?

Because even though it's inefficient, I don't see how this would use so much memory that it becomes a problem. I would have thought it was because too much information is kept for each run?

MaximilianPi commented 2 years ago

Yeah, I am sure that the expand.grid function is the cause:

(My suggestion from #83):

                lambda_cov = seq(0, 0.1, 0.001),
                lambda_coef = seq(0, 0.1, 0.001),
                alpha_cov = seq(0, 1, 0.05),
                alpha_coef = seq(0, 1, 0.05),
                alpha_spatial = seq(0, 1, 0.05),
                lambda_spatial = 2^seq(-10, -0.5, length.out = 20)

Lengths: 101, 101, 21, 21, 21, 20

--> (101 * 101 * 21 * 21 * 21 * 20) * 64/8 / 1e9 = 15.11 GB
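
A minimal sketch of a memory-efficient alternative, assuming we only ever need n random candidates: draw from each candidate vector independently instead of materializing the full cross product with expand.grid:

    lambda_cov     = seq(0, 0.1, 0.001)
    lambda_coef    = seq(0, 0.1, 0.001)
    alpha_cov      = seq(0, 1, 0.05)
    alpha_coef     = seq(0, 1, 0.05)
    alpha_spatial  = seq(0, 1, 0.05)
    lambda_spatial = 2^seq(-10, -0.5, length.out = 20)

    # expand.grid() would materialize all ~1.9e9 combinations (~15 GB per
    # numeric column); sampling each dimension independently allocates only
    # the n rows that are actually evaluated
    n = 100L
    candidates = data.frame(
      lambda_cov     = sample(lambda_cov,     n, replace = TRUE),
      lambda_coef    = sample(lambda_coef,    n, replace = TRUE),
      alpha_cov      = sample(alpha_cov,      n, replace = TRUE),
      alpha_coef     = sample(alpha_coef,     n, replace = TRUE),
      alpha_spatial  = sample(alpha_spatial,  n, replace = TRUE),
      lambda_spatial = sample(lambda_spatial, n, replace = TRUE)
    )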

florianhartig commented 2 years ago

OK, I didn't realise the grid was that dense.

I wonder: even when making this efficient, does it make sense to draw randomly from such a dense grid? It creates the illusion that we have covered a dense grid, but effectively the coverage will be full of holes for n = 100 or so.

I would say either cover the entire grid systematically (in which case we should allow only very few variations per dimension), or else maybe let the user provide min/max values and use some kind of optimisation (e.g. a GA) to search the hyper-parameter space?
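
A minimal sketch of the min/max idea (a hypothetical interface, not the current sjSDM API): the user supplies ranges and candidates are drawn uniformly, log-uniformly for lambda_spatial to match the grid above:

    # Hypothetical interface: the user passes min/max, no grid is built
    n = 100L
    candidates = data.frame(
      lambda_cov     = runif(n, min = 0, max = 0.1),
      lambda_coef    = runif(n, min = 0, max = 0.1),
      alpha_cov      = runif(n, min = 0, max = 1),
      alpha_coef     = runif(n, min = 0, max = 1),
      alpha_spatial  = runif(n, min = 0, max = 1),
      lambda_spatial = 2^runif(n, min = -10, max = -0.5)  # log-uniform, as above
    )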

MaximilianPi commented 2 years ago

> I wonder: even when making this efficient, does it make sense to draw randomly from such a dense grid? It creates the illusion that we have covered a dense grid, but effectively the coverage will be full of holes for n = 100 or so.

Yeah, but that's random hyper-parameter tuning, which is more efficient than naive grid search (Bergstra and Bengio, 2012) 😅

> I would say either cover the entire grid systematically (in which case we should allow only very few variations per dimension), or else maybe let the user provide min/max values and use some kind of optimisation (e.g. a GA) to search the hyper-parameter space?

Yes, good idea! We could use a GA (or other methods such as surrogate models, see Bischl et al., 2021), but I'm afraid that most people will not have the computational resources to utilize this option (though it should probably still scale better than simple random hyper-parameter search).
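
A minimal sketch of the GA variant using the GA package; cv_loss is a hypothetical placeholder for the expensive cross-validated loss, and only three of the six dimensions are shown:

    library(GA)

    # Hypothetical placeholder: in practice this would refit the model and
    # return the cross-validated loss for one hyper-parameter vector
    cv_loss = function(pars) sum((pars - c(0.05, 0.05, 0.5))^2)

    # ga() maximizes, so the fitness is the negative loss; the remaining
    # hyper-parameters would be added analogously
    fit = ga(
      type    = "real-valued",
      fitness = function(pars) -cv_loss(pars),
      lower   = c(lambda_cov = 0,   lambda_coef = 0,   alpha_cov = 0),
      upper   = c(lambda_cov = 0.1, lambda_coef = 0.1, alpha_cov = 1),
      popSize = 20,
      maxiter = 25
    )
    fit@solution  # best hyper-parameter vector found

Even this small configuration costs up to popSize * maxiter = 500 model fits, which is exactly the resource concern above; a surrogate model (Bischl et al., 2021) would typically spend those fits more economically.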

We need to improve this function; there are other shortcomings we need to work on, for example: