Closed florianhartig closed 2 years ago
Are you sure the problem is expand.grid(lambda_coef, lambda_cov, lambda_spatial, alpha_coef, alpha_cov, alpha_spatial)?
Because even though it's inefficient, I don't see how this would use so much memory that it becomes a problem. I would have thought it is because we keep too much information for each run?
Yeah, I am sure that the expand.grid call is the cause:
(My suggestion from #83)
lambda_cov = seq(0, 0.1, 0.001),
lambda_coef = seq(0, 0.1, 0.001),
alpha_cov = seq(0, 1, 0.05),
alpha_coef = seq(0, 1, 0.05),
alpha_spatial = seq(0, 1, 0.05),
lambda_spatial = 2^seq(-10, -0.5, length.out = 20)
Lengths: 101, 101, 21, 21, 21, 20
--> (101 * 101 * 21 * 21 * 21 * 20) * 64 / 8 / 1e9 = 15.11 GB
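The arithmetic above can be checked without ever building the grid. This is a minimal sketch in base R, assuming the six vectors from the suggestion; the list name `grid` and the other variable names are only for illustration. Note that seq(0, 1, 0.05) has 21 elements, and that the 15.11 GB figure is for a single column of doubles, while a data frame from expand.grid would hold one such column per hyperparameter.

```r
# Hyperparameter vectors as suggested above (the list itself is illustrative).
grid <- list(
  lambda_cov     = seq(0, 0.1, 0.001),
  lambda_coef    = seq(0, 0.1, 0.001),
  alpha_cov      = seq(0, 1, 0.05),
  alpha_coef     = seq(0, 1, 0.05),
  alpha_spatial  = seq(0, 1, 0.05),
  lambda_spatial = 2^seq(-10, -0.5, length.out = 20)
)

# Number of rows expand.grid() would create: the product of the vector lengths.
n_rows <- prod(lengths(grid))

# Each value is a 64-bit double, i.e. 8 bytes.
gb_per_column <- n_rows * 8 / 1e9
gb_total      <- gb_per_column * length(grid)

print(lengths(grid))
cat(sprintf("rows: %.0f\n", n_rows))
cat(sprintf("GB per column: %.1f\n", gb_per_column))  # about 15.1
cat(sprintf("GB for all %d columns: %.1f\n", length(grid), gb_total))  # about 90.7
```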
OK, I didn't realise the grid was that dense.
I wonder: even when making this efficient, does it make sense to draw randomly from such a dense grid? It creates the illusion that we have a dense grid covered, but effectively this will be full of holes for n = 100 or so.
I would say either cover the entire grid systematically (in which case we should allow only very few variations per dimension), or else maybe let the user provide min/max and use some kind of optimisation (e.g. GA) to search in the hyperparameter space?
> I wonder: even when making this efficient, does it make sense to draw randomly from such a dense grid? It creates the illusion that we have a dense grid covered, but effectively this will be full of holes for n = 100 or so.
Yeah, well, but that's random hyperparameter search, which is more efficient than naive grid search (Bergstra and Bengio, 2012) 😅
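Random search over this grid does not need the expanded grid at all. Drawing each hyperparameter independently and uniformly from its own vector gives the same distribution as picking rows uniformly (with replacement) from the full grid. A minimal sketch in base R, where `sample_grid` is a hypothetical helper and not a function of the package:

```r
# Hyperparameter vectors as suggested above (the list itself is illustrative).
grid <- list(
  lambda_cov     = seq(0, 0.1, 0.001),
  lambda_coef    = seq(0, 0.1, 0.001),
  alpha_cov      = seq(0, 1, 0.05),
  alpha_coef     = seq(0, 1, 0.05),
  alpha_spatial  = seq(0, 1, 0.05),
  lambda_spatial = 2^seq(-10, -0.5, length.out = 20)
)

# Hypothetical helper: draw n random grid points without calling expand.grid().
# Indexing with sample.int() avoids the sample(x) pitfall for length-1 vectors.
sample_grid <- function(grid, n) {
  draws <- lapply(grid, function(v) v[sample.int(length(v), n, replace = TRUE)])
  as.data.frame(draws)
}

set.seed(1)
tune_samples <- sample_grid(grid, 100)
str(tune_samples)
```

Memory use is n rows times six columns, independent of how dense the grid is. Duplicate rows are possible in principle but very unlikely with roughly 1.9e9 grid points and n = 100.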
> I would say either cover the entire grid systematically (in which case we should allow only very few variations per dimension), or else maybe let the user provide min/max and use some kind of optimisation (e.g. GA) to search in the hyperparameter space?
Yes, good idea! We could use GA (or other methods such as surrogate models etc., see Bischl et al., 2021), but I'm afraid that most ppl will not have the computational resources to utilize this option (though it should probably still scale better than simple random hyperparameter search).
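To make the min/max idea concrete, here is a toy sketch of a very simple evolutionary search in base R. Everything in it is an assumption for illustration: the `ranges` matrix, the `evolve` function and the `toy_loss` objective are hypothetical, it does not use the GA package, and a real objective would have to fit the model and return a cross-validated loss.

```r
# Hypothetical user-supplied search box: one row per hyperparameter (min, max).
ranges <- rbind(
  lambda_cov  = c(0, 0.1),
  lambda_coef = c(0, 0.1),
  alpha_cov   = c(0, 1),
  alpha_coef  = c(0, 1)
)
colnames(ranges) <- c("min", "max")

# Toy stand-in for a cross-validated loss, minimal at (0.02, 0.05, 0.3, 0.7).
toy_loss <- function(p) sum((p - c(0.02, 0.05, 0.3, 0.7))^2)

evolve <- function(loss, ranges, pop_size = 20, generations = 30,
                   keep = 5, sd_frac = 0.1) {
  d    <- nrow(ranges)
  lo   <- ranges[, "min"]
  hi   <- ranges[, "max"]
  span <- hi - lo

  # Initial population: uniform draws inside the box, one candidate per row.
  pop <- matrix(runif(pop_size * d), ncol = d)
  pop <- sweep(sweep(pop, 2, span, `*`), 2, lo, `+`)

  for (g in seq_len(generations)) {
    fitness <- apply(pop, 1, loss)
    # Selection: keep the best candidates unchanged (elitism).
    elite <- pop[order(fitness)[seq_len(keep)], , drop = FALSE]
    # Mutation: copy random elite parents and add Gaussian noise per dimension.
    idx      <- sample.int(keep, pop_size - keep, replace = TRUE)
    parents  <- elite[idx, , drop = FALSE]
    noise    <- matrix(rnorm(length(parents)), ncol = d)
    children <- parents + sweep(noise, 2, sd_frac * span, `*`)
    # Clip children back into the user-supplied bounds.
    children <- sweep(children, 2, lo, pmax)
    children <- sweep(children, 2, hi, pmin)
    pop <- rbind(elite, children)
  }

  fitness <- apply(pop, 1, loss)
  best <- which.min(fitness)
  list(par = setNames(pop[best, ], rownames(ranges)), value = fitness[best])
}

set.seed(42)
res <- evolve(toy_loss, ranges)
print(res$par)
print(res$value)
```

With these defaults the search calls the objective about 620 times (20 candidates per generation plus a final evaluation), which illustrates the resource concern: every one of those calls would be a full model fit.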
We need to improve this function; there are other shortcomings we need to work on, for example:
See https://github.com/TheoreticalEcology/s-jSDM/issues/83#issuecomment-998709141