There are several bits where efficiency can be improved:
[ ] Cross-validation: We currently run cross-validation after hyperparameter optimisation. This isn't strictly necessary, as `RandomizedSearchCV` and `BayesSearchCV` already run the relevant cross-validation internally. However, their summary output is less detailed than that of `cross_validate` and only reports averages across folds rather than per-fold results. The question is: is there a way to avoid running `cross_validate` again after the search? (See the first sketch below the list.)
[ ] Parallelisation: Some models, such as LightGBM, take an `n_jobs` argument. Currently these are always set to 1, so that we only parallelise via `cross_validate` or the grid search, not within the models themselves. Is that the best approach? (See the second sketch below the list.)
[ ] Cluster: Does `autoemulate` run well on a cluster?
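On the cross-validation point: a minimal sketch of one possible answer, assuming plain scikit-learn (the estimator and parameter grid below are placeholders, not autoemulate's actual setup). `RandomizedSearchCV` does in fact store per-split test scores in `cv_results_` under the `split{i}_test_score` keys, so the fold-level numbers could be read off the fitted search object instead of re-running `cross_validate`:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

X, y = make_regression(n_samples=200, n_features=5, random_state=0)

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions={"n_estimators": [50, 100, 200]},
    n_iter=3,
    cv=5,
    random_state=0,
)
search.fit(X, y)

# Per-fold test scores for the best candidate, one entry per CV split:
# the same folds that cross_validate would score for that model.
best = search.best_index_
fold_scores = np.array(
    [search.cv_results_[f"split{i}_test_score"][best] for i in range(5)]
)
print(fold_scores)
print(fold_scores.mean())  # equals search.cv_results_["mean_test_score"][best]
```

One caveat: `cv_results_` only stores aggregate timings (`mean_fit_time`, `std_fit_time`), not per-fold ones, so if per-fold timing or per-fold fitted estimators are needed, this wouldn't fully replace `cross_validate`.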
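On the parallelisation point, a sketch of the two levels involved, again with a placeholder dataset rather than autoemulate's. The usual guidance is to parallelise at one level only, because setting `n_jobs > 1` both inside the estimator and in `cross_validate` oversubscribes the available cores:

```python
from lightgbm import LGBMRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_validate

X, y = make_regression(n_samples=500, n_features=10, random_state=0)

# Outer parallelism: one worker per CV fold, single-threaded estimator.
serial_model = LGBMRegressor(n_jobs=1)
outer = cross_validate(serial_model, X, y, cv=5, n_jobs=5)

# Inner parallelism: multi-threaded estimator, serial CV loop.
threaded_model = LGBMRegressor(n_jobs=-1)
inner = cross_validate(threaded_model, X, y, cv=5, n_jobs=1)
```

Which level wins is workload-dependent: many cheap fits (e.g. wide hyperparameter searches) tend to favour outer parallelism, while a few expensive fits on large data tend to favour the model's own threads. That trade-off also feeds into the cluster question above.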
@mastoffel I have tried to break down this epic into more bite-sized chunks, but there are obviously things missing (see the "..." above), so feel free to edit this. Then we can make these into issues today!