GeoscienceAustralia / uncover-ml

Machine Learning system for Geoscience Australia uncover project
Apache License 2.0

Restructure all models to be sklearn friendly #101

Open brenmous opened 4 years ago


Scikit-learn has some strict rules about how models are structured:

Even if we inherit from a scikit-learn model, if these rules aren't followed then we can't take advantage of scikit-learn utilities such as GridSearchCV optimisation, super-learner ensembles and some of the introspection/reflection functionality of models (e.g. get_params/set_params). You'll get an error along the lines of "models must explicitly declare their parameters in __init__ (no varargs)".
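A minimal sketch of the failure mode, assuming a recent scikit-learn release. VarArgsRidge and ExplicitRidge are hypothetical wrappers written for illustration, not classes from uncoverml. BaseEstimator.get_params builds its parameter list by inspecting the __init__ signature, so a constructor that takes *args raises the "no varargs" RuntimeError (and anything passed through **kwargs is silently left out), while explicit keyword parameters round-trip cleanly through get_params and clone.

```python
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.linear_model import Ridge


class VarArgsRidge(RegressorMixin, BaseEstimator):
    """Hypothetical anti-pattern: parameters hidden behind *args/**kwargs."""

    def __init__(self, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs

    def fit(self, X, y):
        self.model_ = Ridge(*self.args, **self.kwargs).fit(X, y)
        return self

    def predict(self, X):
        return self.model_.predict(X)


class ExplicitRidge(RegressorMixin, BaseEstimator):
    """Hypothetical sklearn-friendly version: every parameter is explicit."""

    def __init__(self, alpha=1.0, fit_intercept=True):
        # __init__ only stores the arguments, unmodified, under the same names.
        self.alpha = alpha
        self.fit_intercept = fit_intercept

    def fit(self, X, y):
        # State learned during fit goes in attributes with a trailing underscore.
        self.model_ = Ridge(alpha=self.alpha, fit_intercept=self.fit_intercept).fit(X, y)
        return self

    def predict(self, X):
        return self.model_.predict(X)


try:
    VarArgsRidge(alpha=0.5).get_params()
except RuntimeError as exc:
    print("VarArgsRidge:", exc)  # the "no varargs" error from get_params

print(ExplicitRidge(alpha=0.5).get_params())  # {'alpha': 0.5, 'fit_intercept': True}
print(clone(ExplicitRidge(alpha=0.5)).alpha)  # 0.5
```

GridSearchCV and the ensemble utilities all rely on clone and set_params internally, which is why the explicit-parameter version is the one that works with them.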

Sudipta got started on restructuring models to be compatible with GridSearchCV; these can be found in uncoverml.optimise.models. It requires tweaking the mixins and defining all parameters in __init__. We should do the same for all models in uncoverml.models and then merge the two, so that everything lives in uncoverml.models and we have a single models module. By following the work Sudipta has done, it should be pretty straightforward (albeit time-consuming) to complete this for all models.
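A minimal sketch of the restructuring pattern, assuming nothing about the actual uncoverml mixins: HypotheticalDistMixin, its predict_dist method and RestructuredRidge are illustrative names only. The mixin contributes behaviour but defines no __init__ and no parameters, while the concrete class declares every tunable parameter explicitly. That split is what lets GridSearchCV clone the model and call set_params for each candidate in the grid.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV


class HypotheticalDistMixin:
    """Hypothetical mixin: adds behaviour, but no __init__ and no parameters."""

    def predict_dist(self, X):
        # Mean prediction plus a constant variance estimated from training residuals.
        mean = self.predict(X)
        return mean, np.full_like(mean, self.resid_var_)


class RestructuredRidge(HypotheticalDistMixin, RegressorMixin, BaseEstimator):
    """Hypothetical model: all tunable parameters are explicit in __init__."""

    def __init__(self, alpha=1.0, fit_intercept=True):
        self.alpha = alpha
        self.fit_intercept = fit_intercept

    def fit(self, X, y):
        self.model_ = Ridge(alpha=self.alpha, fit_intercept=self.fit_intercept).fit(X, y)
        self.resid_var_ = float(np.var(y - self.model_.predict(X)))
        return self

    def predict(self, X):
        return self.model_.predict(X)


X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# GridSearchCV clones the estimator and sets each parameter combination on it.
search = GridSearchCV(
    RestructuredRidge(),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0], "fit_intercept": [True, False]},
    cv=5,
)
search.fit(X, y)
print(search.best_params_)

# The refitted best model still carries the mixin's extra behaviour.
mean, var = search.best_estimator_.predict_dist(X[:3])
```

Keeping mixins free of constructor arguments means they never hide parameters from get_params, so the same mixin can be shared across every model in a unified module.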

The advantage is that we no longer have the confusion of some models being duplicated, and we can use every model with optimisation, super-learner ensembles, etc.
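scikit-learn does not ship a class named "super learner"; StackingRegressor, which fits a final estimator on cross-validated predictions from the base models, is a closely related technique and is used here as a stand-in. A minimal sketch, assuming a model has already been restructured as described above (ExplicitRidge is a hypothetical example, not an uncoverml class), showing it combined with a built-in scikit-learn model in one ensemble.

```python
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge


class ExplicitRidge(RegressorMixin, BaseEstimator):
    """Hypothetical sklearn-friendly model with explicit __init__ parameters."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.model_ = Ridge(alpha=self.alpha).fit(X, y)
        return self

    def predict(self, X):
        return self.model_.predict(X)


X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# StackingRegressor clones every base model, which only works when each one
# exposes its parameters through get_params()/set_params().
stack = StackingRegressor(
    estimators=[
        ("ridge", ExplicitRidge(alpha=1.0)),
        ("forest", RandomForestRegressor(n_estimators=50, random_state=0)),
    ],
    final_estimator=LinearRegression(),
    cv=5,
)
stack.fit(X, y)
preds = stack.predict(X[:5])
```

Because the ensemble exposes nested parameter names such as ridge__alpha through get_params, the whole stack can itself be passed to GridSearchCV, combining optimisation and ensembling.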