The big one: model and model families

The issue that keeps coming up, and that seems to throw a wrench into everything, is the difference between models and model families.

In some sense, you can work with a model by fitting a model family on a hyperparameter grid containing a single point. This is the approach caret takes, and I believe the one present in the current interface proposal.

I think this is a minor problem for conceptual clarity, but a major one for implementing new models. If you're implementing a new modelling technique, it makes a lot more sense to first write a fit method for models (e.g. glmnet::glmnet) and then to write a fit method for model families (i.e. a hyperparameter selection method) that may make heavy use of fit.model.
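To make the layering concrete, here is a minimal base-R sketch of what I have in mind. Everything in it is hypothetical: the `ridge_model` / `ridge_family` classes, the `new_ridge_*()` constructors, the `fit()` generic and the `holdout` argument are illustration only and not part of any existing package or of the current proposal. Ridge regression is used just because it has a closed form.

```r
# Hypothetical constructors: a model holds one hyperparameter value,
# a family holds a set of candidate values.
new_ridge_model <- function(lambda) {
  structure(list(lambda = lambda, coef = NULL), class = "ridge_model")
}

new_ridge_family <- function(lambdas) {
  structure(list(lambdas = lambdas, best = NULL), class = "ridge_family")
}

fit <- function(object, ...) UseMethod("fit")

# Fit a single model at fixed hyperparameters (closed-form ridge, no intercept).
fit.ridge_model <- function(object, x, y, ...) {
  p <- ncol(x)
  object$coef <- drop(solve(crossprod(x) + object$lambda * diag(p), crossprod(x, y)))
  object
}

predict.ridge_model <- function(object, newdata, ...) {
  drop(newdata %*% object$coef)
}

# Fit a model family: hyperparameter selection written on top of fit.ridge_model.
# Each candidate is scored on a holdout set, then the winner is refit on all data.
fit.ridge_family <- function(object, x, y, holdout, ...) {
  train <- setdiff(seq_len(nrow(x)), holdout)
  errs <- vapply(object$lambdas, function(l) {
    m <- fit(new_ridge_model(l), x[train, , drop = FALSE], y[train])
    mean((y[holdout] - predict(m, x[holdout, , drop = FALSE]))^2)
  }, numeric(1))
  object$errors <- errs
  object$best <- fit(new_ridge_model(object$lambdas[which.min(errs)]), x, y)
  object
}

# Usage on simulated data
set.seed(1)
x <- matrix(rnorm(200), ncol = 2)
y <- drop(x %*% c(1, -2)) + rnorm(100, sd = 0.1)
fam <- fit(new_ridge_family(c(0.01, 1, 100)), x, y, holdout = 81:100)
class(fam$best)
```

The point is that a method author only has to get `fit.ridge_model` right; the family-level fit is then mostly bookkeeping around it.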

I want this separation because I think it'll be key to selling an interface to people writing new methods, but there are then two problems:

How to initialize a model as opposed to a model family?

One approach is to offer both new_knn_model() and new_knn_family(), but this feels sloppy. Maybe it is a good idea though. Another option would be to offer new_knn() that creates a "knn" dummy object that gets transformed into a knn_model or a knn_family based on the particular call to fit.

How to differentiate a fit call that produces a model from one that produces a model family object?

Maybe let type safety be a guiding ideal here? Or is a type-unsafe function okay, where specifying individual hyperparameter values results in a model but passing in a hyperparameter space object results in a model family? In that case, function signatures might seem weird.
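For reference, a rough sketch of what the type-unsafe version could look like, assuming a placeholder `"knn"` object and a made-up `hyperparameter_space` class. `new_hyperparameter_space()`, `knn_predict()` and the leave-one-out selection are all assumptions for the sake of the example (1-D kNN regression, base R only), not anything that exists in the proposal.

```r
# Placeholder object that carries no hyperparameters yet.
new_knn <- function() structure(list(), class = "knn")

# Hypothetical hyperparameter space: a tagged list of candidate values.
new_hyperparameter_space <- function(...) {
  structure(list(...), class = "hyperparameter_space")
}

fit <- function(object, ...) UseMethod("fit")

# kNN regression prediction for a single 1-D query point.
knn_predict <- function(x_train, y_train, x_new, k) {
  d <- abs(x_train - x_new)
  mean(y_train[order(d)[seq_len(k)]])
}

# Type-unsafe fit: the class of the result depends on what `k` is.
fit.knn <- function(object, x, y, k, ...) {
  if (inherits(k, "hyperparameter_space")) {
    # Leave-one-out error for each candidate k, then keep the best one.
    loo <- vapply(k$k, function(kk) {
      preds <- vapply(seq_along(x), function(i) {
        knn_predict(x[-i], y[-i], x[i], kk)
      }, numeric(1))
      mean((y - preds)^2)
    }, numeric(1))
    structure(list(x = x, y = y, k = k$k[which.min(loo)], loo_error = loo),
              class = c("knn_family", "knn"))
  } else {
    stopifnot(is.numeric(k), length(k) == 1)
    structure(list(x = x, y = y, k = k), class = c("knn_model", "knn"))
  }
}

# Usage: same function, two different return types.
set.seed(42)
x <- runif(50)
y <- sin(2 * pi * x) + rnorm(50, sd = 0.2)
m <- fit(new_knn(), x, y, k = 5)
f <- fit(new_knn(), x, y, k = new_hyperparameter_space(k = c(1, 3, 5, 9)))
class(m)
class(f)
```

This works, but the return type of `fit()` can't be read off the call without knowing what `k` is, which is exactly the weirdness I'm worried about.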

In general, I think it's a good idea to discourage use of model families with singleton hyperparameter spaces.

After writing this all out, I'm starting to think that a separate new_knn_model() and new_knn_family() might be a good idea, with new_knn() acting as a wrapper around new_knn_family().
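Roughly, I'm picturing something like the sketch below. The argument names, the default grid of `1:10`, and the warning on singleton grids are my own assumptions here, not settled design.

```r
# A model: exactly one hyperparameter value.
new_knn_model <- function(k) {
  stopifnot(is.numeric(k), length(k) == 1, k >= 1)
  structure(list(k = as.integer(k)), class = "knn_model")
}

# A model family: a grid of candidate values (default grid is an assumption).
new_knn_family <- function(k = 1:10) {
  stopifnot(is.numeric(k), length(k) >= 1, all(k >= 1))
  if (length(k) == 1) {
    # Discourage families over singleton hyperparameter spaces.
    warning("Singleton hyperparameter space; consider new_knn_model() instead.")
  }
  structure(list(k_grid = as.integer(k)), class = "knn_family")
}

# Convenience wrapper: by default you get a family.
new_knn <- function(...) new_knn_family(...)

class(new_knn_model(3))
class(new_knn())
```

With this split, `fit()` can dispatch on `knn_model` versus `knn_family`, so the return type is determined by the constructor rather than by the arguments to `fit()`.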

@jarvmiller thoughts?

alexpghayes / modelling-in-r
