The issue that keeps coming up, and that seems to throw a wrench into everything, is the difference between models and model families.
In some sense, you can work with a model by fitting a model family on a hyperparameter grid containing a single point. This is the approach caret takes, and I believe the one present in the current interface proposal.
I think this is minorly problematic in terms of conceptual clarity, but majorly problematic when implementing new models. If you're implementing a new modelling technique, it makes a lot more sense to first write a fit method for models (e.g. glmnet::glmnet) and then to write a fit method for model families (i.e. a hyperparameter selection method) that may make heavy use of fit.model.
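To make the layering concrete, here is a minimal sketch in base R, using ridge regression as a stand-in. Everything in it is an assumption for illustration: the names fit_ridge_model and fit_ridge_family, the class names, and the simple holdout-based selection are not part of the proposal or of any existing package.

```r
# Hypothetical sketch: names and structure are illustrative only.

# Model-level fit: one ridge regression at a single fixed penalty `lambda`.
fit_ridge_model <- function(x, y, lambda) {
  p <- ncol(x)
  coef <- solve(crossprod(x) + lambda * diag(p), crossprod(x, y))
  structure(list(coef = coef, lambda = lambda), class = "ridge_model")
}

predict.ridge_model <- function(object, newdata, ...) {
  drop(newdata %*% object$coef)
}

# Family-level fit: hyperparameter selection that reuses the model-level fit.
fit_ridge_family <- function(x, y, lambdas, holdout) {
  train <- setdiff(seq_len(nrow(x)), holdout)
  errors <- vapply(lambdas, function(l) {
    m <- fit_ridge_model(x[train, , drop = FALSE], y[train], l)
    mean((y[holdout] - predict(m, x[holdout, , drop = FALSE]))^2)
  }, numeric(1))
  best <- lambdas[which.min(errors)]
  structure(
    list(model = fit_ridge_model(x, y, best), lambdas = lambdas, errors = errors),
    class = "ridge_family"
  )
}

set.seed(1)
x <- matrix(rnorm(200), ncol = 2)
y <- drop(x %*% c(1, -2)) + rnorm(100, sd = 0.1)
fam <- fit_ridge_family(x, y, lambdas = c(0.01, 1, 100), holdout = 1:20)
```

The point of the sketch is only that someone writing a new method implements the model-level fit first, and the family-level fit becomes a thin loop over it.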
I want this separation because I think it'll be key to selling an interface to people writing new methods, but there are then two problems:
How to initialize a model as opposed to a model family?
One approach is to offer both new_knn_model() and new_knn_family(), but this feels sloppy. Maybe it is a good idea though. Another option would be to offer new_knn() that creates a "knn" dummy object that gets transformed into a knn_model or a knn_family based on the particular call to fit.
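The second option could look roughly like the sketch below. The converter names as_knn_model and as_knn_family are hypothetical stand-ins for whatever the fit call would do internally, and the specs here hold only hyperparameters, with no fitting logic.

```r
# Hypothetical sketch; none of these functions exist in a published package.

# An unresolved "knn" specification: neither a model nor a family yet.
new_knn <- function() {
  structure(list(), class = "knn")
}

# Resolve the dummy object into a model spec with one fixed k.
as_knn_model <- function(spec, k) {
  stopifnot(inherits(spec, "knn"), is.numeric(k), length(k) == 1)
  structure(list(k = k), class = c("knn_model", "knn"))
}

# Resolve the dummy object into a family spec over several candidate k values.
as_knn_family <- function(spec, k_grid) {
  stopifnot(inherits(spec, "knn"), is.numeric(k_grid), length(k_grid) > 1)
  structure(list(k_grid = k_grid), class = c("knn_family", "knn"))
}

spec <- new_knn()
model_spec <- as_knn_model(spec, k = 5)
family_spec <- as_knn_family(spec, k_grid = c(1, 5, 10))
```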
How to differentiate a fit call that produces a model from one that produces a model family object?
Maybe let type safety be a guiding ideal here? Or is a type-unsafe function okay, where specifying individual hyperparameter values results in a model but passing in a hyperparameter space object results in a model family? In this case function signatures might seem weird.
In general, I think it's a good idea to discourage use of model families with singleton hyperparameter spaces.
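For reference, the type-unsafe variant might look something like this. It is a sketch under assumptions: new_param_space and fit_knn are hypothetical names, the actual fitting is omitted, and the warning on singleton spaces is just one possible way to discourage them.

```r
# Hypothetical hyperparameter space object: a named list of candidate values.
new_param_space <- function(...) {
  structure(list(...), class = "param_space")
}

# Type-unsafe fit: the class of the result depends on the type of `k`.
# Real fitting is omitted; `x` and `y` are unused and only the branching is shown.
fit_knn <- function(x, y, k) {
  if (inherits(k, "param_space")) {
    n_points <- prod(lengths(unclass(k)))
    if (n_points == 1) {
      warning("singleton hyperparameter space; consider fitting a model instead")
    }
    structure(list(space = k), class = "knn_family")
  } else {
    stopifnot(is.numeric(k), length(k) == 1)
    structure(list(k = k), class = "knn_model")
  }
}

x <- matrix(rnorm(20), ncol = 2)
y <- rnorm(10)
class(fit_knn(x, y, k = 5))                                 # "knn_model"
class(fit_knn(x, y, k = new_param_space(k = c(1, 5, 10))))  # "knn_family"
```

The weirdness in the signature is visible here: the same argument k accepts two unrelated types, and the caller cannot know the return class without inspecting what was passed.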
After writing this all out, I'm starting to think that a separate new_knn_model() and new_knn_family() might be a good idea, with new_knn() acting as a wrapper around new_knn_family().
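A rough sketch of what that could look like, assuming the constructors only record hyperparameters. The argument names and the default grid are made up for illustration.

```r
# Hypothetical constructors; argument names and defaults are assumptions.

# A single model: exactly one value of k.
new_knn_model <- function(k) {
  stopifnot(is.numeric(k), length(k) == 1)
  structure(list(k = k), class = "knn_model")
}

# A model family: a grid of candidate k values (default grid is arbitrary).
new_knn_family <- function(k_grid = c(1, 3, 5, 7)) {
  stopifnot(is.numeric(k_grid), length(k_grid) >= 1)
  structure(list(k_grid = k_grid), class = "knn_family")
}

# Convenience wrapper: the short name defaults to the family.
new_knn <- function(...) {
  new_knn_family(...)
}

m <- new_knn_model(k = 5)
f <- new_knn()
```

With this layout the return type of each constructor is fixed, so fit can dispatch on the class of its first argument.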
@jarvmiller thoughts?