ericphanson opened this issue 1 year ago
Thanks @ericphanson for flagging this. There was a request for this a while ago by @CameronBieganek, but I can't find it just now.
Sometimes this might introduce scaling issues for large datasets, in particular ones with multiple targets (think of time-series, for example), which become worse if we are doing nested resampling, as when evaluating a `TunedModel`. So including predictions in the output of `evaluate` should probably be an option. Or, like scikit-learn, we could have a separate function?
Another minor issue is which "prediction" to return, or whether to return more than one kind. For a probabilistic predictor, some metrics will require `predict_mode` (or `predict_mean`/`predict_median`) and some just `predict`. Exposing the output of `predict` makes the most sense, but I think it's possible for the user to limit operations to, say, just `predict_mode`, so that `predict` is never actually called. Probably the simplest design is to force the `predict` call anyway (if our return-predictions option is on) and always return that?
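For concreteness, the distinction is this (a minimal sketch, assuming `mach` is a fitted machine wrapping a probabilistic classifier and `test` is a vector of holdout row indices):

```julia
ŷ_dist  = predict(mach, rows=test)       # distributions, e.g. UnivariateFinite
ŷ_point = predict_mode(mach, rows=test)  # point predictions (the modes)
```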
The function where all this is happening, and which will need to add the desired predictions to its return value, is here.
I am not very familiar with the `predict_*` functions; is it ever more than just post-processing `predict`? Anyway, I do see `operations` is passed into `evaluate!`, so maybe that can determine what kind of predictions you get back?
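For example, measures can already be paired with the operations they need when calling `evaluate!` (a sketch; the keyword name `operation` and the measure names are assumed here), so a predictions-returning option could perhaps follow the same convention:

```julia
# Sketch: one operation per measure, assuming `mach` wraps a probabilistic classifier.
evaluate!(mach,
          resampling=CV(nfolds=5),
          measure=[log_loss, accuracy],
          operation=[predict, predict_mode])
```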
It sounds like the most straightforward approach is to add a `return_predictions` keyword arg that, if true, adds an extra table with something like row index and prediction to the output object.
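Something like this, where both the `return_predictions` keyword and the `predictions` field are hypothetical (neither exists yet):

```julia
# Hypothetical API sketch -- `return_predictions` and `e.predictions` do not exist.
e = evaluate!(mach, resampling=CV(nfolds=5), measure=rms,
              return_predictions=true)

e.predictions   # e.g. a table with columns like (row, fold, prediction)
```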
However, that kind of design always feels like perhaps we aren't "inverting control to the caller", and that a more compositional flow might be better overall. E.g. I could imagine `evaluate` being implemented as the simple composition of training over folds, predicting over folds, and evaluating those with metrics, and exposing each layer with an API function.
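Roughly, that composition might look like this (a sketch only, not a concrete proposal; `MLJBase.train_test_pairs` is assumed to be the existing fold-generating helper, and `model`, `X`, `y`, `rms` are placeholders):

```julia
using MLJ
import MLJBase

folds = MLJBase.train_test_pairs(CV(nfolds=5), 1:MLJBase.nrows(X))

# fit one machine per fold, predict on the corresponding holdout rows, then measure
machs  = [fit!(machine(model, X, y), rows=train, verbosity=0) for (train, _) in folds]
preds  = [predict(m, rows=test) for (m, (_, test)) in zip(machs, folds)]
scores = [rms(ŷ, y[test]) for (ŷ, (_, test)) in zip(preds, folds)]
```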
> However, that kind of design always feels like perhaps we aren't "inverting control to the caller", and that a more compositional flow might be better overall. E.g. I could imagine `evaluate` being implemented as the simple composition of training over folds, predicting over folds, and evaluating those with metrics, and exposing each layer with an API function.
Yes, a compositional approach sounds better. I probably don't have the bandwidth for that kind of a refactor but if someone else was interested...
I'm curious, what is your use case for collecting the out-of-sample predictions? Are you doing some kind of model stacking perhaps? We do have `Stack` for that.
No, I just want to do my own evaluation on the predictions. In this case, I have multichannel data, and my model is trained to work on each channel independently. But in addition to the evaluation on that task, I want to also combine predictions over channels and then evaluate the aggregated results. I could probably do this by formulating a new composite model (I think?) but if I could just get the predictions directly, I can do whatever evaluation I want.
I have also come across this need at other times, e.g. I want to plot prediction vs label for my whole dataset (which can be important if you don't have a lot of data). CV lets you get useful predictions for all data points, even if there are really `n_folds` different models supplying them.
Another case is if you want to evaluate on different stratifications of the data. E.g. what if I wanted to know how my performance varies by channel (for models trained on all channels; I don't want to move all of one channel to the test set, e.g.)? If I have all the predictions, it's easy to do any kind of evaluation needed.
Just wanted to add that I would also find it very helpful to be able to access the out-of-fold predictions from `evaluate`, for the same reasons listed by Eric.
Just a note that this is more doable now that we have separate `PerformanceEvaluation` and `CompactPerformanceEvaluation` types. Target predictions could be recorded in the first case but dropped in the second. A kwarg `compact` controls which is returned by `evaluate!`.
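In usage terms, something like this, where the `compact` keyword is as described above but the `predictions` field is hypothetical:

```julia
e = evaluate!(mach, resampling=CV(nfolds=5), measure=rms, compact=false)
typeof(e)        # PerformanceEvaluation, the non-compact variant
# e.predictions  # hypothetical field where per-fold target predictions could live
```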
Like https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_predict.html (pointed out by @josephsdavid!)
Currently, I am doing it manually, which works fine:
It would be nice if `evaluate` could give the predictions as well, since it needs to generate them anyway.
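For reference, the manual loop alluded to above might look roughly like this (a sketch, not the original snippet; `model`, `X`, and `y` are placeholders for your model and data):

```julia
using MLJ
import MLJBase

folds = MLJBase.train_test_pairs(CV(nfolds=5, shuffle=true, rng=123), 1:MLJBase.nrows(X))

mach = machine(model, X, y)
ŷ = Vector{Any}(undef, MLJBase.nrows(X))   # one out-of-sample prediction per row
for (train, test) in folds
    fit!(mach, rows=train, verbosity=0)
    ŷ[test] = predict(mach, rows=test)
end

# ŷ now holds out-of-fold predictions for every observation, ready for any custom
# evaluation: per-channel metrics, prediction-vs-label plots, aggregation, etc.
```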