nfultz opened 2 years ago
This is an interesting idea. Given the way the loss has been constructed, it will take some time to find a good early-stopping threshold that works for most use cases. I think 350 iterations will be sufficient on most datasets, so that could serve as a temporary workaround for your scenario.
A trace of the loss on my current data set is below. I believe the fit would have been essentially identical if training had ended a hundred iterations earlier.
So it could be very practical to also allow the stopping condition to be specified as a minimum improvement in the loss (instead of only a fixed number of iterations), especially for use cases where the training function is called repeatedly for hyperparameter tuning.
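To make the suggestion concrete, here is a minimal sketch of such a tolerance-based stopping rule. This is not this project's API; the function, `tol`, and `patience` names are all assumptions made for illustration, and `step` stands in for whatever callable performs one training iteration and returns the current loss.

```python
def train(step, max_iters=350, tol=1e-4, patience=10):
    """Run `step()` (one iteration, returning the current loss) until
    the loss stops improving or `max_iters` is reached.

    Stops early when the loss has not improved by at least `tol`
    for `patience` consecutive iterations. `step`, `tol`, and
    `patience` are hypothetical names used only for this sketch.
    """
    best = float("inf")
    stall = 0  # consecutive iterations without sufficient improvement
    for i in range(max_iters):
        loss = step()
        if best - loss >= tol:
            best = loss
            stall = 0
        else:
            stall += 1
            if stall >= patience:
                break  # converged to within `tol`
    return best, i + 1  # best loss and iterations actually run
```

With `patience > 1`, a single flat iteration on a noisy loss trace does not trigger the stop, which matters when the loss plateaus briefly before improving again.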