DayanSiddiquiNXD opened this issue 3 years ago (status: Open)
Couldn't this be achieved by setting patience=1 and learning_rate=minimum_learning_rate in the Trainer?
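The mechanism this suggestion relies on can be sketched in plain Python. This is not GluonTS source code, just an illustration of the idea: a reduce-on-plateau scheduler whose training loop stops once the learning rate falls below a floor. With patience=1 and the starting learning rate equal to the minimum, the first non-improving epoch pushes the rate below the floor and ends training. All names here are illustrative.

```python
# Minimal sketch (not GluonTS source) of reduce-on-plateau plus a
# learning-rate floor acting as an implicit early-stopping mechanism.

def train(val_losses, learning_rate, minimum_learning_rate,
          patience=1, decay_factor=0.5):
    """Return the number of epochs actually run over the given loss curve."""
    best = float("inf")
    epochs_since_improvement = 0
    lr = learning_rate
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_since_improvement = 0
        else:
            epochs_since_improvement += 1
        if epochs_since_improvement >= patience:
            lr *= decay_factor          # reduce on plateau
            epochs_since_improvement = 0
        if lr < minimum_learning_rate:  # implicit early stopping
            return epoch
    return len(val_losses)

# lr starts at the floor, patience=1: training stops right after the
# first epoch whose validation loss does not improve.
print(train([1.0, 0.8, 0.9, 0.7], learning_rate=1e-3,
            minimum_learning_rate=1e-3))  # -> 3
```

So the suggestion does produce a form of early stopping, provided the scheduler's patience window is driven by the validation loss, which is exactly the question raised below.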
@kaijennissen It depends on which loss the patience window is applied to. I'm not too familiar with the source code, so I can't tell. But if, when the time since the loss last improved is tested against the patience window in learning_rate_scheduler.py, the validation loss is tested as well, and training stops once validation loss has not improved, then yes, that would implement validation-based early stopping. Can anyone confirm that it does?
See https://github.com/awslabs/gluon-ts/blob/7dabd947c961954c5c11e37cdc373950b930761c/src/gluonts/mx/trainer/_base.py#L380 in combination with [...]. But I would be happy if someone else could confirm.
@kaijennissen Another problem with the LR approach is latching onto a local minimum. The validation-loss-vs-epoch curve is generally not as smooth as the theory suggests, so it will have local minima, and the LR approach will latch onto the first one. True validation-based early stopping would have a callback mechanism that lets the model train well into overfit territory and then revert to the weights that produced the global optimum. This issue (https://github.com/awslabs/gluon-ts/issues/706) shows that such a callback isn't implemented, so I'm guessing that isn't possible right now.
Increasing the patience window will not really work either: while it decreases the chance of latching onto a local optimum, it will a) still possibly miss the global optimum if a later local optimum makes the model think it has "improved", and b) without a callback mechanism, only be able to recover the weights at epoch (optimal epoch + patience window), which are already overfit; and as we increase the patience window to reduce the probability of capturing a local optimum, we overshoot the optimal set of weights by a larger margin.
While the LR approach will work in a pinch (I'm going to use it for my project, so thanks for highlighting it for me), I think this issue should remain open so that real callback-based validation-loss early stopping can be implemented.
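The "train past the optimum, then revert to the best weights" behavior described above can be sketched as a small checkpoint-and-restore helper. This is a hypothetical illustration, not a GluonTS API; the class name and the weights-as-dict representation are made up for the example.

```python
# Hypothetical sketch of "true" validation-based early stopping:
# keep training into overfit territory, checkpoint the best weights seen
# so far, and keep them available for restoring at the end.
import copy

class EarlyStopper:
    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_weights = None
        self.bad_epochs = 0

    def step(self, val_loss, weights):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_weights = copy.deepcopy(weights)  # checkpoint global best
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Usage: a noisy loss curve with a local minimum at epoch 1 and the global
# minimum at epoch 3; the stopper rides past the local one and keeps the best.
stopper = EarlyStopper(patience=3)
for epoch, loss in enumerate([1.0, 0.7, 0.8, 0.5, 0.6, 0.65, 0.9]):
    if stopper.step(loss, weights={"epoch": epoch}):
        break
print(stopper.best_loss, stopper.best_weights)  # -> 0.5 {'epoch': 3}
```

Note how, unlike the LR approach, the checkpointing means the patience window can be made generous without overshooting: whatever epoch training actually stops at, the recovered weights are the global best seen so far.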
@PascalIversen That is definitely very helpful, thanks a lot. I'll see if I can use it to add true validation-based early stopping to my model; if I can, then I guess this issue can be closed.
@DayanSiddiquiNXD Did you get true validation-based early stopping with callbacks running yet? Would be great to hear if this is working.
@DayanSiddiquiNXD or any maintainers, any chance early stopping got implemented? It is a bit odd that there isn't already a simple flag for this, no?
@davidtiefenthaler Yes, I did. I followed @PascalIversen's advice and pulled from his branch. The only thing I needed to change in the source code was the import of utils: there were two places to import from, and the branch was importing from the wrong one. But this was a while ago, so it may have been fixed by now.
@bradyneal An early stopping mechanism is already there in the Trainer class, see here. This is still a bit too implicit, but essentially the learning rate reduction mechanism stops the training loop as soon as the learning rate goes below minimum_learning_rate. So one can play with the trainer options there to tune how aggressive early stopping should be.
There's PR #1168 that proposes a more explicit set of callbacks, with which one can customize the stopping condition more easily. We hope to get that merged soon.
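The general shape of a callback-based stopping condition can be sketched as follows. To be clear, this is not the API proposed in PR #1168 (whose design may differ); it only illustrates the idea that the trainer invokes a hook after each epoch and stops when a callback asks it to. All names are invented for the example.

```python
# Illustrative callback-style stopping condition (not the PR #1168 API):
# the training loop calls on_epoch_end after every epoch and stops as soon
# as any callback returns False.

class ValidationLossEarlyStopping:
    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("inf")
        self.bad = 0

    def on_epoch_end(self, epoch, val_loss):
        if val_loss < self.best:
            self.best, self.bad = val_loss, 0
        else:
            self.bad += 1
        return self.bad < self.patience  # False means "stop training"

def fit(val_losses, callbacks):
    """Toy training loop; returns the number of epochs actually run."""
    for epoch, loss in enumerate(val_losses):
        if not all(cb.on_epoch_end(epoch, loss) for cb in callbacks):
            return epoch + 1
    return len(val_losses)

# Validation loss improves until epoch 1, then stalls for two epochs:
print(fit([1.0, 0.9, 0.95, 0.93, 0.94, 0.96],
          [ValidationLossEarlyStopping(patience=2)]))  # -> 4
```

With a design like this, the stopping condition is explicit and user-customizable, rather than a side effect of the learning-rate schedule.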
https://en.wikipedia.org/wiki/Early_stopping#Validation-based_early_stopping
Early stopping on the basis of validation loss. I have been looking for this in GluonTS but have not been able to find it. I have found learning-rate (patience) based early stopping (https://github.com/awslabs/gluon-ts/issues/555 and https://github.com/awslabs/gluon-ts/pull/701), which is also useful, but I think validation-loss-based early stopping would be a valuable addition.