FluxML / FluxTraining.jl

A flexible neural net training library inspired by fast.ai
https://fluxml.ai/FluxTraining.jl
MIT License
117 stars 25 forks source link

fix Hard error on EarlyStopping() #160

Closed cirobr closed 2 months ago

cirobr commented 7 months ago

159

When executing fit!, this PR addresses a hard error when an EarlyStopping() criteria is met, which causes code termination followed by error crash. As such, fit! is rewritten to address the issue.

darsnack commented 7 months ago

It seems from the failing tests that this functionality was intended. I guess the desired usage was that you should catch the CancelFittingException in your user code instead within the package.

I don't have a strong opinion how either option. We either adjust the test or close the original issue. @lorenzoh @ToucheSir any thoughts?

ToucheSir commented 7 months ago

I also saw that test and was confused about the behaviour. My problem with asking users to catch this is that CancelFittingException can be thrown for other, unrelated reasons. If we want to keep this as an error, there ought to be a way to identify a CancelFittingException as early stopping without checking the error message. And if we decide to not have an error propagate, then there should be a way for users to detect whether early stopping, NaN loss, etc has been encountered.

cirobr commented 7 months ago

I also saw that test and was confused about the behaviour. My problem with asking users to catch this is that CancelFittingException can be thrown for other, unrelated reasons. If we want to keep this as an error, there ought to be a way to identify a CancelFittingException as early stopping without checking the error message. And if we decide to not have an error propagate, then there should be a way for users to detect whether early stopping, NaN loss, etc has been encountered.

I tend to agree that catching the error should be a function for the package, not for the users. Regardless of final solution, it must not allow the code to terminate abruptly as it happened.

darsnack commented 7 months ago

It sounds like we need to distinguish early stopping (benign) from NaN loss (bad). I would say we create a EarlyStoppingException type. We throw this from the early stopping callback and catch this in fit! function.

Things like stopping on a NaN loss can still use the CancelFittingException so the test should not need adjustment.

I believe now the minor version should be bumped, since this is a breaking change.