baal-org / baal

Bayesian active learning library for research and industrial use cases.
https://baal.readthedocs.io
Apache License 2.0

Support for early stopping in `ModelWrapper.train_on_dataset()` #261

Closed: arthur-thuy closed 1 year ago

arthur-thuy commented 1 year ago

Is your feature request related to a problem? Please describe. In the ModelWrapper.train_on_dataset() function, the number of epochs to train for must be specified. When tuning training for convergence, it is hard to settle on a single value because the amount of labelled training data grows throughout the active learning process.

Describe the solution you'd like Support early stopping in the ModelWrapper.train_on_dataset() function, interrupting training once the validation loss stops decreasing.

Describe alternatives you've considered Similar to the Model.fit() function in Keras, the ModelWrapper.train_on_dataset() function could take callbacks and validation_data arguments (see the hypothetical sketch at the end of this comment).

Additional context Note that early stopping at this level is different from early stopping at the level of the active learning process, which stops labelling new instances when the current labelled set contains all the information necessary.
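For illustration only, here is a purely hypothetical sketch of what the requested interface could look like. The validation_data and callbacks arguments and the EarlyStopping callback are not part of Baal's current API; they only mirror the Keras-style behaviour described above.

```python
# Hypothetical interface only: `validation_data`, `callbacks`, and
# `EarlyStopping` are NOT part of Baal's current API. They illustrate the
# Keras-style early stopping requested above.
wrapper.train_on_dataset(
    active_set,                                  # labelled pool, grows each AL step
    optimizer,
    batch_size=32,
    epoch=100,                                   # upper bound only
    use_cuda=True,
    validation_data=val_set,                     # hypothetical argument
    callbacks=[EarlyStopping(monitor="val_loss", patience=5)],  # hypothetical
)
```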

arthur-thuy commented 1 year ago

My apologies, I now see that the ModelWrapper.train_and_test_on_datasets() function already supports early stopping through the patience and min_epoch_for_es arguments. I missed it at first because neither this function nor the early-stopping functionality is used in any of the examples.
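For reference, a minimal sketch of how that could be called. This assumes the pre-2.0 style ModelWrapper (built from a model and a criterion), and train_set / val_set are placeholder torch Datasets. Only patience and min_epoch_for_es are taken from the comment above; the other argument names follow my reading of that API and may differ between Baal versions.

```python
import torch
from torch import nn, optim
from baal.modelwrapper import ModelWrapper

# Assumed pre-2.0 style wrapper: model + criterion.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
wrapper = ModelWrapper(model, nn.CrossEntropyLoss())
optimizer = optim.SGD(model.parameters(), lr=0.01)

# `train_set` / `val_set` are placeholder torch Datasets (not defined here).
wrapper.train_and_test_on_datasets(
    train_set,                      # current labelled pool
    val_set,                        # held-out set monitored for early stopping
    optimizer,
    batch_size=32,
    epoch=100,                      # upper bound; training may stop earlier
    use_cuda=torch.cuda.is_available(),
    patience=5,                     # stop after 5 epochs without improvement
    min_epoch_for_es=10,            # do not trigger early stopping before epoch 10
)
```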

I do have a follow-up request about using a validation set for early stopping; I will open a new issue.