ChEB-AI / python-chebai


Add Early Stopping feature #17

Closed VenkateshDas closed 5 months ago

VenkateshDas commented 5 months ago

Implement Early Stopping with PyTorch Lightning

Description

This pull request introduces PyTorch Lightning's Early Stopping feature, which allows a training run to stop before the configured number of epochs is reached. This helps prevent overfitting and can improve model performance.

Changes:

- Adds a new callback configuration, configs/training/early_stop_callbacks.yml, which sets up PyTorch Lightning's EarlyStopping callback.
- Comments out min_epochs in configs/training/default_trainer.yml so that early stopping can take effect (see the note below).

How to Use

Callback Replacement:

Modify your trainer configuration file (configs/training/default_trainer.yml) to replace the existing callbacks value with configs/training/early_stop_callbacks.yml.
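For orientation, a callback config in LightningCLI's class_path/init_args format looks roughly like the sketch below. The exact contents of early_stop_callbacks.yml are the ones in this PR; the import path shown assumes Lightning 2.x (older versions use pytorch_lightning.callbacks.EarlyStopping), and the values mirror the CLI example further down.

```yaml
# Sketch of configs/training/early_stop_callbacks.yml -- a list of callbacks
# in LightningCLI's class_path/init_args format (see the PR for the real file).
- class_path: lightning.pytorch.callbacks.EarlyStopping
  init_args:
    monitor: val_loss_epoch  # logged metric to watch
    mode: min                # stop when the metric stops decreasing
    patience: 5              # validation epochs without improvement before stopping
```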

Command-Line Arguments:

Alternatively, override the Early Stopping arguments directly from the CLI:

```sh
python3 -m chebai fit \
  --trainer=configs/training/default_trainer.yml \
  --trainer.callbacks=configs/training/early_stop_callbacks.yml \
  --trainer.callbacks.init_args.monitor=val_loss_epoch \
  --trainer.callbacks.init_args.patience=5
```
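Here the --trainer.callbacks.init_args.* flags override the corresponding values from early_stop_callbacks.yml: LightningCLI is built on jsonargparse, which merges config files with command-line arguments and lets the command line take precedence.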

*Note: Please make sure that `min_epochs` is not set in the `default_trainer` config when using the early stopping feature. Lightning keeps training until `min_epochs` is reached even after the EarlyStopping callback has signalled a stop, so a `min_epochs` value would delay or prevent early stopping.*

Reference for Early Stopping in PyTorch Lightning: https://lightning.ai/docs/pytorch/stable/common/early_stopping.html
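For comparison, the equivalent setup directly in Python looks like the minimal sketch below, following the Lightning docs linked above (Lightning 2.x import paths; `model` and `datamodule` are placeholders for the actual chebai objects, not names from this repo):

```python
from lightning.pytorch import Trainer
from lightning.pytorch.callbacks import EarlyStopping

# Stop once "val_loss_epoch" has not improved for 5 consecutive validation
# epochs (mirrors the CLI arguments above).
early_stop = EarlyStopping(monitor="val_loss_epoch", mode="min", patience=5)

# Leave min_epochs unset here: Lightning keeps training until min_epochs is
# reached even after EarlyStopping has signalled a stop.
trainer = Trainer(max_epochs=100, callbacks=[early_stop])

# `model` and `datamodule` stand in for the model and data module of the run.
trainer.fit(model, datamodule=datamodule)
```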

sfluegel05 commented 5 months ago

Thanks for the detailed description. Two comments:

1. The file name referenced in the description does not match the one used in the implementation. Could you make them consistent?
2. Have you tested this, i.e. verified that a training run actually stops once the monitored metric stops improving?

VenkateshDas commented 5 months ago

@sfluegel05

1. Apologies for the inconsistency in the file name. I have renamed early_stop_callbacks.yml so that it matches the implementation and the comment.
2. Yes, I tried this implementation in a training run and verified that training stopped when there was no improvement in the "val_loss_epoch" value. I forgot to mention that min_epochs in default_trainer has to be commented out for the early stopping feature to work; I have added that to the description and also noted it in the default trainer config.