Closed: dennymarcels closed this issue 1 year ago
Thanks for the suggestion, Denny. I've considered adding this before. It will involve passing an output directory path to TrainingArguments/Seq2SeqTrainingArguments instead of using a temp directory.
Now supported with version 3.0.0.
Pass a float between 0 and 1 to your TrainArgs's `save_steps` parameter to specify the fraction of total training steps that occur between saves. For example, if set to 0.1, saving will occur ten times over the entire training process. By default `save_steps` is set to 0 and no saving is performed. Checkpoints are saved to the directory given by TrainArgs's `output_dir` parameter, which defaults to "happy_transformer/".
```python
from happytransformer import GENTrainArgs

train_args = GENTrainArgs(save_steps=0.1)
```
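To make the ratio concrete, here is a minimal sketch (a hypothetical helper, not Happy Transformer's internals) of how a float `save_steps` value could map to a checkpoint interval in steps:

```python
# Hypothetical sketch: how a float save_steps ratio could translate
# into a concrete "save every N steps" interval. This is NOT the
# library's actual implementation, just an illustration of the idea.
def save_interval(total_steps: int, save_steps_ratio: float) -> int:
    """Return the number of training steps between checkpoints."""
    if not 0 < save_steps_ratio <= 1:
        raise ValueError("save_steps ratio must be in (0, 1]")
    return max(1, int(total_steps * save_steps_ratio))

# With 1000 total steps and a ratio of 0.1, a checkpoint would be
# written every 100 steps, i.e. ten times over the whole run.
print(save_interval(1000, 0.1))  # -> 100
```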
I was wondering if I could easily save persistent periodic checkpoints while training, and also resume training from those checkpoints in case my environment crashes.