Closed: dennymarcels closed this issue 1 year ago
Thanks for the suggestion, Denny. I've considered adding this before. It will involve passing an output directory path to TrainingArguments/Seq2SeqTrainingArguments instead of using a temp directory.
Now supported with version 3.0.0.
Pass a float between 0 and 1 to your TrainArgs's `save_steps` parameter to specify the fraction of total training steps that occur between saves. For example, if set to 0.1, saving will occur ten times over the entire training process. By default `save_steps` is set to 0 and no saving is performed. Checkpoints are saved to the directory given by TrainArgs's `output_dir` parameter, which defaults to "happy_transformer/".
```python
from happytransformer import GENTrainArgs

train_args = GENTrainArgs(save_steps=0.1)
```
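To make the ratio concrete, here is a minimal sketch (a hypothetical helper, not Happy Transformer's internals) of how a float `save_steps` value could map to a checkpoint interval in steps:

```python
# Hypothetical sketch: how a float save_steps ratio could translate
# into a concrete "save every N steps" interval. This is NOT the
# library's actual implementation, just an illustration of the idea.
def save_interval(total_steps: int, save_steps_ratio: float) -> int:
    """Return the number of training steps between checkpoints."""
    if not 0 < save_steps_ratio <= 1:
        raise ValueError("save_steps ratio must be in (0, 1]")
    return max(1, int(total_steps * save_steps_ratio))

# With 1000 total steps and a ratio of 0.1, a checkpoint would be
# written every 100 steps, i.e. ten times over the whole run.
print(save_interval(1000, 0.1))  # -> 100
```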
I was wondering if I could easily save persistent periodic checkpoints while training, and also resume training from those checkpoints in case my environment crashes.