Closed CynthiaChuang closed 2 months ago
Recently, I needed to resume training, so I added the following code to train.py:
I added import pathlib at line 16 and replaced trainer.train() at line 192 with:
if list(pathlib.Path(training_args.output_dir).glob("checkpoint-*")):
    trainer.train(resume_from_checkpoint=True)
else:
    trainer.train()
When executing training, set the model_id to the checkpoint folder where you want to continue training and the output_dir to its parent folder.
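For anyone who would rather pass an explicit path than `resume_from_checkpoint=True`, here is a minimal sketch of picking the newest checkpoint by step number. The helper name `find_latest_checkpoint` is hypothetical and not part of train.py, and it assumes checkpoints are saved as `checkpoint-<step>` directories directly under `output_dir`.

```python
import re
from pathlib import Path
from typing import Optional

# Matches directory names such as "checkpoint-500" and captures the step number.
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_latest_checkpoint(output_dir: str) -> Optional[Path]:
    """Return the checkpoint-<step> directory with the highest step, or None.

    Hypothetical helper for illustration; assumes the checkpoint-<step>
    naming convention under output_dir.
    """
    root = Path(output_dir)
    if not root.is_dir():
        return None

    candidates = []
    for child in root.iterdir():
        match = _CHECKPOINT_RE.match(child.name)
        # Skip plain files and names without a numeric step suffix.
        if child.is_dir() and match:
            candidates.append((int(match.group(1)), child))

    if not candidates:
        return None

    # Compare steps numerically so checkpoint-1000 ranks above checkpoint-500.
    return max(candidates, key=lambda item: item[0])[1]
```

In train.py this could be used as `latest = find_latest_checkpoint(training_args.output_dir)`, followed by `trainer.train(resume_from_checkpoint=str(latest))` when `latest` is not `None` and plain `trainer.train()` otherwise. The step numbers are compared as integers because a plain string sort would put `checkpoint-500` after `checkpoint-1000`.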
I hope this helps those who need it.
@CynthiaChuang Thanks for the issue. I'll add resuming from checkpoint soon. I really appreciate this.
I've added the auto resume code. Thank you.