2U1 / Phi3-Vision-Finetune

An open-source implementation for fine-tuning Phi3-Vision and Phi3.5-Vision by Microsoft.
Apache License 2.0

Resume_from_checkpoint #16

Closed: CynthiaChuang closed this issue 2 months ago

CynthiaChuang commented 2 months ago

Recently, I needed to resume training, so I made two changes to train.py: I added import pathlib at line 16 and replaced trainer.train() at line 192 with:

if list(pathlib.Path(training_args.output_dir).glob("checkpoint-*")):
    trainer.train(resume_from_checkpoint=True)
else:
    trainer.train()

When launching training, set model_id to the checkpoint folder you want to continue from, and set output_dir to that folder's parent folder.
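
For anyone who wants to choose the checkpoint explicitly rather than rely on resume_from_checkpoint=True, here is a minimal sketch of one way to find the newest checkpoint folder. It is not the repository's code: find_latest_checkpoint and resume_argument are hypothetical helper names, and it assumes checkpoints are saved as checkpoint-<step> folders directly under output_dir.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: checkpoints are folders named "checkpoint-<step>".
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_latest_checkpoint(output_dir: str) -> Optional[Path]:
    """Hypothetical helper: return the checkpoint folder with the highest step, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None

    candidates = []
    for child in root.iterdir():
        match = _CHECKPOINT_RE.match(child.name)
        # Skip plain files and names such as "checkpoint-abc".
        if child.is_dir() and match:
            candidates.append((int(match.group(1)), child))

    if not candidates:
        return None
    # Compare step numbers numerically so checkpoint-100 beats checkpoint-20.
    return max(candidates, key=lambda item: item[0])[1]


def resume_argument(output_dir: str) -> Optional[str]:
    """Hypothetical helper: a path string to resume from, or None to start fresh."""
    latest = find_latest_checkpoint(output_dir)
    return str(latest) if latest is not None else None
```

In train.py this could be used as trainer.train(resume_from_checkpoint=resume_argument(training_args.output_dir)), since passing None starts a fresh run. Comparing the step as an integer avoids the string-ordering trap where "checkpoint-20" sorts after "checkpoint-100".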

I hope this helps those who need it.

2U1 commented 2 months ago

@CynthiaChuang Thanks for the issue. I'll add resuming from checkpoint soon. I really appreciate this.

2U1 commented 2 months ago

I've added the auto resume code. Thank you.