Closed peteer01 closed 4 months ago
You use this option to save the state of the training, and then when you want to resume, you enter the folder where the state was saved here.
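For reference, a minimal sketch of what that looks like on the command line, assuming the trainer is based on kohya-ss sd-scripts (the flag names and the state folder name may differ in this UI, and the paths here are placeholders):

```shell
# Assumption: the trainer wraps kohya-ss sd-scripts; adjust flags/paths for your setup.

# Initial run: --save_state writes the full training state
# (weights, optimizer, LR scheduler) alongside the checkpoints.
accelerate launch train_network.py \
  --output_dir /path/to/output \
  --save_state

# Resuming: --resume takes the folder where that state was saved
# (a "*-state" directory inside the output folder).
accelerate launch train_network.py \
  --output_dir /path/to/output \
  --resume /path/to/output/your-saved-state-folder
```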
That makes sense! Thank you for sharing that. It might help others if that can be added to the readme.
That depends on him, but I guess he will once the trainer gets an update. Also np, happy to help :)
Also note that saving the state saves everything: the checkpoint used, the optimizer state, the current learning rate, etc. So each state takes ~6.5GB per epoch on XL training, so it's best to keep only the last state you're able to train.
Solved
Apologies if this is not the right place to ask for this, but I am unable to figure out how to resume training from a saved epoch checkpoint. Is it possible to update the readme documentation to explain how to use LoRA Trainer to continue a previous training run?