Hi,
Last night I had a power outage at home and lost 5 bash sessions of training. Several of them had already been training for 3 days on about 250,000 images.
I still have all the checkpoint files (checkpoint, model.ckpt and model.ckpt.meta) from a few minutes before the outage.
I saw in one of the issues that someone requested a checkpoint file, so I presume it is possible to resume a training session from those files, but I don't know how.
I launched training in the same folder with the same files, but it started again at epoch 0. I need training to resume from the last epoch or checkpoint.
Can anyone help me with that?
Thanks
Alex