SullyChen / Autopilot-TensorFlow

A TensorFlow implementation of this Nvidia paper: https://arxiv.org/pdf/1604.07316.pdf with some changes
MIT License
1.25k stars, 425 forks

How to resume training? HELP: after a power outage, the PC went down too!! #30

Closed: alexdominguez09 closed this issue 5 years ago

alexdominguez09 commented 5 years ago

Hi,

Last night I had a power outage at home and lost 5 bash sessions of training. Several of them had already been training for 3 days on about 250,000 images.

I have all the checkpoint files (checkpoint, model.ckpt and model.ckpt.meta) from minutes before the power went down.

I saw an issue where someone requested a checkpoint file, so I presume it is possible to resume a training session from those files, but I don't know how.

I launched training in the same folder with the same files, but it started again at epoch 0. I need training to resume from the last epoch or checkpoint.

Can anyone help me with that?

Thanks

Alex

alexdominguez09 commented 5 years ago

I will close the issue, as no answer has been provided. I just had to start all over again. :-(
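A likely reason training restarts at epoch 0, assuming the script counts epochs with a plain Python loop variable and always runs the variable initializer: a TensorFlow 1.x checkpoint written by tf.train.Saver stores the model's variables, not a Python loop counter. Resuming therefore needs two things: restoring the weights (in TF 1.x, saver.restore(sess, tf.train.latest_checkpoint(log_dir)) instead of initializing) and starting the loop from a saved epoch. The framework-agnostic sketch below shows only the second part. It has not been checked against this repository's train.py, and the file name train_state.json and the functions save_state, load_state and train are hypothetical.

```python
import json
import os
import tempfile

# Hypothetical name for the resume metadata. The model weights would still
# live in the model.ckpt* files written by tf.train.Saver; this sketch does
# not read or write those.
STATE_FILE = "train_state.json"


def save_state(log_dir, epoch, step):
    """Record the last *completed* epoch and global step next to the checkpoint."""
    path = os.path.join(log_dir, STATE_FILE)
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"epoch": epoch, "step": step}, f)
        # Push the new contents to disk before they replace the old file.
        f.flush()
        os.fsync(f.fileno())
    # Swap in the new file with a single rename (atomic on POSIX), so a crash
    # during json.dump leaves the previous state file untouched.
    os.replace(tmp, path)


def load_state(log_dir):
    """Return (start_epoch, step); (0, 0) when no saved state exists."""
    path = os.path.join(log_dir, STATE_FILE)
    if not os.path.exists(path):
        return 0, 0
    with open(path) as f:
        state = json.load(f)
    # Resume at the epoch *after* the last completed one.
    return state["epoch"] + 1, state["step"]


def train(log_dir, epochs, steps_per_epoch, stop_after=None):
    """Toy training loop; returns the list of epochs it actually ran."""
    start_epoch, step = load_state(log_dir)
    # In a real TF 1.x script, restore the weights here instead of running
    # the initializer, e.g.:
    #   saver.restore(sess, tf.train.latest_checkpoint(log_dir))
    ran = []
    for epoch in range(start_epoch, epochs):
        for _ in range(steps_per_epoch):
            step += 1  # a real loop would run one optimizer step here
        ran.append(epoch)
        # Save the epoch counter at the same point the weights are saved.
        save_state(log_dir, epoch, step)
        if stop_after is not None and len(ran) == stop_after:
            break  # simulate the power cut
    return ran


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        first = train(d, epochs=5, steps_per_epoch=10, stop_after=2)
        second = train(d, epochs=5, steps_per_epoch=10)
        print(first, second)  # [0, 1] [2, 3, 4]
```

The second call picks up at epoch 2 because the counter was written to disk alongside each checkpoint. In TF 1.x, an alternative is to keep a global_step variable in the graph so that Saver stores it together with the weights.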