CharlesShang / FastMaskRCNN

Mask RCNN in TensorFlow
Apache License 2.0

Can training resume from where it stopped before the computer shut down? #115

Open CodeIsWorld opened 7 years ago

CodeIsWorld commented 7 years ago

While I was training the example, the computer crashed, so I rebooted it and ran 'python train/train.py' again. Does training start over from the beginning, or from where it stopped? I know it checks the checkpoint file, but does it start from the first image again? The iteration counter starts at 1.

Tetsujinfr commented 7 years ago

Anyone on this? When I quit and restart training, it appears to restart from zero, even though some model files from the previous run were saved. Is it possible to split training into several runs? Also, is there a way to quit training cleanly? (For now I use Ctrl+C.)

Thanks for your help, Tets

QtSignalProcessing commented 7 years ago

@CodeIsWorld @Tetsujinfr I tried restoring the model, but I'm not sure my understanding is correct. train.py has a function called restore, and I think it restores the previously trained model.

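Training starts from zero when you restart the program because of this loop in train.py:

    for step in range(FLAGS.max_iters):

To get the restored iteration number, I added the following to the restore function:

    global global_iter
    # Saver.save(..., global_step=step) names the file like 'model.ckpt-2000',
    # so splitext returns '.ckpt-2000' as the extension and the step follows '-'
    stem = os.path.splitext(os.path.basename(checkpoint_path))[1]
    global_iter = int(stem.split('-')[1])

and changed the loop to start from it:

    global_iter = 0  # module level, so a run without a checkpoint starts at 0

    for step in range(global_iter, FLAGS.max_iters):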
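The same idea can be sketched in plain Python. This is a minimal, hypothetical example, not code from train.py: step_from_checkpoint, run_training, train_step and save are illustrative names. It assumes checkpoints are written with tf.train.Saver.save(..., global_step=...), which appends '-<step>' to the save path, and it parses that suffix with a regex on the trailing digits instead of relying on splitext. It also catches Ctrl+C (KeyboardInterrupt) so an interrupted run still writes a checkpoint, which is one way to split training into several runs.

```python
import os
import re


def step_from_checkpoint(checkpoint_path):
    """Return the step encoded in a path like '.../model.ckpt-2000', else 0.

    Assumes the '-<step>' suffix that Saver.save(..., global_step=step) appends.
    A missing path (None) or one without the suffix means a fresh start.
    """
    if not checkpoint_path:
        return 0
    match = re.search(r"-(\d+)$", os.path.basename(checkpoint_path))
    return int(match.group(1)) if match else 0


def run_training(start_step, max_iters, train_step, save):
    """Run steps start_step .. max_iters - 1, then checkpoint.

    train_step(step) runs one iteration and save(n) writes a checkpoint tagged
    with n completed iterations; both are hypothetical stand-ins for the
    session.run and saver.save calls in train.py. Returns n.
    """
    step = start_step
    try:
        while step < max_iters:
            train_step(step)
            step += 1  # only counted once the iteration finished
    except KeyboardInterrupt:
        # Ctrl+C raises KeyboardInterrupt; catch it so progress is still saved.
        print("Interrupted, saving checkpoint at step", step)
    save(step)
    return step
```

Here the checkpoint is tagged with the number of completed iterations, so the parsed value is directly the next step to run; if your checkpoints are tagged with the index of the last finished step instead, start from that value plus one. The path to parse could come from tf.train.latest_checkpoint on the training directory, which returns None when no checkpoint exists; step_from_checkpoint maps that case to 0.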