How to resume training on colab upon session timeout?

NVIDIA / vid2vid

PyTorch implementation of our method for high-resolution (e.g. 2048x1024) photorealistic video-to-video translation.

How to resume training on colab upon session timeout? #146

Open kartikJ-9 opened 4 years ago

kartikJ-9 commented 4 years ago

Decent results require 3k-5k frames. My GPU session on Colab gets disconnected during training because of usage limits. I am saving the checkpoints to my Drive. Is there any way to resume training from a particular epoch? My data is a sequence of images extracted from a video. I am new to PyTorch; somebody suggested saving the weights at each epoch and continuing from that checkpoint.
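The "save each epoch and continue from the checkpoint" suggestion can be sketched roughly as below. This is only an illustration, not code from vid2vid: the `epoch_<N>.pth` naming, the function names, and the pickle-based default writer are all assumptions made for the example. With real PyTorch state, `torch.save` could be passed as `save_fn`. Writing to a temporary file first means a session that dies mid-save should not leave a truncated file under the final name.

```python
import os
import pickle
import re
import tempfile

# Hypothetical naming scheme for this sketch: epoch_<N>.pth
CKPT_PATTERN = re.compile(r"^epoch_(\d+)\.pth$")


def _pickle_save(state, path):
    """Default writer for the sketch; pass torch.save for real model state."""
    with open(path, "wb") as f:
        pickle.dump(state, f)


def save_checkpoint(state, ckpt_dir, epoch, save_fn=_pickle_save):
    """Write the checkpoint to a temp file, then rename it into place."""
    os.makedirs(ckpt_dir, exist_ok=True)
    final_path = os.path.join(ckpt_dir, f"epoch_{epoch}.pth")
    fd, tmp_path = tempfile.mkstemp(dir=ckpt_dir, suffix=".tmp")
    os.close(fd)
    try:
        save_fn(state, tmp_path)
        # Only a fully written file ever receives the final name.
        os.replace(tmp_path, final_path)
    finally:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
    return final_path


def latest_checkpoint(ckpt_dir):
    """Return (epoch, path) for the highest-numbered checkpoint, or None."""
    if not os.path.isdir(ckpt_dir):
        return None
    found = []
    for name in os.listdir(ckpt_dir):
        match = CKPT_PATTERN.match(name)
        if match:
            found.append((int(match.group(1)), os.path.join(ckpt_dir, name)))
    return max(found) if found else None
```

At the start of a new session, a script built this way would call `latest_checkpoint` on the Drive folder, load that file, and continue the loop from the following epoch. If the function returns `None`, there is nothing to resume and training starts from scratch.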

renish-charaniya commented 3 years ago

I also have the same issue. Please help!

BeahIF commented 3 years ago

I need some help with this too. I use the flags --continue_train and --which_epoch, but no matter what number I pass, training begins from epoch 1.
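One thing that may be worth checking (an assumption, not something confirmed in this thread): some training scripts take the starting epoch from a small progress file saved next to the checkpoints, while the epoch flag only selects which weights to load. If that file is missing after a Colab reset, the loop would start at epoch 1 whatever number is passed. The sketch below illustrates that pattern with hypothetical function names and a hypothetical file format. Whether vid2vid works this way should be verified against its own train.py.

```python
import os


def write_progress(path, epoch, epoch_iter):
    """Record the current position as 'epoch,iteration' (hypothetical format)."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(f"{epoch},{epoch_iter}\n")
    # Rename so a reader never sees a partially written file.
    os.replace(tmp_path, path)


def read_progress(path):
    """Return (start_epoch, epoch_iter); fall back to (1, 0) if unreadable."""
    try:
        with open(path) as f:
            epoch_text, iter_text = f.read().strip().split(",")
        return int(epoch_text), int(iter_text)
    except (OSError, ValueError):
        # A missing or malformed file silently restarts from the beginning.
        return 1, 0
```

With this design, the silent fallback to `(1, 0)` is what would produce the "always starts from epoch 1" symptom, so it helps to confirm that the progress file is saved to Drive along with the weights.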