Decent results require 3k-5k frames. My Colab GPU session gets disconnected due to usage limits while training. I am saving the checkpoints to Google Drive. Is there any way to resume training from a particular epoch? My data is a sequence of images extracted from a video. I am new to PyTorch. Somebody suggested saving the weights at each epoch and continuing from that checkpoint.
I need help with this too: I use the flags --continue_train and --which__epoch, but no matter what number I pass, training begins from epoch 1.
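For anyone unsure what "saving the weights of the epoch and continuing from that checkpoint" means in plain PyTorch, here is a minimal sketch of the general pattern. It is not this repository's own resume logic, and it does not explain why the flags above restart at epoch 1. The function names (save_checkpoint, load_checkpoint, train), the dictionary keys, and the tiny nn.Linear model are all hypothetical stand-ins. The idea is to store the epoch number and the optimizer state together with the weights, then start the loop from the saved epoch plus one.

```python
import os

import torch
import torch.nn as nn


def save_checkpoint(path, model, optimizer, epoch):
    # Store everything needed to resume: weights, optimizer state,
    # and the number of the epoch that just finished.
    torch.save(
        {
            "epoch": epoch,
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
        },
        path,
    )


def load_checkpoint(path, model, optimizer):
    # Return the epoch to start from: 1 when no checkpoint exists,
    # otherwise the last saved epoch plus one.
    if not os.path.exists(path):
        return 1
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["epoch"] + 1


def train(path, num_epochs):
    model = nn.Linear(4, 1)  # hypothetical stand-in for the real network
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    start_epoch = load_checkpoint(path, model, optimizer)
    for epoch in range(start_epoch, num_epochs + 1):
        x = torch.randn(8, 4)  # dummy batch instead of real frames
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Overwrite the checkpoint at the end of every epoch.
        save_checkpoint(path, model, optimizer, epoch)
    return start_epoch
```

On Colab, path would presumably point into the mounted Drive folder so the file survives a disconnect. Calling train(path, 10) again after a disconnect would then pick up at the epoch after the last one that was saved, rather than at epoch 1.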