deepinsight / insightface

State-of-the-art 2D and 3D Face Analysis Project
https://insightface.ai

Resuming ArcFace-Torch training starts over from the beginning with epoch set to 0 #1798

Open gradient1706 opened 3 years ago

gradient1706 commented 3 years ago

Hi, Thanks for your nice work!

I am using Colab to train ArcFace-Torch on the Asian-Celeb dataset. After Colab disconnects, I want to continue training from where it left off by setting config.resume = True, and the latest backbone.pth file loads successfully. However, the training process starts over from epoch 0, and the accuracy and loss look the same as when I first trained from scratch, or even worse. Did I successfully resume training? What should I do to fix it? Thanks in advance.

nttstar commented 3 years ago

You may need to write some code, such as modifying epoch number and saving optimizer params.
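
A minimal sketch of what that could look like in plain PyTorch, assuming you control the training loop. The function names `save_checkpoint` and `load_checkpoint`, the `head` module (standing in for whatever trainable classification layer sits on top of the backbone), and the checkpoint keys are all hypothetical and are not part of the insightface code. Only standard calls are used: `torch.save`, `torch.load`, `state_dict()` and `load_state_dict()`.

```python
import torch
from torch import nn


def save_checkpoint(path, backbone, head, optimizer, epoch):
    """Save everything needed to continue training, not only the backbone weights.

    All argument names and dictionary keys here are illustrative assumptions.
    """
    torch.save(
        {
            "epoch": epoch,                       # last fully completed epoch
            "backbone": backbone.state_dict(),
            "head": head.state_dict(),            # classification layer weights
            "optimizer": optimizer.state_dict(),  # e.g. SGD momentum buffers
        },
        path,
    )


def load_checkpoint(path, backbone, head, optimizer):
    """Restore the saved state in place and return the epoch to start from."""
    ckpt = torch.load(path, map_location="cpu")
    backbone.load_state_dict(ckpt["backbone"])
    head.load_state_dict(ckpt["head"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["epoch"] + 1  # resume with the epoch after the saved one
```

In the training script you would then call `save_checkpoint(...)` at the end of each epoch and, when resuming, use the returned value as the loop start, for example `for epoch in range(start_epoch, num_epochs):`. If the script also uses a learning-rate scheduler, its `state_dict()` probably needs to be saved and restored the same way, otherwise the learning rate restarts from its initial value.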