LHH20000923 opened 2 years ago
I use CASIA-WebFace, the optimizer is SGD with lr=0.1, batch size 256/512, and I didn't use knowledge distillation. The loss curve is shown below. Is it a problem with the dataset?
For the CASIA dataset, you may need to train the model for at least 50 epochs with lr steps at epochs 20, 30, and 40. The FR result reported in the paper is based on MS1Mv2.
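As a rough illustration (not the repo's actual config, which I don't have), the suggested schedule — base lr 0.1, decayed at epochs 20, 30, and 40 over 50 epochs — could be expressed like this; the 10x decay factor is an assumption, as step schedules commonly use gamma=0.1:

```python
def lr_at_epoch(epoch, base_lr=0.1, milestones=(20, 30, 40), gamma=0.1):
    """Return the learning rate in effect at a given epoch
    under a step schedule: multiply by gamma at each milestone."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# Epochs 0-19 train at 0.1, 20-29 at 0.01, 30-39 at 0.001, 40-49 at 0.0001.
```

In PyTorch the same schedule would typically be handled by `torch.optim.lr_scheduler.MultiStepLR` with `milestones=[20, 30, 40]`.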
Which database do you use for training? What loss function, batch size, etc.?