I tried to train your ViT implementation and several other backbones (ConvNeXt, MaxViT, NFNet, CoAtNet, etc.) with the ArcFace loss, but neither the loss nor the accuracy converges. The loss either stagnates at around 20 or collapses to NaN (with the default learning rate of 0.1 and the SGD optimiser). The same backbones converge properly when trained with the CosFace loss.
The ResNet backbones, however, perform well when trained with ArcFace.
Any insights into why these two losses behave so differently even though they are intuitively very similar, and how to get these backbones to converge with ArcFace?
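For context, here is a minimal NumPy sketch (not the repo's exact code) of how I understand the two margins differ. CosFace subtracts the margin in cosine space and is always well-defined, while ArcFace adds the margin in angle space via `arccos`, whose domain is [-1, 1]; a cosine value nudged slightly outside that range by floating-point error yields NaN unless it is clamped first. The function names and the `clip` flag are my own illustration:

```python
import numpy as np

def cosface_logit(cos_theta, m=0.35):
    # CosFace: additive margin in cosine space, defined for any input
    return cos_theta - m

def arcface_logit(cos_theta, m=0.5, clip=True):
    # ArcFace: additive margin in angle space, cos(theta + m)
    if clip:
        # Guard the arccos domain against floating-point drift
        cos_theta = np.clip(cos_theta, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    return np.cos(theta + m)

# A cosine nudged past 1.0 by numerical error breaks the unclamped version
bad = np.array([1.0 + 1e-7])
print(arcface_logit(bad, clip=False))  # NaN without the clamp
print(arcface_logit(bad))              # finite with the clamp
```

Is this kind of numerical blow-up (or the non-monotonicity of cos(theta + m) for large angles) the likely culprit here, or is it something specific to how the non-ResNet backbones scale their features?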
Any help would be highly appreciated @anxiangsir