TreB1eN / InsightFace_Pytorch

PyTorch 0.4.1 codes for InsightFace
MIT License

ArcFace loss becomes nan after loss.backward() #93

Open · graycrown opened this issue 5 years ago

graycrown commented 5 years ago

My training code looks like this; the metric is ArcFace:

```python
feature, output = model(images, targets=labels)
arc_output = metric(feature, labels)
loss = loss_CE(arc_output, labels)
print(metric.kernel)
loss.backward()
optimizer.step()
print(metric.kernel)
```

Before loss.backward(), metric.kernel has normal values, but once I call optimizer.step() the value of metric.kernel becomes nan and the loss then stays at the same value forever. I notice that your torch version seems to be 0.4; my torch version is 1.0 with Python 3.5.

Can you help me with this?
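
One way to narrow a failure like this down (reusing the variable names from the snippet above, and meant to be dropped into that loop rather than run on its own) is to check whether the NaN first shows up in the gradient or only after the weight update:

```python
import torch

# Enable anomaly detection so the backward pass reports which op produced nan/inf
torch.autograd.set_detect_anomaly(True)

loss.backward()
# If the gradient is already nan here, the problem is inside the loss/metric graph,
# not in optimizer.step() itself
print("loss:", loss.item())
print("nan in kernel grad:", torch.isnan(metric.kernel.grad).any().item())

optimizer.step()
print("nan in kernel:", torch.isnan(metric.kernel).any().item())
```

If the gradient check already prints True, the weight update is just propagating a nan that was created during backpropagation through the metric.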

XiXiRuPan commented 4 years ago

Have you solved the problem? Thanks for your reply.

pollytur commented 3 years ago

You can solve it by changing a line in Arcface: replace sin_theta = torch.sqrt(sin_theta_2) with sin_theta = torch.sqrt(sin_theta_2 + 1e-8). The gradient of sqrt is unbounded at zero, so when sin_theta_2 is exactly 0 the backward pass produces inf/nan, which then propagates into the kernel weights on the next optimizer step. It is also useful to set torch.autograd.set_detect_anomaly(True) in such cases to find the source of the problem.
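
For context, here is a minimal sketch of where that change sits in an ArcFace-style head. The class and parameter names (ArcMarginHead, embedding_size, s, m) are illustrative, and the cos/sin computation is assumed to follow the common l2-normalized-logits pattern rather than being copied from this repo's model.py:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcMarginHead(nn.Module):
    """Illustrative ArcFace-style head (hypothetical names, not this repo's exact code)."""
    def __init__(self, embedding_size=512, classnum=10, s=64.0, m=0.5):
        super().__init__()
        self.kernel = nn.Parameter(torch.randn(embedding_size, classnum) * 0.01)
        self.s = s
        self.cos_m = math.cos(m)
        self.sin_m = math.sin(m)

    def forward(self, embeddings, labels):
        # Cosine similarity between l2-normalized embeddings and class weights
        cos_theta = F.normalize(embeddings, dim=1) @ F.normalize(self.kernel, dim=0)
        cos_theta = cos_theta.clamp(-1, 1)
        sin_theta_2 = 1 - cos_theta ** 2
        # The suggested fix: sqrt has an unbounded gradient at 0, so add a small eps
        sin_theta = torch.sqrt(sin_theta_2 + 1e-8)
        # cos(theta + m) = cos(theta)cos(m) - sin(theta)sin(m)
        cos_theta_m = cos_theta * self.cos_m - sin_theta * self.sin_m
        # Apply the additive angular margin only to the target class, then scale
        one_hot = F.one_hot(labels, num_classes=cos_theta.size(1)).bool()
        return torch.where(one_hot, cos_theta_m, cos_theta) * self.s
```

With torch.autograd.set_detect_anomaly(True) enabled, the backward pass raises an error pointing at the sqrt node's backward, which is how the offending line can be located.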