Open graycrown opened 5 years ago
Have you solved the problem? Thanks for your reply.
You can solve it by changing a line in ArcFace: replace sin_theta = torch.sqrt(sin_theta_2) with sin_theta = torch.sqrt(sin_theta_2 + 1e-8).
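To illustrate why the epsilon matters, here is a minimal standalone sketch (not the actual ArcFace code): sin_theta_2 = 1 - cos_theta**2 can be exactly zero for some samples, and the gradient of sqrt blows up at zero, so one bad sample poisons the weights on the next optimizer.step().

```python
import torch

# sqrt is not differentiable at 0: its gradient 0.5 / sqrt(x) blows up,
# so backprop through sqrt(0) produces a non-finite gradient.
x = torch.zeros(1, requires_grad=True)
torch.sqrt(x).sum().backward()
print(x.grad)  # tensor([inf])

# Adding a tiny epsilon keeps both the value and the gradient finite.
y = torch.zeros(1, requires_grad=True)
torch.sqrt(y + 1e-8).sum().backward()
print(torch.isfinite(y.grad).all())  # tensor(True)
```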
It is also useful to set torch.autograd.set_detect_anomaly(True) in such cases to find the source of the problem.
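As a small illustration (my own sketch, not the issue author's code), anomaly mode makes autograd raise at the exact backward function that produced the nan instead of letting it propagate silently. sin_theta_2 can even underflow to a slightly negative value, in which case sqrt yields nan directly:

```python
import torch

torch.autograd.set_detect_anomaly(True)

# A slightly negative input (floating-point underflow) makes sqrt return nan.
x = torch.tensor([-1e-9], requires_grad=True)
out = torch.sqrt(x)

raised = False
try:
    # Anomaly mode raises a RuntimeError naming SqrtBackward as the culprit.
    out.sum().backward()
except RuntimeError as e:
    raised = True
    print(type(e).__name__)
print(raised)
```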
My training code looks like this; the metric is ArcFace:

```python
feature, output = model(images, targets=labels)
arc_output = metric(feature, labels)
loss = loss_CE(arc_output, labels)
print(metric.kernel)
loss.backward()
optimizer.step()
print(metric.kernel)
```
Before loss.backward(), metric.kernel has normal values, but after I call optimizer.step() the value of metric.kernel becomes nan and the loss stays at the same value forever. I notice that your torch version seems to be 0.4; mine is 1.0 with Python 3.5.
Can you help me with this?