deepinsight / insightface

State-of-the-art 2D and 3D Face Analysis Project
https://insightface.ai

arcface_torch #1486

Open xianwenleon opened 3 years ago

xianwenleon commented 3 years ago

Why does the loss become NaN after it has dropped by 3% in arcface_torch training? Can you give me some advice?
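One possible source of NaN in additive-angular-margin losses, assuming the implementation computes θ = arccos(cos θ), is the arccos step itself. Rounding can push a normalized dot product slightly past ±1, where arccos is undefined, and the derivative of arccos grows without bound near ±1. Whether this is what happens in arcface_torch is not confirmed in this thread. The sketch below uses a hypothetical helper, arcface_logit, and illustrative values for the margin, scale and eps.

```python
import math


def arcface_logit(cos_theta: float, margin: float = 0.5, scale: float = 64.0,
                  eps: float = 1e-7) -> float:
    """Target-class logit s * cos(theta + m) for an ArcFace-style loss.

    Hypothetical helper for illustration only; not the arcface_torch code.
    """
    # A cosine slightly outside [-1, 1] has no arccos (math.acos raises,
    # torch.acos returns NaN), and d/dx acos(x) = -1/sqrt(1 - x^2) is
    # unbounded as |x| -> 1, so clamp strictly inside the interval first.
    clamped = min(max(cos_theta, -1.0 + eps), 1.0 - eps)
    theta = math.acos(clamped)
    return scale * math.cos(theta + margin)
```

With the clamp, arcface_logit(1.0000001) returns a finite value, whereas math.acos(1.0000001) raises ValueError. In PyTorch the analogous guard would be torch.clamp(cos, -1 + eps, 1 - eps) applied before torch.acos.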

markncx commented 3 years ago

I also met the NaN problem. Is the learning rate too large?

Light-- commented 3 years ago

I also encountered NaN when I add two gradients and then call backward...

markncx commented 3 years ago

I reset the learning rate to 0.03, and that avoided the NaN issue.
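To see how a smaller learning rate can make NaN go away, here is a toy illustration in plain Python (not arcface_torch). Gradient descent on L(w) = c·w² multiplies w by (1 − 2c·lr) each step, so it is stable only for lr < 1/c. With the made-up curvature c = 15, lr = 0.1 diverges until the arithmetic overflows to inf and then to NaN, while lr = 0.03 converges. The function name run_gd, the curvature and the step count are hypothetical values chosen only to put the stability threshold between those two rates.

```python
import math


def run_gd(lr: float, steps: int = 2000, curvature: float = 15.0,
           w0: float = 1.0) -> float:
    """Minimise the toy loss L(w) = curvature * w**2 with plain gradient descent.

    Each step multiplies w by (1 - 2 * curvature * lr), so the iteration
    diverges whenever lr > 1 / curvature. Returns the final loss.
    """
    w = w0
    for _ in range(steps):
        grad = 2.0 * curvature * w
        # Once |grad| overflows to inf, the next update computes inf - inf = NaN.
        w -= lr * grad
    return curvature * w * w


print(math.isnan(run_gd(lr=0.1)))  # True
print(run_gd(lr=0.03) < 1e-12)     # True
```

A real network has many curvature directions and its loss surface changes during training, so this only illustrates the mechanism, not the actual threshold for any model.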

Light-- commented 3 years ago

What was your original lr? 0.3?

markncx commented 3 years ago

It was 0.1. A large lr combined with a large batch size may lead to NaN during training. Reducing the max_norm used for gradient clipping might be an alternative, but I have not tried it yet.
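The gradient-clipping suggestion can be sketched in plain Python. The helper clip_grad_norm below is hypothetical; it is modelled on the rule applied by torch.nn.utils.clip_grad_norm_(parameters, max_norm): compute the global L2 norm over all gradients and, if it exceeds max_norm, scale every gradient by max_norm / (total_norm + 1e-6). Lowering max_norm therefore caps the size of each update.

```python
import math
from typing import List


def clip_grad_norm(grads: List[List[float]], max_norm: float,
                   eps: float = 1e-6) -> float:
    """Rescale gradients in place so their global L2 norm is at most max_norm.

    Hypothetical pure-Python sketch; returns the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    clip_coef = max_norm / (total_norm + eps)
    if clip_coef < 1.0:  # only shrink, never enlarge
        for grad in grads:
            for i in range(len(grad)):
                grad[i] *= clip_coef
    return total_norm


grads = [[3.0, 4.0], [0.0]]
print(clip_grad_norm(grads, max_norm=1.0))  # 5.0
```

Clipping only bounds finite gradients: if a gradient already contains NaN or inf, the rescaled values stay non-finite, so it cannot repair a forward pass that has already produced NaN.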

jareturing commented 3 years ago

I also hit a NaN loss, and I found that the forward features were NaN.
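One way to narrow down where non-finite features first appear is to record each layer's output during the forward pass and report the first one containing NaN or inf. In PyTorch this would typically be done with forward hooks (Module.register_forward_hook) and torch.isfinite; the sketch below shows only the checking logic in plain Python, with a hypothetical helper first_nonfinite_layer and made-up layer names.

```python
import math
from typing import Dict, Iterable, Optional


def first_nonfinite_layer(outputs: Dict[str, Iterable[float]]) -> Optional[str]:
    """Return the name of the first layer whose output contains NaN or +/-inf.

    `outputs` maps layer names to flattened outputs in forward order
    (dicts keep insertion order in Python 3.7+). Returns None if all are finite.
    """
    for name, values in outputs.items():
        if not all(math.isfinite(v) for v in values):
            return name
    return None


# Example with made-up layer names and values:
outputs = {"conv1": [0.1, -0.2], "bn1": [1.0, float("inf")], "fc": [float("nan")]}
print(first_nonfinite_layer(outputs))  # bn1
```

For the backward pass, torch.autograd.set_detect_anomaly(True) makes autograd report the operation that produced a NaN gradient, at a noticeable speed cost.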

Light-- commented 3 years ago

@jareturing Yes, I hit the same problem. Have you figured it out?