Hi, we are using fp16 training with a GradScaler. The GradScaler should take care of NaNs/Infs.
So to answer your question, the NaNs should not affect the training. But let me know if you have issues.
Thank you so much for your reply. I guess it won't be a problem if NaN/Inf grads only show up once every few dozen steps?
Exactly, but make sure you're using the GradScaler. Actually, I had a lot of issues getting fp16 training to be stable, so let me know if you run into any other issues.
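In case it helps, here is a minimal sketch of an fp16 training step with `torch.cuda.amp.GradScaler`; the toy model, data, and hyperparameters are placeholders, not this repo's actual training loop:

```python
import torch

# toy stand-ins; the real model/data come from your own training code
model = torch.nn.Linear(16, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()  # skips the optimizer step when grads contain NaN/Inf

for _ in range(10):
    batch = torch.randn(8, 16, device="cuda")
    target = torch.randn(8, 1, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():               # run the forward pass in fp16 where safe
        loss = torch.nn.functional.mse_loss(model(batch), target)
    scaler.scale(loss).backward()                 # backward on the scaled loss
    scaler.step(optimizer)                        # step is skipped if inf/NaN grads are found
    scaler.update()                                # adjust the loss scale for the next iteration
```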
I ran into some backward issues when training:
Have you ever run into this problem? I add an epipolar error term by estimating the E matrix during training.
Is this the backward of the E solver?
I haven't tried it myself, but if you save the inputs to the solver you may be able to find the issue.
Maybe you picked the same correspondence twice?
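One way to act on this suggestion (just a sketch, not code from this repo): turn on anomaly detection and dump the solver inputs when they look suspicious, so the failing batch can be replayed offline. The helper `solve_with_dump`, the shapes of `A` and `b`, and the conditioning threshold are all hypothetical:

```python
import torch

torch.autograd.set_detect_anomaly(True)  # report which op produced non-finite grads during backward

def solve_with_dump(A, b, dump_path="bad_solver_inputs.pt"):
    # A: (B, 9, 9) system matrix, b: (B, 9) right-hand side (hypothetical shapes)
    if not torch.isfinite(A).all() or torch.linalg.cond(A).max() > 1e12:
        # save the offending inputs so the failing batch can be reproduced offline
        torch.save({"A": A.detach().cpu(), "b": b.detach().cpu()}, dump_path)
    return torch.linalg.solve(A, b)
```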
I found out the problem is caused by the backward of torch.linalg.solve in the E solver. I added a regularization term to A and that solved the problem. Thank you a lot for your kind reply.
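For anyone who hits the same thing, a minimal sketch of that kind of Tikhonov-style regularization (the epsilon value and the helper name `regularized_solve` are assumptions, not taken from this repo):

```python
import torch

def regularized_solve(A, b, eps=1e-6):
    # Add eps * I to A so the system stays well-conditioned and the backward of
    # torch.linalg.solve does not blow up on (near-)singular matrices.
    eye = torch.eye(A.shape[-1], dtype=A.dtype, device=A.device)
    return torch.linalg.solve(A + eps * eye, b)
```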
I found that when performing backward, there are sometimes warnings like:
Do these NaN or Inf grads have bad effects on training?
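For reference, a quick generic way to see which parameters end up with non-finite grads after backward (a sketch, not part of this repo):

```python
import torch

def report_bad_grads(model):
    # Print every parameter whose gradient contains NaN or Inf after backward()
    for name, param in model.named_parameters():
        if param.grad is not None and not torch.isfinite(param.grad).all():
            print(f"non-finite grad in {name}")
```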