Possible causes of NaN values

AlbertoRemus commented 2 years ago

Hi! I would like to ask a question about the part of the code that handles NaN values:

https://github.com/aimagelab/mcmr/blob/main/main.py#L817-L824

In particular, what can cause the presence of NaN values in the mask?

Once a NaN mask has been detected and handled, how can this affect training?

stefanopini commented 2 years ago

Hi Alberto, as far as I remember, NaN values occur with some renderers when the camera lies inside the object being rendered. In our case, that happens if the object grows so much (through the scale and the location of the vertices) that some vertices lie beyond the camera's distance, falling "behind" it.
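To make the geometry concrete, here is a minimal sketch of how scaling an object can push vertices behind the camera. It is an illustration only, not code from the repository: the function name `vertices_behind_camera`, the camera convention (object centred at the origin, camera at z = -camera_distance looking along +z), and the numbers are all assumptions.

```python
def vertices_behind_camera(vertices, scale, camera_distance):
    """Return the scaled vertices that lie at or behind the camera plane.

    Assumed (hypothetical) convention: the object is centred at the origin
    and the camera sits at z = -camera_distance, looking along +z.
    """
    behind = []
    for x, y, z in vertices:
        scaled_z = scale * z
        # Depth of the vertex measured from the camera along the viewing axis.
        depth = scaled_z + camera_distance
        if depth <= 0.0:
            behind.append((scale * x, scale * y, scaled_z))
    return behind


# The eight corners of a cube with side 2, centred at the origin.
cube = [(sx, sy, sz)
        for sx in (-1.0, 1.0)
        for sy in (-1.0, 1.0)
        for sz in (-1.0, 1.0)]

print(len(vertices_behind_camera(cube, scale=1.0, camera_distance=2.5)))  # 0
print(len(vertices_behind_camera(cube, scale=3.0, camera_distance=2.5)))  # 4
```

With a scale of 1 the whole cube stays in front of the camera; with a scale of 3 the four corners nearest the camera end up behind it, which is the situation described above.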

In those cases, without the check you linked, the loss becomes NaN too, causing all the network weights to collapse to NaN values. It happens rarely, but it can happen, particularly in the first training steps. Those lines of code prevent the training from breaking.
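The general idea of such a guard can be sketched as follows. This is not the code at the linked lines, just a framework-free illustration under stated assumptions: `contains_nan`, `train`, `render` and `step` are hypothetical names, and masks are plain lists of floats. In PyTorch, the equivalent check would typically be `torch.isnan(mask).any()`.

```python
import math


def contains_nan(values):
    """Return True if any element of the flat sequence is NaN."""
    return any(math.isnan(v) for v in values)


def train(batches, render, step):
    """Call `step` only on batches whose rendered mask is NaN-free.

    `render` and `step` are hypothetical callables standing in for the
    renderer and for the loss/backward/optimizer update.
    Returns the number of skipped batches.
    """
    skipped = 0
    for batch in batches:
        mask = render(batch)
        if contains_nan(mask):
            # Skip the update so NaN never reaches the loss or the weights.
            skipped += 1
            continue
        step(batch, mask)
    return skipped


# Toy usage: batch 2 produces a NaN mask and is skipped.
updated = []
n_skipped = train(
    batches=[1, 2, 3],
    render=lambda b: [math.nan] if b == 2 else [0.5],
    step=lambda b, m: updated.append(b),
)
print(n_skipped, updated)  # 1 [1, 3]
```

Skipping the batch before the loss is computed means the bad sample simply contributes no gradient, so the weights are left unchanged for that step.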

AlbertoRemus commented 2 years ago

@stefanopini many thanks for the explanation, this makes total sense.
