Getting nan tensor in output

Hello, has anyone else seen the model output all-NaN tensors during training? It happens to me occasionally, and I would like to know whether the problem lies in the model or in my setup. I am currently training a tiny version of the model so that it fits in RAM, so I had to drop some layers from the final stage and reduce the number of heads, the dimensions, and so on.

EDIT:

I forgot to mention that I am training with mixed precision because of memory constraints.

Do you have any idea why this can happen?
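To help narrow this down, here is a minimal sketch of how the first layer producing non-finite values could be located with PyTorch forward hooks. The helper `find_non_finite_modules`, the `Overflow` module, and the toy model are hypothetical and only for illustration; they are not part of VRT. The toy module mimics one common cause under mixed precision: a value above 65504 (the largest finite float16 value) becomes inf when cast to half, and inf - inf then gives NaN.

```python
import torch
import torch.nn as nn


def find_non_finite_modules(model: nn.Module, *inputs):
    """Run one forward pass and return the names of submodules whose
    output contains NaN or Inf, in the order their outputs were produced.

    Hypothetical diagnostic helper, not part of VRT.
    """
    bad = []
    handles = []

    def make_hook(name):
        def hook(module, args, output):
            outs = output if isinstance(output, (tuple, list)) else (output,)
            for o in outs:
                if torch.is_tensor(o) and not torch.isfinite(o).all():
                    bad.append(name)
                    break
        return hook

    for name, module in model.named_modules():
        if name:  # skip the root module, whose name is the empty string
            handles.append(module.register_forward_hook(make_hook(name)))
    try:
        with torch.no_grad():
            model(*inputs)
    finally:
        # Always remove the hooks so the model is left unchanged.
        for h in handles:
            h.remove()
    return bad


class Overflow(nn.Module):
    """Toy stand-in for a layer that overflows in float16 (assumption)."""

    def forward(self, x):
        big = torch.full_like(x, 70000.0).half()  # above the fp16 maximum, so inf
        big = big.float()
        return big - big  # inf - inf yields NaN


model = nn.Sequential(nn.Linear(4, 4), Overflow(), nn.Linear(4, 4))
bad = find_non_finite_modules(model, torch.randn(2, 4))
print("modules with non-finite output:", bad)
```

The first name in the returned list is the earliest module whose output went non-finite, which is usually the place to start looking. If the NaNs only show up in the backward pass, `torch.autograd.set_detect_anomaly(True)` may help as well, at the cost of slower training.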

JingyunLiang / VRT, issue #59