IssamLaradji / sls

Implements stochastic line search

How to deal with very small gradients? #9

Closed · chanshing closed this issue 3 years ago

chanshing commented 3 years ago

I ran into a situation where my model suddenly stopped training (the weights were no longer being updated) after a certain number of epochs. After digging a bit, I realized it had to do with this line: https://github.com/IssamLaradji/sls/blob/e2522d5ad765e2fc5826c11518914d820add9d4f/sls/sls.py#L92

In most cases this check makes sense, but I am currently in a situation where my gradient norms are small for all batches, yet my validation loss is still very bad. When I rerun with a different seed this doesn't happen, which suggests I may have fallen into a very bad local minimum.

Would it be OK for me to remove the mentioned line? Or is there an important reason for this check to be in place?
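To illustrate the behavior I'm describing, here is a minimal, self-contained paraphrase of what that guard appears to do (this is my reading of the intent behind sls/sls.py#L92, not the exact library code; `grad_norm_of` is a helper I wrote for the example):

```python
import torch

def grad_norm_of(params):
    # Aggregate L2 norm over all parameter gradients.
    return torch.sqrt(sum((p.grad ** 2).sum() for p in params if p.grad is not None))

model = torch.nn.Linear(10, 1)
loss = model(torch.randn(4, 10)).pow(2).mean()
loss.backward()

# Paraphrase of the guard: the line-search step only runs when the gradient
# norm is above a small fixed threshold, so once every batch's gradient falls
# below 1e-8 the weights effectively stop changing.
if grad_norm_of(model.parameters()) >= 1e-8:
    print("gradient large enough: line search and update would run")
else:
    print("gradient below threshold: step skipped, weights frozen")
```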

IssamLaradji commented 3 years ago

It should be okay; the 1e-8 threshold for grad_norm was chosen arbitrarily, so smaller values would probably work just as well.

If the gradients are too small and you would like to bring the step size back up, you could set `reset_option=2` in the optimizer's hyperparameters. This option resets the step size to its initial value at every iteration before running the line search, which might help push the model toward a better solution faster.
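A minimal usage sketch, following the pattern in this repo's README (the model, data, and other hyperparameter values below are placeholders; `reset_option=2` is the relevant change):

```python
import torch
import sls  # the stochastic line search optimizer from this repo

# Toy model and data, for illustration only.
model = torch.nn.Linear(10, 1)
X, y = torch.randn(128, 10), torch.randn(128, 1)
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(X, y), batch_size=32
)

opt = sls.Sls(
    model.parameters(),
    n_batches_per_epoch=len(loader),
    reset_option=2,  # reset the step size to its initial value before each line search
)

for epoch in range(10):
    for xb, yb in loader:
        opt.zero_grad()
        # Sls calls backward() internally, so the closure only returns the loss.
        closure = lambda: torch.nn.functional.mse_loss(model(xb), yb)
        opt.step(closure)
```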

Good question, thanks for sharing!