IssamLaradji / sls

Implements stochastic line search

How to deal with very small gradients? #9

Closed · chanshing closed this issue 3 years ago

chanshing commented 3 years ago

I ran into a situation where my model suddenly stopped training (the weights were no longer being updated) after a certain number of epochs. After digging a bit, I realized it had to do with this line: https://github.com/IssamLaradji/sls/blob/e2522d5ad765e2fc5826c11518914d820add9d4f/sls/sls.py#L92

In most cases this check makes sense, but I am currently in a situation where my gradient norms are small for all batches, yet my validation loss is still very bad. When I rerun with a different seed this doesn't happen, which suggests I may have fallen into a very bad local minimum.

Would it be OK for me to remove the mentioned line? Or is there an important reason for this check to be in place?
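To illustrate the behavior I'm describing, here is a minimal, self-contained paraphrase of what that guard appears to do (this is my reading of the intent behind sls/sls.py#L92, not the exact library code; `grad_norm_of` is a helper I wrote for the example):

```python
import torch

def grad_norm_of(params):
    # Aggregate L2 norm over all parameter gradients.
    return torch.sqrt(sum((p.grad ** 2).sum() for p in params if p.grad is not None))

model = torch.nn.Linear(10, 1)
loss = model(torch.randn(4, 10)).pow(2).mean()
loss.backward()

# Paraphrase of the guard: the line-search step only runs when the gradient
# norm is above a small fixed threshold, so once every batch's gradient falls
# below 1e-8 the weights effectively stop changing.
if grad_norm_of(model.parameters()) >= 1e-8:
    print("gradient large enough: line search and update would run")
else:
    print("gradient below threshold: step skipped, weights frozen")
```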

IssamLaradji commented 3 years ago

It should be okay; the 1e-8 threshold for grad_norm was chosen arbitrarily, so smaller values would probably work just as well.

If the gradients are too small and you would like to bring the step size back up, you could set `reset_option=2` in the optimizer's hyperparameters. This option resets the step size to its initial value at every iteration before running the line search, which might help push the model toward a better solution faster.
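A minimal usage sketch, following the pattern in this repo's README (the model, data, and other hyperparameter values below are placeholders; `reset_option=2` is the relevant change):

```python
import torch
import sls  # the stochastic line search optimizer from this repo

# Toy model and data, for illustration only.
model = torch.nn.Linear(10, 1)
X, y = torch.randn(128, 10), torch.randn(128, 1)
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(X, y), batch_size=32
)

opt = sls.Sls(
    model.parameters(),
    n_batches_per_epoch=len(loader),
    reset_option=2,  # reset the step size to its initial value before each line search
)

for epoch in range(10):
    for xb, yb in loader:
        opt.zero_grad()
        # Sls calls backward() internally, so the closure only returns the loss.
        closure = lambda: torch.nn.functional.mse_loss(model(xb), yb)
        opt.step(closure)
```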

Good question, thanks for sharing!