Closed — chanshing closed this issue 3 years ago
It should be okay. The 1e-8 threshold for grad_norm was chosen arbitrarily, so a smaller value would probably work as well.
If the gradients are too small and you would like to bring the step size back up, you could set reset_option=2 in the list of hyperparameters. This option resets the step size to its initial value at every iteration before the line search, which can help push the model toward a better solution faster.
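A minimal sketch of what the reset options do, under my reading of the code (the function and option semantics here are illustrative, not the library's exact implementation):

```python
def reset_step(step_size, init_step_size, reset_option):
    """Illustrative sketch of step-size resetting before the line search.

    reset_option=2: restart from the initial step size on every iteration,
    so a step size that shrank (e.g. because gradients became tiny) does
    not stay small forever.
    Other options (sketched here as the default branch): carry over the
    step size from the previous iteration.
    """
    if reset_option == 2:
        return init_step_size
    return step_size
```

With reset_option=2 the line search always starts from the large initial step and backtracks down as needed, at the cost of a few extra function evaluations per iteration.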
Good question, thanks for sharing!
I ran into a situation where my model suddenly stopped training (the weights were no longer being updated) after a certain number of epochs. After digging a bit, I realized it had to do with this line: https://github.com/IssamLaradji/sls/blob/e2522d5ad765e2fc5826c11518914d820add9d4f/sls/sls.py#L92
In most cases I think this check makes sense, but I am currently in a situation where the gradient norms are small for all batches, yet my validation loss is still very bad. When I rerun with a different seed this does not happen, which suggests I may have fallen into a very bad local minimum.
Would it be OK for me to remove the mentioned line? Or is there an important reason for this check to be in place?