DrewSimtech / optimizer

for plotting convergence

Saddle Points #16

Open DrewSimtech opened 6 years ago

DrewSimtech commented 6 years ago

As a developer, I would like the tool to be able to handle saddle points. When the gradient for one variable converges to 0 before the other variables' gradients do, the tool won't be able to handle the 0 gradient and will crash.

DrewSimtech commented 6 years ago

The current workaround to avoid crashes is to accept all stationary points, including saddle points and maxima. This is incorrect, since the tool should be finding minima only, but it is an acceptable hotfix until a proper solution is found.
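One possible way to tell minima apart from saddle points and maxima once a zero gradient is reached is to look at the signs of the Hessian eigenvalues at that point. The sketch below is only an illustration, not what the tool currently does: `numerical_hessian` and `classify_stationary_point` are hypothetical helper names, the Hessian is estimated with central finite differences, and the step `h` and tolerance `tol` are assumed values that would need tuning.

```python
import numpy as np


def numerical_hessian(f, x, h=1e-4):
    """Estimate the Hessian of f at x with central finite differences."""
    x = np.asarray(x, dtype=float)
    n = x.size
    hess = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.zeros(n)
            e_j = np.zeros(n)
            e_i[i] = h
            e_j[j] = h
            hess[i, j] = (
                f(x + e_i + e_j) - f(x + e_i - e_j)
                - f(x - e_i + e_j) + f(x - e_i - e_j)
            ) / (4.0 * h * h)
    # Symmetrize to remove small numerical asymmetries.
    return 0.5 * (hess + hess.T)


def classify_stationary_point(f, x, tol=1e-6):
    """Classify a point where the gradient is (approximately) zero."""
    eigenvalues = np.linalg.eigvalsh(numerical_hessian(f, x))
    if np.all(eigenvalues > tol):
        return "minimum"
    if np.all(eigenvalues < -tol):
        return "maximum"
    if np.any(eigenvalues > tol) and np.any(eigenvalues < -tol):
        return "saddle"
    # Some eigenvalues are too close to zero to decide.
    return "inconclusive"
```

For example, `classify_stationary_point(lambda p: p[0]**2 - p[1]**2, [0.0, 0.0])` reports a saddle, so the optimizer could reject that point instead of accepting it. The cost is O(n^2) extra function evaluations per check, which may matter for expensive objectives.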

DrewSimtech commented 6 years ago

One solution would be to handle the gradient of each variable individually instead of taking the magnitude of the whole collection. Once a variable's gradient is smaller than epsilon, that variable can be flagged and prevented from updating. This would require some testing to see how it would affect the BFGS.
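A minimal sketch of the per-variable idea, using plain gradient descent rather than BFGS to keep it short. The names `masked_gradient_step` and `minimize_masked`, the fixed `learning_rate`, and the choice to re-evaluate the flags on every iteration (rather than freezing a variable permanently) are all assumptions made for illustration.

```python
import numpy as np


def masked_gradient_step(x, grad, learning_rate=0.1, epsilon=1e-8):
    """Update only the variables whose gradient magnitude is at least epsilon."""
    x = np.asarray(x, dtype=float)
    grad = np.asarray(grad, dtype=float)
    active = np.abs(grad) >= epsilon  # False means flagged, not updated
    x_new = np.where(active, x - learning_rate * grad, x)
    return x_new, active


def minimize_masked(grad_fn, x0, learning_rate=0.1, epsilon=1e-8, max_iter=1000):
    """Gradient descent that stops once every variable has been flagged."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x, active = masked_gradient_step(x, grad_fn(x), learning_rate, epsilon)
        if not active.any():
            break
    return x


# Example: f(x, y) = x^2 + 10 * (y - 1)^2, started where dy is already 0.
grad_fn = lambda p: np.array([2.0 * p[0], 20.0 * (p[1] - 1.0)])
result = minimize_masked(grad_fn, [5.0, 1.0])
```

Here `y` is flagged from the first step while `x` keeps updating, so the zero component no longer stops the run. Note that this alone does not distinguish a minimum from a saddle point; it only avoids treating a single zero component as a failure.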

DrewSimtech commented 6 years ago

It was recommended that I try clamping small gradients to a non-zero value. I'm worried that this will cause the BFGS to steer incorrectly if the data doesn't match what was passed into it. It's worth looking into and testing, though.
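A rough sketch of what clamping could look like, assuming the gradient is held in a NumPy array. `clamp_small_gradients` and `min_magnitude` are hypothetical names, and giving an exactly-zero component a positive sign is an arbitrary choice, which is one concrete way the clamped values can disagree with the real function.

```python
import numpy as np


def clamp_small_gradients(grad, min_magnitude=1e-8):
    """Replace components smaller than min_magnitude with +/- min_magnitude."""
    grad = np.asarray(grad, dtype=float)
    small = np.abs(grad) < min_magnitude
    # copysign keeps the original sign; an exact 0.0 is treated as positive.
    clamped = np.copysign(min_magnitude, grad)
    return np.where(small, clamped, grad)


print(clamp_small_gradients([0.0, 1e-12, -1e-12, 0.5, -2.0]))
```

Components that are already large enough pass through unchanged, so only the near-zero entries are altered before the gradient is handed to the optimizer.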

DrewSimtech commented 6 years ago

A document on ways to deal with saddle points in an SGD optimizer: https://www.offconvex.org/2016/03/22/saddlepoints/