Open dswah opened 8 years ago
sometimes, the algorithm will propose a step direction, and then spend a long time backtracking until the learning rate is machine epsilon. perhaps instead of taking a really small step in a bad direction, we should just stop?
look up the stopping criterion that uses norm of gradient and norm of loss!