Closed by miguelgfierro 6 years ago
Actually, in the paper all experiments were done using SGD with momentum. I did try Adam, but it was harder to make it work as well as momentum on this problem. By the way, here is an interesting ICLR 2018 paper regarding Adam: https://iclr.cc/Conferences/2018/Schedule?showEvent=78
@okuchaiev small change, feel free not to accept it. I was running experiments and found that the algo didn't converge with Adam, whereas with momentum it always converged.
I'm consistently seeing worse performance with Adam than with momentum across different DL tasks.
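
For reference, here is a minimal sketch of the two update rules being compared, in plain Python on a single scalar parameter. The function names, the toy gradient `grad_fn`, and the hyperparameter defaults are illustrative assumptions, not taken from the paper or this repo. The momentum form uses the common `velocity = mu * velocity + grad` convention. It only shows how the steps differ and does not by itself explain the convergence gap reported above.

```python
import math


def momentum_step(w, grad, velocity, lr=0.1, mu=0.9):
    """One SGD-with-momentum update; returns (new_w, new_velocity)."""
    velocity = mu * velocity + grad
    return w - lr * velocity, velocity


def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update at step t (1-indexed); returns (new_w, new_m, new_v)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps), m, v


def grad_fn(w):
    """Gradient of the toy objective f(w) = 0.5 * w**2 (hypothetical example)."""
    return w


# First step from w = 5.0 with both optimizers and the same lr.
w_mom, vel = momentum_step(5.0, grad_fn(5.0), velocity=0.0)
w_adam, m, v = adam_step(5.0, grad_fn(5.0), m=0.0, v=0.0, t=1)
```

On the first step, momentum moves by `lr * grad` while Adam moves by roughly `lr` whatever the gradient scale, so the same `lr` value means different things for the two optimizers and each needs its own tuning.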