jgehring closed this issue 6 years ago.
Otherwise, there's no way to do annealing (other than creating a new optimizer?)
Hello @jgehring! Just for my education, I'm curious: is annealing ever performed with Adam?
It's not very common, but it has been done, for example in the "Attention Is All You Need" paper on NMT.
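For reference, that paper trains with Adam and varies the learning rate per step as `lrate = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)`: a linear warmup followed by inverse-square-root decay. Below is a minimal sketch of that schedule in plain Python. `set_lr` is a hypothetical helper, and it assumes a PyTorch-style optimizer whose `param_groups` is a list of dicts with an `"lr"` key, so annealing means overwriting that key before each step instead of building a new optimizer.

```python
def transformer_lr(step, d_model=512, warmup_steps=4000):
    """Warmup + inverse-sqrt decay from "Attention Is All You Need"."""
    step = max(step, 1)  # guard against 0 ** -0.5 on the very first call
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


def set_lr(param_groups, lr):
    # Hypothetical helper: assumes each group is a dict holding an "lr" entry,
    # as in torch.optim optimizers; the next optimizer step reads the new value.
    for group in param_groups:
        group["lr"] = lr


if __name__ == "__main__":
    groups = [{"lr": 0.0}]  # stand-in for optimizer.param_groups
    for step in (1, 1000, 4000, 16000):
        set_lr(groups, transformer_lr(step))
        print(step, groups[0]["lr"])
```

With a real optimizer this would be called as `set_lr(optimizer.param_groups, transformer_lr(step))` before `optimizer.step()`. Current PyTorch versions also provide `torch.optim.lr_scheduler.LambdaLR`, which scales each group's initial learning rate by a user-supplied factor per step and covers the same use case.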