Made a little graph comparing plain SGD (lr=2e-5) with plain Adam on a very simple quadratic function with 3 parameters. It's like a "guess the 3 numbers I'm thinking of" game.
Adam was a good improvement, but I wonder if there isn't some way to do much better.
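For anyone who wants to reproduce this kind of comparison, here is a minimal sketch. It assumes the quadratic is the squared distance to a hidden 3-vector. The post only gives SGD's lr=2e-5, so the target values (`TARGET`), the zero starting point, the step count, and Adam's lr=0.01 with the usual default betas are all hypothetical choices.

```python
import math

# Hypothetical "numbers I'm thinking of"; the post does not give them.
TARGET = [3.0, -1.5, 0.7]

def loss(w):
    # Simple quadratic: squared distance between the guess and the hidden target.
    return sum((wi - ti) ** 2 for wi, ti in zip(w, TARGET))

def grad(w):
    # Gradient of the quadratic above: 2 * (w - target).
    return [2.0 * (wi - ti) for wi, ti in zip(w, TARGET)]

def run_sgd(steps, lr=2e-5):
    # Plain SGD (full gradient, no momentum), starting from zeros (assumed).
    w = [0.0, 0.0, 0.0]
    history = []
    for _ in range(steps):
        g = grad(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
        history.append(loss(w))
    return w, history

def run_adam(steps, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    # Plain Adam with bias correction; lr=0.01 is an assumed value.
    w = [0.0, 0.0, 0.0]
    m = [0.0] * 3  # first-moment (mean) estimate
    v = [0.0] * 3  # second-moment (uncentered variance) estimate
    history = []
    for t in range(1, steps + 1):
        g = grad(w)
        m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
        v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
        m_hat = [mi / (1 - beta1 ** t) for mi in m]
        v_hat = [vi / (1 - beta2 ** t) for vi in v]
        w = [wi - lr * mh / (math.sqrt(vh) + eps)
             for wi, mh, vh in zip(w, m_hat, v_hat)]
        history.append(loss(w))
    return w, history

if __name__ == "__main__":
    steps = 2000
    _, sgd_hist = run_sgd(steps)
    _, adam_hist = run_adam(steps)
    for step in (0, 99, 999, steps - 1):
        print(f"step {step + 1:5d}  SGD loss {sgd_hist[step]:.6f}  "
              f"Adam loss {adam_hist[step]:.6f}")
```

On this particular quadratic, each SGD step shrinks the error by a factor of (1 - 2·lr). With lr=2e-5 that means roughly 1/(4·lr) = 25,000 steps per e-fold of the loss, which is why plain SGD looks so slow here. Adam's per-coordinate normalization lets it move about lr per step regardless of gradient scale.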