for some reason, keras optimizers have wildly different learning rates.
some of them default to the values in the original papers, which might work well on small datasets where you can iterate forever, but probably aren't optimal on a large dataset where you might only get through 6 iterations total.
look into making the learning rate consistent across optimizers. probably set it to the SGD default of 0.01. ideally, in the far future, this will be configurable based on dataset size, but that's well past MVP.
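a minimal sketch of what this could look like, assuming tf.keras — the `make_optimizer` helper and the specific optimizer list are illustrative, not part of any existing code:

```python
# sketch: force one learning rate across optimizers instead of each one's default.
# assumes TensorFlow's bundled Keras; the helper and optimizer list are hypothetical.
from tensorflow import keras

LEARNING_RATE = 0.01  # match SGD's default, per the note above


def make_optimizer(name: str) -> keras.optimizers.Optimizer:
    """Build an optimizer with a consistent learning rate, overriding its own default."""
    optimizers = {
        "sgd": keras.optimizers.SGD,
        "adam": keras.optimizers.Adam,
        "rmsprop": keras.optimizers.RMSprop,
        "adagrad": keras.optimizers.Adagrad,
    }
    return optimizers[name](learning_rate=LEARNING_RATE)


# usage: model.compile(optimizer=make_optimizer("adam"), loss="mse")
```

routing all optimizer construction through one helper like this would also give a single place to hook in dataset-size-based scaling later.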