HomebrewNLP / Olmax

HomebrewNLP in JAX flavour for maintainable TPU training
BSD 2-Clause "Simplified" License

Gradient Noise #46

Open ClashLuke opened 2 years ago

ClashLuke commented 2 years ago

Some works have suggested that adding gradient noise helps deep models converge and generalise. Others, such as DDPG, showed that this holds even for shallow networks in a different domain. That's why it could be interesting for us to explore gradient noise as an option to improve generalisation, and with it convergence, by avoiding overfitting and escaping poor local minima during training.

One option to further improve gradient noise would be to combine it with #35 by adding different noise to each optimiser. This change would allow us to create combinations like Adam#Adam, where each optimiser sees slightly different noise at each step; a sketch of both ideas follows below.

This issue tracks the progress of such a scheme.
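As a rough illustration (not the repo's actual API), here is a minimal JAX sketch of annealed Gaussian gradient noise over a gradient pytree. The function name `add_gradient_noise` and the `eta`/`gamma` schedule are assumptions, with the schedule taken from the usual formulation in "Adding Gradient Noise Improves Learning for Very Deep Networks" (variance `eta / (1 + step) ** gamma`):

```python
import jax
import jax.numpy as jnp


def add_gradient_noise(grads, key, step, eta=0.01, gamma=0.55):
    # Hypothetical helper: add annealed Gaussian noise to every leaf of
    # a gradient pytree. Noise stddev decays as sqrt(eta / (1 + step)^gamma).
    stddev = jnp.sqrt(eta / (1.0 + step) ** gamma)
    leaves, treedef = jax.tree_util.tree_flatten(grads)
    keys = jax.random.split(key, len(leaves))
    noisy = [g + stddev * jax.random.normal(k, g.shape, g.dtype)
             for g, k in zip(leaves, keys)]
    return jax.tree_util.tree_unflatten(treedef, noisy)


# Sketch of the #35 combination: each optimiser in an Adam#Adam pair could
# receive the same raw gradients but with independently sampled noise,
# simply by splitting the PRNG key once per optimiser.
def noisy_grads_per_optimiser(grads, key, step, num_optimisers=2):
    return [add_gradient_noise(grads, k, step)
            for k in jax.random.split(key, num_optimisers)]
```

Whether the noise is applied before or after gradient clipping/normalisation, and whether each optimiser gets an independent key, would still need to be decided against the existing training loop.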