cooperlab closed 1 year ago
Note: the steps/iterations in `ema_overwrite_frequency` refer to batches, i.e. applications of gradients. So this parameter determines whether averaging should be applied after a fixed number of batches (an int), or only once per epoch (`None`).
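To make the batch-counting semantics concrete, here is a minimal pure-Python sketch. It is not the Keras implementation: `sgd_with_ema` is a hypothetical helper, the single scalar weight and plain SGD update are simplifications, and starting the average at the initial weight is an assumption. Only the parameter names `ema_momentum` and `ema_overwrite_frequency` mirror the optimizer arguments.

```python
def sgd_with_ema(weight, gradients, lr=0.1, ema_momentum=0.99,
                 ema_overwrite_frequency=None):
    """Toy SGD loop on one scalar weight with an exponential moving average.

    Hypothetical illustration only. Each element of `gradients` stands for
    one batch, i.e. one application of gradients.
    """
    ema = weight  # assumption: the average starts at the initial weight
    for step, grad in enumerate(gradients, start=1):
        # One optimizer step per batch.
        weight -= lr * grad
        # The moving average is updated on every step.
        ema = ema_momentum * ema + (1 - ema_momentum) * weight
        # With an int frequency, the weight is replaced by its average
        # every `ema_overwrite_frequency` batches.
        if (ema_overwrite_frequency is not None
                and step % ema_overwrite_frequency == 0):
            weight = ema
    # With None, the weight is never overwritten inside this loop;
    # any later overwrite is left to the caller.
    return weight, ema


# Two batches, no in-loop overwrite vs. overwrite after every batch.
no_overwrite = sgd_with_ema(0.0, [1.0, 1.0], lr=0.5, ema_momentum=0.5)
every_batch = sgd_with_ema(0.0, [1.0, 1.0], lr=0.5, ema_momentum=0.5,
                           ema_overwrite_frequency=1)
```

In this sketch the average is tracked on every batch in both cases; the frequency only controls when the tracked average is written back into the weight.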
This is a complete draft that adds moving averaging for all previously supported optimizers. The example notebook runs.
Newer tf.keras.optimizers incorporate exponential moving averaging of weights.
Using this requires bumping the TF version to 2.11. Also, many existing optimizers are moved temporarily to tf.keras.optimizers.experimental.