ppbrown opened this issue 2 weeks ago
If you use a learning rate scheduler other than constant, the epochs will be used to calculate the schedule. Different learning behavior should be expected.
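To illustrate the point (a minimal sketch, not OneTrainer's actual scheduler code): with a linear schedule, the learning rate at any given step is a function of the planned total number of steps, so changing the epoch count changes the learning rate at every point in the run.

```python
# Minimal sketch of a linear LR schedule (illustrative only, not
# OneTrainer code). The LR at a given step depends on the planned
# total_steps, so changing the epoch count shifts the LR everywhere.
def linear_lr(step: int, total_steps: int, base_lr: float = 1e-4) -> float:
    return base_lr * (1.0 - step / total_steps)

# Same step 100, but different planned run lengths (numbers are made up):
lr_20_epochs = linear_lr(100, total_steps=2000)    # hypothetical 20-epoch run
lr_100_epochs = linear_lr(100, total_steps=10000)  # hypothetical 100-epoch run
# The longer planned run still has a higher LR at the same step.
```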
Then this is a documentation bug rather than a behaviour bug. The tooltip should mention this. Otherwise, the term says "steps", so the reasonable expectation is that it Actually Means "training steps".
And/or rename the term to something that doesn't have "steps" in the name. If you just remove "step" and call it "Update interval", that's a start.
The EMA step interval is an optimization option. It's not supposed to have an effect. Higher values increase training speed, but can reduce the quality of the EMA model.
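Roughly what an EMA step interval does (a hedged sketch, not OneTrainer's implementation; the function and variable names here are made up): the EMA weights are blended toward the live weights only every `interval` steps, so a larger interval means fewer blends per run and a coarser average.

```python
# Illustrative sketch of an EMA update with a step interval
# (hypothetical names, not OneTrainer's actual code).
def maybe_ema_update(ema, params, step, decay=0.999, interval=1):
    """Blend EMA weights toward live weights every `interval` steps."""
    if step % interval == 0:
        for k in ema:
            ema[k] = decay * ema[k] + (1.0 - decay) * params[k]

ema = {"w": 0.0}
params = {"w": 1.0}
for step in range(10):
    maybe_ema_update(ema, params, step, decay=0.9, interval=5)
# With interval=5, only steps 0 and 5 update the EMA:
# two blends instead of ten, so less work but a coarser average.
```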
All the tuning is related; it affects how I have to organize my strategy for tuning evaluation. I am tuning on some subtle features. Previously I was training with the linear scheduler. When doing comparison runs, I could set my EMA value, choose whatever total number of epochs I liked, and things would progress consistently: epoch 10 of run A would be reasonably consistent with epoch 10 of run B.
I changed to adafactor and could do the same thing... IF I don't define EMA.
But if I turn on EMA (which I need in some cases) and I only actually want a short number of epochs, then for consistency I have to always set the total number of epochs to 100, even if I only really want a 20 or 30 epoch run for some sets. Very annoying.
ppbrown, does EMA only work well with a large epoch count?
Depends on the dataset (and dataset size) and the current learning rate.
Sometimes it works best for what I'm doing with epoch=20; other times, more like 100. The higher the LR I'm using, the stronger the EMA effect I may need.
Going from memory now, but if I recall correctly:
For high learning rates, I typically want a shorter epoch count. But I then usually also need a stronger EMA effect... which needs a LONGER epoch count.
Funny thing; I just noticed that the importance of adjusting EMA strength based on the learning rate is mentioned in this paper:
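One way to reason about that coupling (my own rule of thumb, not anything from OneTrainer's docs): an EMA with decay d averages over roughly 1/(1-d) updates, so a stronger EMA (decay closer to 1) only becomes meaningful once the run contains at least that many optimizer updates, which pushes you toward a longer epoch count.

```python
# Rule-of-thumb effective averaging window of an EMA:
# about 1/(1 - decay) updates. A stronger EMA (decay closer to 1)
# needs a longer run just to fill its window.
def ema_window(decay: float) -> float:
    return 1.0 / (1.0 - decay)

ema_window(0.99)   # roughly 100 updates
ema_window(0.999)  # roughly 1000 updates
```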
What happened?
I had previously wondered why changing the "EMA update step interval" seemed to make no difference.
Today I found out: changing the epoch count changes EMA behaviour.
What did you expect would happen?
Sample images should not change when I change the epoch count, if I have scheduler=linear, etc.
Relevant log output
No response
Output of `pip freeze`
No response