ppbrown opened this issue 2 weeks ago
If you use a learning rate scheduler other than constant, the epochs will be used to calculate the schedule. Different learning behavior should be expected.
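To illustrate the point (a minimal sketch, not OneTrainer's actual scheduler code): with a linear schedule, the learning rate at any given step is a function of the planned total number of steps, so changing the epoch count changes the learning rate at every point in the run.

```python
# Minimal sketch of a linear LR schedule (illustrative only, not
# OneTrainer code). The LR at a given step depends on the planned
# total_steps, so changing the epoch count shifts the LR everywhere.
def linear_lr(step: int, total_steps: int, base_lr: float = 1e-4) -> float:
    return base_lr * (1.0 - step / total_steps)

# Same step 100, but different planned run lengths (numbers are made up):
lr_20_epochs = linear_lr(100, total_steps=2000)    # hypothetical 20-epoch run
lr_100_epochs = linear_lr(100, total_steps=10000)  # hypothetical 100-epoch run
# The longer planned run still has a higher LR at the same step.
```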
Then this is a documentation bug rather than a behaviour bug. The tooltip should mention this. Otherwise, the term says "steps", so the reasonable expectation is that it Actually Means "training steps".
And/or rename the term to something that doesn't have "steps" in the name. If you just remove "step" and call it "Update interval", that's a start.
The EMA step interval is an optimization option. It's not supposed to have an effect. Higher values increase training speed, but can reduce the quality of the EMA model.
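Roughly what an EMA step interval does (a hedged sketch, not OneTrainer's implementation; the function and variable names here are made up): the EMA weights are blended toward the live weights only every `interval` steps, so a larger interval means fewer blends per run and a coarser average.

```python
# Illustrative sketch of an EMA update with a step interval
# (hypothetical names, not OneTrainer's actual code).
def maybe_ema_update(ema, params, step, decay=0.999, interval=1):
    """Blend EMA weights toward live weights every `interval` steps."""
    if step % interval == 0:
        for k in ema:
            ema[k] = decay * ema[k] + (1.0 - decay) * params[k]

ema = {"w": 0.0}
params = {"w": 1.0}
for step in range(10):
    maybe_ema_update(ema, params, step, decay=0.9, interval=5)
# With interval=5, only steps 0 and 5 update the EMA:
# two blends instead of ten, so less work but a coarser average.
```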
All the tuning is related; it affects how I have to organize my strategy for tuning evaluation. I am tuning on some subtle features. Previously I was training with the linear scheduler. When doing comparison runs, I could set my EMA value, choose whatever total number of epochs I liked, and things would progress consistently: epoch 10 of run A would be reasonably consistent with epoch 10 of run B.
I changed to adafactor and could do the same thing... IF I don't define EMA.
But if I turn on EMA (which I need in some cases) and I only actually want a short number of epochs, then for consistency I have to always set the total number of epochs to 100, even if I only really want a 20 or 30 epoch run for some sets. Very annoying.
ppbrown, does EMA only work well with a large epoch count?
Depends on the dataset (and dataset size) and the current learning rate.
Sometimes it works best for what I'm doing with epoch=20; other times, more like 100. The higher the LR I'm using, the stronger the EMA effect I may need.
Going from memory now, but if I recall correctly:
For high learning rates, I typically want a shorter epoch count. But I then usually also need a stronger EMA effect... which needs a LONGER epoch count.
Funny thing; I just noticed that the importance of adjusting EMA strength based on the learning rate is mentioned in this paper:
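One way to reason about that coupling (my own rule of thumb, not anything from OneTrainer's docs): an EMA with decay d averages over roughly 1/(1-d) updates, so a stronger EMA (decay closer to 1) only becomes meaningful once the run contains at least that many optimizer updates, which pushes you toward a longer epoch count.

```python
# Rule-of-thumb effective averaging window of an EMA:
# about 1/(1 - decay) updates. A stronger EMA (decay closer to 1)
# needs a longer run just to fill its window.
def ema_window(decay: float) -> float:
    return 1.0 / (1.0 - decay)

ema_window(0.99)   # roughly 100 updates
ema_window(0.999)  # roughly 1000 updates
```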
What happened?
I had previously wondered why changing the "EMA update step interval" seemed to make no difference.
Today I found out: changing the epoch count changes EMA behaviour.
What did you expect would happen?
Sample images should not change when I change the epoch count, if I have scheduler=linear, etc.
Relevant log output
No response
Output of `pip freeze`
No response