Closed drscotthawley closed 6 months ago
Yes, I'm aware of this. The MNIST example is just there to show how to use the optimizer. MNIST is a terrible test for optimizers: it's too easy a problem, and basically everything works on it. We are looking at adding some more self-contained examples; we just haven't had the bandwidth yet.
Thanks for your reply! I understand. Looking forward to learning more.
Dear authors, thank you very much for releasing your code. I'm looking forward to using it to achieve better training results on a variety of problems.
Except that... for the MNIST example, I find that if I replace ScheduleFree with either ordinary AdamW (with no scheduling) or AdamW + Cosine Annealing, then... it scores about the same as ScheduleFree, for a variety of learning rates. ...And I don't just mean the final scores/states, I mean throughout the whole training sequence.
It's possible I did something wrong. However, all I did was run your example, as is, and then also make a version where I removed `schedulefree` and just went with an ordinary AdamW, with and then without a cosine schedule... and the losses and accuracy scores are generally comparable, or sometimes even better, compared to the results with `schedulefree` (e.g. 99.3% accuracy for AdamW with no schedule vs 99.2% for ScheduleFree). I'll include a diff of my "comparable AdamW" code below just to clarify what I did... Any comments on that? (This is not a "challenge", this is "I really do want to understand and to take advantage of this".)
I tried learning rates of 0.05, 0.001, and your default rate, and... I wasn't trying to fine-tune the learning rate or number of steps or anything. Perhaps MNIST is just too "easy" a problem to really showcase the improvements offered by ScheduleFree? Perhaps the later examples you plan to include will be more demonstrative.
Also, a feature request: in your subsequent examples, might you make it easy to switch out ScheduleFree for other methods, to better demonstrate its effectiveness? Maybe via a CLI argument that defaults to your method, but that can otherwise choose AdamW, for example?
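To make the request concrete, here is a rough sketch of the kind of switch I have in mind. It is not taken from the repo's main.py: the flag names, the `adamw-cosine` choice, the default values, and the `build_optimizer` helper are all hypothetical, and I'm assuming the ScheduleFree optimizer is constructed as `schedulefree.AdamWScheduleFree(params, lr=...)`.

```python
import argparse

# Hypothetical option names for this sketch; not flags from the actual example.
OPTIMIZER_CHOICES = ("schedulefree", "adamw", "adamw-cosine")


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="MNIST optimizer comparison (sketch)")
    # Defaults to ScheduleFree, but lets you pick a baseline instead.
    parser.add_argument("--optimizer", choices=OPTIMIZER_CHOICES, default="schedulefree")
    # Placeholder default, not necessarily the example's actual default rate.
    parser.add_argument("--lr", type=float, default=1e-3)
    parser.add_argument("--epochs", type=int, default=10)
    return parser.parse_args(argv)


def build_optimizer(name, params, lr, total_steps):
    """Return (optimizer, scheduler), where scheduler is None if unused."""
    # Imported lazily so that argument parsing works without torch installed.
    import torch

    if name == "schedulefree":
        import schedulefree  # assumed to be installed

        return schedulefree.AdamWScheduleFree(params, lr=lr), None

    optimizer = torch.optim.AdamW(params, lr=lr)
    if name == "adamw-cosine":
        # Anneal over the whole run; call scheduler.step() once per batch.
        scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps)
        return optimizer, scheduler
    return optimizer, None


if __name__ == "__main__":
    args = parse_args()
    print(f"optimizer={args.optimizer} lr={args.lr} epochs={args.epochs}")
```

One wrinkle for the training loop: the ScheduleFree optimizer expects `optimizer.train()` and `optimizer.eval()` calls around training and evaluation, which plain AdamW does not have, so those calls would need a guard such as `hasattr(optimizer, "train")`.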
Here's a diff of my non-`schedulefree` example code. It's just your main.py with a few lines changed: