Open sdbds opened 2 months ago
Cool! I'll look into this.
It's great that you're looking into this, @adefazio. Schedule-free Adam was strong, and now AdEMAMix is giving me great results too. If it turns out to be possible to combine their advantages, that would be amazing.
When will an AdEMAMixScheduleFree optimizer become available? Could it be achieved simply by wrapping AdEMAMix in ScheduleFreeWrapper?
Code: https://github.com/nanowell/AdEMAMix-Optimizer-Pytorch
8-bit version from bitsandbytes (bnb): https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/optim/ademamix.py
Tests have shown that AdEMAMix outperforms AdamW with little to no increase in memory usage.