Hi, thanks for your amazing work!
I have some questions about managing the optimizer history during MEFT training:
(1) Is MEFT compatible with SGD-M or Adam, i.e., optimizers that maintain historical optimization state?
(2) If so, I suggest storing the optimizer states in CPU memory to reduce the expensive GPU memory usage. The optimizer update would then have to happen on the CPU; would running the Adam/SGD-M update on the CPU add noticeable latency?
Thanks again for your impressive method! Looking forward to your reply!
Daniel
Thank you for appreciating our work. Yes, the optimizer states are stored in CPU memory. We use DeepSpeed's CPU Adam optimizer to keep the optimizer step efficient on the CPU.
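For reference, here is a minimal sketch of what CPU-resident Adam states look like with DeepSpeed's `DeepSpeedCPUAdam`. This is an illustration rather than the MEFT codebase; the model, shapes, and learning rate are placeholders:

```python
# A minimal sketch (not the MEFT repo's actual code) of keeping Adam's
# moment buffers in host RAM via DeepSpeed's CPU Adam optimizer.
import torch
from deepspeed.ops.adam import DeepSpeedCPUAdam

# Stand-in trainable module. DeepSpeedCPUAdam expects the parameters it
# updates (and their gradients) to reside on the CPU.
model = torch.nn.Linear(1024, 1024)

# The Adam moment estimates (exp_avg, exp_avg_sq) are allocated in CPU
# memory, so they consume no GPU memory at all.
optimizer = DeepSpeedCPUAdam(model.parameters(), lr=1e-4)

x = torch.randn(8, 1024)
loss = model(x).pow(2).mean()
loss.backward()

optimizer.step()       # the Adam update itself runs on the CPU
optimizer.zero_grad()
```

Regarding latency: DeepSpeed's CPU Adam step is implemented in vectorized C++ rather than eager PyTorch ops, which is what keeps the CPU-side update from dominating training time.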