Hi, thanks for your amazing work!
I have some questions about managing the optimizer history during MEFT training:
(1) Is MEFT compatible with SGD-M or Adam, i.e., optimizers that maintain historical optimization state?
(2) If so, I suggest storing the optimizer states in CPU memory to reduce the expensive GPU memory usage. The optimizer update would then have to happen on the CPU; would running the Adam/SGD-M update on the CPU add noticeable latency?
Thanks again for your impressive method! Looking forward to your reply!
Daniel
Thank you for appreciating our work. Yes, the optimizer states are stored in CPU memory. We use DeepSpeed's CPU Adam optimizer to keep the optimizer step efficient on the CPU.
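For reference, here is a minimal sketch of what CPU-resident Adam states look like with DeepSpeed's `DeepSpeedCPUAdam`. This is an illustration rather than the MEFT codebase; the model, shapes, and learning rate are placeholders:

```python
# A minimal sketch (not the MEFT repo's actual code) of keeping Adam's
# moment buffers in host RAM via DeepSpeed's CPU Adam optimizer.
import torch
from deepspeed.ops.adam import DeepSpeedCPUAdam

# Stand-in trainable module. DeepSpeedCPUAdam expects the parameters it
# updates (and their gradients) to reside on the CPU.
model = torch.nn.Linear(1024, 1024)

# The Adam moment estimates (exp_avg, exp_avg_sq) are allocated in CPU
# memory, so they consume no GPU memory at all.
optimizer = DeepSpeedCPUAdam(model.parameters(), lr=1e-4)

x = torch.randn(8, 1024)
loss = model(x).pow(2).mean()
loss.backward()

optimizer.step()       # the Adam update itself runs on the CPU
optimizer.zero_grad()
```

Regarding latency: DeepSpeed's CPU Adam step is implemented in vectorized C++ rather than eager PyTorch ops, which is what keeps the CPU-side update from dominating training time.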