CURRENTF / MEFT


How to manage the optimizer state? #1

Closed: Danield21 closed this issue 4 months ago

Danield21 commented 4 months ago

Hi, thanks for your amazing work! I have some questions about managing the optimizer state during MEFT training: (1) Is MEFT compatible with SGD-M or Adam, i.e., optimizers that keep historical optimization state? (2) If so, I suggest storing those optimizer states in CPU memory to reduce the expensive GPU memory usage. Since the Adam/SGD-M update would then have to run on the CPU, will that introduce extra latency?

Thanks again for your impressive method! Looking forward to your reply!

Daniel

CURRENTF commented 4 months ago

Thank you for your interest in our work. Yes, the optimizer states are stored in CPU memory. We use DeepSpeed's CPU Adam optimizer, which keeps the update step efficient on the CPU.
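For reference, here is a minimal sketch of the general CPU-offload pattern with `DeepSpeedCPUAdam`. This is not the exact MEFT training code: the toy model, the fp32 master copies, and the explicit gradient/weight transfers are placeholders to illustrate where the optimizer states live and where the update runs.

```python
# Sketch only: CPU-resident Adam states via DeepSpeed's CPU Adam.
# The model, shapes, and copy scheme below are illustrative, not MEFT's.
import torch
from deepspeed.ops.adam import DeepSpeedCPUAdam

model = torch.nn.Linear(1024, 1024).cuda()

# Keep fp32 master copies of the trainable parameters on the CPU.
# DeepSpeedCPUAdam stores its momentum/variance states on the CPU as well
# and performs the (vectorized) Adam update there, so none of the
# optimizer state occupies GPU memory.
cpu_params = [
    p.detach().to("cpu", torch.float32).requires_grad_(True)
    for p in model.parameters()
]
optimizer = DeepSpeedCPUAdam(cpu_params, lr=1e-4)

def training_step(batch, targets):
    loss = torch.nn.functional.mse_loss(model(batch), targets)
    loss.backward()
    # 1) Move gradients from GPU to the CPU master copies.
    for gpu_p, cpu_p in zip(model.parameters(), cpu_params):
        cpu_p.grad = gpu_p.grad.to("cpu", torch.float32)
        gpu_p.grad = None
    # 2) Adam update runs entirely on the CPU.
    optimizer.step()
    optimizer.zero_grad()
    # 3) Copy the updated weights back to the GPU model.
    for gpu_p, cpu_p in zip(model.parameters(), cpu_params):
        gpu_p.data.copy_(cpu_p.data)
    return loss
```

On the latency question: the CPU update adds some host-side work and PCIe transfers, but DeepSpeed's CPU Adam kernel is SIMD-vectorized, which is what keeps the overhead manageable in practice.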