Closed — CongWeilin closed this issue 8 months ago
Hello, thank you for your attention. The third equation is implemented in the SONG optimizer, available at this link. It offers a 'mode' option that lets you choose between 'sgd' and 'adam' for updating the model parameters.
Thank you for your timely reply. Could you please point me to the line that implements $m_{t+1} = \beta_1 m_t + (1-\beta_1) G(\mathbf{w}_t)$? Thanks.
Sure. The momentum update for sgd is implemented by: buf.mul_(momentum).add_(d_p, alpha=1 - dampening). The momentum update for adam is: exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1).
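For readers without the source open, here is a plain-Python sketch (not LibAUC's actual code) of the exponential-moving-average update that both of those lines perform; the coefficient 0.9 and the constant gradient are illustrative values, not defaults taken from the library:

```python
def momentum_update(m_t, grad, beta1=0.9):
    """One EMA step: m_{t+1} = beta1 * m_t + (1 - beta1) * grad.

    This mirrors buf.mul_(momentum).add_(d_p, alpha=1 - dampening)
    when momentum = dampening = beta1, and exp_avg.mul_(beta1).add_(grad,
    alpha=1 - beta1) in the adam branch.
    """
    return beta1 * m_t + (1 - beta1) * grad

# Starting from m_0 = 0 with a constant gradient g = 1.0, the estimate
# approaches g geometrically: after t steps, m_t = 1 - beta1**t.
m = 0.0
for _ in range(5):
    m = momentum_update(m, 1.0)
print(round(m, 5))  # → 0.40951, i.e. 1 - 0.9**5
```

Note that the sgd branch matches the equation exactly only when dampening equals the momentum coefficient; with the default dampening of 0 the update becomes a plain heavy-ball accumulation rather than an average.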
Hi, thank you for this awesome package.
I have a question regarding the implementation of SONG. After carefully reading the code at https://docs.libauc.org/_modules/libauc/optimizers/song.html#SONG.step, I couldn't find where the 3rd equation of https://docs.libauc.org/api/libauc.optimizers.html#module-libauc.optimizers.song (i.e., gradient variance reduction) is implemented. The PyTorch implementation of the SONG optimizer looks like a combination of SGD and Adam, and I couldn't find that step 3.
Please let me know if I have missed anything or overlooked some details. Many thanks.