Closed — CongWeilin closed this issue 8 months ago
Hello, thank you for your attention. The third equation is implemented in the SONG optimizer, available at this link. It offers a 'mode' option that lets you choose between 'sgd' and 'adam' for updating the model parameters.
Thank you for your timely reply. Could you please point me to the line that implements $m_{t+1} = \beta_1 m_t + (1-\beta_1) G(\mathbf{w}_t)$? Thanks.
Sure. The momentum update for sgd is implemented by: buf.mul_(momentum).add_(d_p, alpha=1 - dampening). The momentum update for adam is: exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1).
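For readers without the source open, here is a plain-Python sketch (not LibAUC's actual code) of the exponential-moving-average update that both of those lines perform; the coefficient 0.9 and the constant gradient are illustrative values, not defaults taken from the library:

```python
def momentum_update(m_t, grad, beta1=0.9):
    """One EMA step: m_{t+1} = beta1 * m_t + (1 - beta1) * grad.

    This mirrors buf.mul_(momentum).add_(d_p, alpha=1 - dampening)
    when momentum = dampening = beta1, and exp_avg.mul_(beta1).add_(grad,
    alpha=1 - beta1) in the adam branch.
    """
    return beta1 * m_t + (1 - beta1) * grad

# Starting from m_0 = 0 with a constant gradient g = 1.0, the estimate
# approaches g geometrically: after t steps, m_t = 1 - beta1**t.
m = 0.0
for _ in range(5):
    m = momentum_update(m, 1.0)
print(round(m, 5))  # → 0.40951, i.e. 1 - 0.9**5
```

Note that the sgd branch matches the equation exactly only when dampening equals the momentum coefficient; with the default dampening of 0 the update becomes a plain heavy-ball accumulation rather than an average.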
Hi, thank you for this awesome package.
I have a question regarding the implementation of SONG. After carefully reading the code at https://docs.libauc.org/_modules/libauc/optimizers/song.html#SONG.step, I couldn't find where the 3rd equation of https://docs.libauc.org/api/libauc.optimizers.html#module-libauc.optimizers.song (i.e., gradient variance reduction) is implemented. The PyTorch implementation of the SONG optimizer looks like a combination of SGD and Adam, and I couldn't find that step 3.
Please let me know if I have missed anything or overlooked some details. Many thanks.