BorealisAI / flora-opt

This is the official repository for the paper "Flora: Low-Rank Adapters Are Secretly Gradient Compressors" in ICML 2024.
https://arxiv.org/abs/2402.03293
GNU Lesser General Public License v3.0

Implementing Flora with Adam #1

Closed · aashiqmuhamed closed this issue 1 week ago

aashiqmuhamed commented 2 weeks ago

Hello, I was wondering whether Flora can also be implemented with Adam. For the plot below:

[image]

Was GaLore implemented with Adafactor here (and without the first-moment adjustment), or was it using Adam?

yongchanghao commented 2 weeks ago

Hi there. Technically, Flora can be implemented on top of vanilla Adam by projecting both the momentum (first moment) and the second moment. However, in the paper (and in this figure), we built Flora on Adafactor, because Adafactor is already a well-known optimizer that compresses the second moment. The GaLore experiment in this figure uses the original version from their paper, i.e., the one based on vanilla Adam.
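
For readers wondering what "projecting both the momentum and second moment" could look like, here is a minimal NumPy sketch. It is not code from this repository or from the paper. It assumes one possible reading: the gradient is compressed with a fixed random down-projection, the Adam moments are kept only in the compressed space, and the resulting update is projected back up. The class name `ProjectedAdam`, its arguments, and the choice to keep the projection fixed (no resampling) are all hypothetical simplifications.

```python
import numpy as np


class ProjectedAdam:
    """Illustrative Adam variant whose moment buffers live in a rank-r space.

    Hypothetical sketch only: not the Flora or GaLore implementation.
    """

    def __init__(self, shape, rank, lr=1e-2, betas=(0.9, 0.999), eps=1e-8, seed=0):
        n, m = shape
        rng = np.random.default_rng(seed)
        # Random down-projection (r x m), scaled so that E[P.T @ P] = I.
        self.P = rng.standard_normal((rank, m)) / np.sqrt(rank)
        # Both moments are stored as (n x r) instead of (n x m).
        self.m = np.zeros((n, rank))
        self.v = np.zeros((n, rank))
        self.lr, self.betas, self.eps = lr, betas, eps
        self.t = 0

    def step(self, param, grad):
        b1, b2 = self.betas
        self.t += 1
        g_c = grad @ self.P.T                         # compress gradient: (n x r)
        self.m = b1 * self.m + (1 - b1) * g_c         # compressed first moment
        self.v = b2 * self.v + (1 - b2) * g_c ** 2    # compressed second moment
        m_hat = self.m / (1 - b1 ** self.t)           # standard Adam bias correction
        v_hat = self.v / (1 - b2 ** self.t)
        update_c = m_hat / (np.sqrt(v_hat) + self.eps)
        return param - self.lr * (update_c @ self.P)  # decompress and apply


# Toy usage: pull W toward a target T with loss 0.5 * ||W - T||^2.
target = np.random.default_rng(1).standard_normal((4, 16))
W = np.zeros((4, 16))
opt = ProjectedAdam(W.shape, rank=4)
for _ in range(20):
    W = opt.step(W, W - target)  # the gradient of the toy loss is W - T
```

With a fixed projection, the updates stay inside the row space of `P`, so this toy loss decreases but does not reach zero. The optimizer state here is two `(n, r)` buffers rather than two `(n, m)` buffers, which is where the memory saving would come from under these assumptions.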

github-actions[bot] commented 1 week ago

Stale due to inactivity. Closing in 3 days if no further activities.