Closed aashiqmuhamed closed 1 week ago
Hi there. Technically, Flora can be implemented on top of vanilla Adam by projecting both the momentum and the second moment. However, in the paper (and in this figure), we built Flora on Adafactor, because Adafactor is already a well-known optimizer that compresses the second moment. The GaLore experiment in this figure uses the original version from their paper (i.e., based on vanilla Adam).
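To make the "project both moments" idea concrete, here is a rough NumPy sketch of one way an Adam-style step could keep both the momentum and the second moment in a randomly projected space. This is not the code from the flora-opt repository and not the method used for the figure. The function names (`make_projection`, `init_state`, `projected_adam_step`) are hypothetical, and the choices to compute the Adam statistics on the projected gradient and to keep the projection fixed across steps are simplifying assumptions.

```python
import numpy as np


def make_projection(m, rank, seed):
    """Random Gaussian projection of shape (m, rank), reproducible from a seed.

    Hypothetical helper: storing only the seed avoids keeping the matrix around.
    """
    rng = np.random.default_rng(seed)
    return rng.standard_normal((m, rank)) / np.sqrt(rank)


def init_state(n, rank):
    """Optimizer state for an (n, m) weight: both moments live in (n, rank)."""
    return {"t": 0, "m": np.zeros((n, rank)), "v": np.zeros((n, rank))}


def projected_adam_step(w, grad, state, proj,
                        lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style step with both moments stored in the projected space.

    Assumption: the moments are accumulated on the projected gradient, and the
    normalized update is mapped back to the full space with proj.T.
    """
    state["t"] += 1
    t = state["t"]

    g_low = grad @ proj  # compress the gradient: (n, m) -> (n, rank)

    # Standard Adam moment updates, but on the compressed gradient.
    state["m"] = beta1 * state["m"] + (1 - beta1) * g_low
    state["v"] = beta2 * state["v"] + (1 - beta2) * g_low ** 2

    # Bias correction, as in vanilla Adam.
    m_hat = state["m"] / (1 - beta1 ** t)
    v_hat = state["v"] / (1 - beta2 ** t)

    update_low = m_hat / (np.sqrt(v_hat) + eps)

    # Decompress the update back to the weight's shape and apply it.
    return w - lr * (update_low @ proj.T)


# Usage: an 8x16 weight with rank-4 optimizer state.
w = np.zeros((8, 16))
proj = make_projection(16, 4, seed=0)
state = init_state(8, 4)
grad = np.random.default_rng(1).standard_normal((8, 16))
w = projected_adam_step(w, grad, state, proj)
```

With this layout, the optimizer state for an `(n, m)` weight takes `2 * n * rank` numbers instead of the `2 * n * m` that vanilla Adam needs.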
Stale due to inactivity. Closing in 3 days if there is no further activity.
Hello, I was wondering whether Flora can also be implemented with Adam.

![image](https://github.com/BorealisAI/flora-opt/assets/17514579/bbe417b2-c070-4145-b4d1-6435fdbdcd56)

For the plot above, was GaLore implemented with Adafactor (and without the first-moment adjustment), or was it using Adam?