FluxML / Optimisers.jl

Optimisers.jl defines many standard optimisers and utilities for learning loops.
https://fluxml.ai/Optimisers.jl
MIT License

Add implementation of Lion optimiser #129

Closed: mashu closed this pull request 1 year ago

mashu commented 1 year ago

Implementation of the Lion optimiser from the paper "Symbolic Discovery of Optimization Algorithms", which the paper reports to be faster than AdamW.

mashu commented 1 year ago

@chengchingwen I amended the commit so it should now reflect your suggestions.

mcabbott commented 1 year ago

For easier reference, the paper's description of the update (including weight decay) is:

[Screenshot: the paper's Lion update rule, including weight decay]
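For readers who cannot see the image, the rule can be written out as follows. This is a transcription of the paper's algorithm with decoupled weight decay and is worth checking against the original; m is the momentum, c the interpolated update direction, η_t the learning rate, λ the weight-decay coefficient, and m_0 = 0. Setting λ = 0 gives the variant without weight decay.

```latex
\begin{aligned}
g_t      &= \nabla_\theta f(\theta_{t-1}) \\
c_t      &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
\theta_t &= \theta_{t-1} - \eta_t \left( \operatorname{sign}(c_t) + \lambda\, \theta_{t-1} \right) \\
m_t      &= \beta_2 m_{t-1} + (1 - \beta_2)\, g_t
\end{aligned}
```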

Later, in the appendix, they give pseudo-code without weight decay:

[Screenshot: the paper's appendix pseudo-code for Lion, without weight decay]
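The appendix version can be sketched in Python as below. This is an illustrative translation, not the Julia code from this PR or the Optimisers.jl API: parameters, momentum and gradients are assumed to be plain lists of floats, `lion_step` is a hypothetical name, β1 = 0.9 and β2 = 0.99 are the defaults the paper recommends, and the learning-rate default is an arbitrary placeholder.

```python
def sign(x: float) -> float:
    """Return -1.0, 0.0 or 1.0 depending on the sign of x."""
    return float((x > 0) - (x < 0))


def lion_step(theta, m, g, lr=1e-4, beta1=0.9, beta2=0.99):
    """One Lion step without weight decay, on lists of floats.

    Hypothetical helper for illustration. Returns (new_theta, new_m);
    only m is carried over to the next step.
    """
    new_theta, new_m = [], []
    for th, mi, gi in zip(theta, m, g):
        # Interpolate momentum and gradient with beta1 to get the update direction.
        c = beta1 * mi + (1 - beta1) * gi
        # Each coordinate moves by exactly lr (or not at all when c == 0).
        new_theta.append(th - lr * sign(c))
        # Update the momentum with beta2, independently of beta1.
        new_m.append(beta2 * mi + (1 - beta2) * gi)
    return new_theta, new_m
```

For example, `lion_step([1.0, -2.0, 0.5], [0.0, 0.0, 0.0], [0.3, -0.1, 0.0], lr=0.1)` moves the first two parameters by exactly 0.1 each (to about 0.9 and -1.9) and leaves the third unchanged, because the update depends only on the sign of c, not on the gradient's magnitude.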

Note that this needs m_{t-1} but never c_{t-1}, which is why they advertise Lion as storing fewer arrays than Adam, i.e. one array the size of the parameters rather than two:

[Screenshot: excerpt from the paper on Lion's memory savings relative to Adam]

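The memory point can be made concrete with an in-place variant, again a Python sketch rather than the Julia code in this PR, with the hypothetical name `lion_step_inplace` and lists of floats assumed. Because c_t is only needed to compute θ_t, it can be a per-element temporary, and m can be overwritten once c has been computed, so the only array carried between steps is m, the same size as the parameters. Adam, by contrast, keeps two such arrays (first- and second-moment estimates). This version also includes the decoupled weight-decay term λθ_{t-1}; the learning-rate default is an arbitrary placeholder.

```python
def sign(x: float) -> float:
    """Return -1.0, 0.0 or 1.0 depending on the sign of x."""
    return float((x > 0) - (x < 0))


def lion_step_inplace(theta, m, g, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """Hypothetical helper: update theta and m in place; m is the only persistent state."""
    for i in range(len(theta)):
        # c_t is a per-element scalar temporary, so no c array is ever stored.
        c = beta1 * m[i] + (1 - beta1) * g[i]
        # The right-hand side reads the old theta[i], i.e. theta_{t-1}.
        theta[i] -= lr * (sign(c) + weight_decay * theta[i])
        # Overwrite m_{t-1} with m_t only after c has been computed.
        m[i] = beta2 * m[i] + (1 - beta2) * g[i]
```

Keeping a single `m = [0.0] * len(theta)` alive across calls is all the optimiser state this rule requires.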
chengchingwen commented 1 year ago

Looks like it's ready to go?