Add implementation of Lion optimiser

FluxML / Optimisers.jl

Optimisers.jl defines many standard optimisers and utilities for learning loops.

https://fluxml.ai/Optimisers.jl

MIT License

Add implementation of Lion optimiser #129

Closed mashu closed 1 year ago

mashu commented 1 year ago

Implementation of the Lion optimiser from the paper "Symbolic Discovery of Optimization Algorithms", which the authors report to be faster than AdamW.

mashu commented 1 year ago

@chengchingwen I amended the commit so it should now reflect your suggestions.

mcabbott commented 1 year ago

For easier reference, the paper's description (including weight decay) is:
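A sketch of that update in equation form, assuming the usual notation: g_t is the gradient, θ the parameters, η_t the learning rate, λ the weight-decay coefficient, and β1, β2 the two interpolation factors. The symbols c_t and m_t match the names used later in this thread; the rest are labelled here for illustration.

```latex
\begin{aligned}
c_t      &= \beta_1\, m_{t-1} + (1 - \beta_1)\, g_t \\
\theta_t &= \theta_{t-1} - \eta_t \bigl(\operatorname{sign}(c_t) + \lambda\, \theta_{t-1}\bigr) \\
m_t      &= \beta_2\, m_{t-1} + (1 - \beta_2)\, g_t
\end{aligned}
```

Setting λ = 0 drops the weight-decay term and leaves the plain sign update.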

Later, in the appendix, they have pseudo-code without weight decay:

Note that this needs m_{t-1} but never c_{t-1}, which is why they advertise it as needing to store fewer arrays than Adam, i.e. one extra array the size of the parameters rather than two:
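To make the storage point concrete, here is a minimal sketch of one Lion step in plain Python. It is not the code from this PR: the function name `lion_step`, the argument names, and the default values are assumptions chosen for illustration. The only state carried between steps is `m`; `c` is a temporary that is thrown away.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def lion_step(theta, m, grad, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """One Lion update over flat lists of floats (hypothetical helper).

    theta: current parameters
    m:     momentum from the previous step (the only stored state)
    grad:  gradient at theta
    Returns the new parameters and the new momentum.
    """
    new_theta, new_m = [], []
    for p, m_prev, g in zip(theta, m, grad):
        # c is only needed inside this step, so it is never stored
        c = beta1 * m_prev + (1 - beta1) * g
        # the step direction is sign(c), plus decoupled weight decay
        new_theta.append(p - lr * (sign(c) + weight_decay * p))
        # the momentum is updated from m_prev and g, not from c
        new_m.append(beta2 * m_prev + (1 - beta2) * g)
    return new_theta, new_m


# Usage: a single step from zero momentum
theta, m = lion_step([1.0, -2.0], [0.0, 0.0], [0.5, -0.25], lr=0.1)
```

Adam, by comparison, carries both a first-moment and a second-moment array between steps, which is where the "one copy rather than two" saving comes from.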

chengchingwen commented 1 year ago

Looks like it's ready to go?