HomebrewNLP / Olmax

HomebrewNLP in JAX flavour for maintainable TPU training
BSD 2-Clause "Simplified" License

Complex Momentum #55

Status: Closed (closed by ClashLuke 1 year ago)

ClashLuke commented 2 years ago

Others have reported strong results with complex momentum, showing improved performance across several optimizers and more stable momentum dynamics. Separately, momentum has been shown to be quite unstable when training LLMs. Complex momentum could therefore be an attractive way to improve stability without sacrificing performance. This issue covers implementing complex momentum and benchmarking it against the baseline.
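As a starting point for the implementation, here is a minimal sketch of a complex heavy-ball step in plain Python. It assumes the update form mu <- beta * mu - grad(theta), theta <- theta + lr * Re(mu) with a complex beta, which is our reading of the complex momentum paper rather than a confirmed spec. The function name `complex_heavy_ball`, the learning rate, and beta = 0.8 + 0.3j are illustrative assumptions, and this is not Olmax's JAX code.

```python
def complex_heavy_ball(grad_fn, theta, lr=0.1, beta=0.8 + 0.3j, steps=100):
    """Minimise a scalar objective with complex heavy-ball momentum.

    Hypothetical sketch; assumed update rule:
        mu    <- beta * mu - grad(theta)
        theta <- theta + lr * Re(mu)
    """
    mu = 0j  # complex momentum buffer, starts at zero
    for _ in range(steps):
        mu = beta * mu - grad_fn(theta)
        # Only the real part of the momentum moves the (real) parameter.
        theta = theta + lr * mu.real
    return theta


# Toy objective f(theta) = theta**2 / 2, whose gradient is theta itself.
theta_final = complex_heavy_ball(lambda t: t, 5.0, steps=200)
print(theta_final)
```

On this toy quadratic the iterate decays toward 0. Passing a real-valued `beta` keeps `mu` real and recovers ordinary heavy-ball momentum, which makes a like-for-like comparison against the baseline straightforward.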

ClashLuke commented 2 years ago

Complex heavy-ball momentum (as proposed in the complex momentum paper) has some interesting properties. The following plot shows the real part of the momentum buffer at momentum=0.8 and complex_momentum=0.3j when it is fed 100 consecutive 1s:

[Figure: real part of the momentum buffer over 100 steps of constant input 1, momentum=0.8, complex_momentum=0.3j]

The buffer oscillates strongly on its own, which, according to the paper, balances out the oscillations caused by the model. Fortunately, supervised models trained end-to-end (like ours) rarely show cyclic behavior, so we likely don't need it. It might still be worth a try, but I wouldn't prioritize it.
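The curve described above can be reproduced with a few lines of Python. This sketch assumes that momentum=0.8 and complex_momentum=0.3j combine into a single complex coefficient beta = 0.8 + 0.3j and that the buffer accumulates as buf <- beta * buf + input. Both are assumptions about the setup behind the plot, and `momentum_curve` is a hypothetical helper, not Olmax code.

```python
def momentum_curve(beta, steps=100, inp=1.0):
    """Real part of a momentum buffer fed a constant input for `steps` steps."""
    buf = 0j  # momentum buffer, starts at zero
    curve = []
    for _ in range(steps):
        buf = beta * buf + inp  # heavy-ball accumulation (assumed form)
        curve.append(buf.real)
    return curve


real_only = momentum_curve(0.8)           # plain momentum=0.8
complex_mom = momentum_curve(0.8 + 0.3j)  # momentum=0.8, complex_momentum=0.3j
print(real_only[:6])
print(complex_mom[:6])
```

After t steps the buffer equals (1 - beta**t) / (1 - beta). With real beta = 0.8 the curve rises monotonically toward 1 / (1 - 0.8) = 5. With beta = 0.8 + 0.3j it overshoots (peaking at about 2.72 on step 5), dips below its limit, and settles in a damped oscillation toward Re(1 / (1 - beta)) = 0.2 / 0.13, roughly 1.54. This is the self-induced oscillation the comment refers to.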

ClashLuke commented 1 year ago

Closing, as the expected gain is too low.