Closed ClashLuke closed 1 year ago
Complex heavy-ball momentum (as proposed in the complex momentum paper) has interesting properties. Below you can see the real part of the momentum buffer at momentum=0.8 and complex_momentum=0.3j when it's fed a constant input of 100 ones. It oscillates strongly on its own, which the paper claims balances out the oscillations caused by the model. Fortunately, supervised models trained end-to-end (like ours) rarely show cyclic behavior, so we likely don't need it. It might still be worth a try, but I wouldn't prioritize it.
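For anyone who wants to reproduce that curve, here is a minimal sketch. It assumes the two settings combine into a single complex coefficient beta = 0.8 + 0.3j and that the buffer follows the plain heavy-ball recurrence buf = beta * buf + grad. Both are my assumptions rather than something taken from the paper's code, and `complex_momentum_curve` is a hypothetical helper name.

```python
def complex_momentum_curve(beta: complex, steps: int = 100, grad: float = 1.0) -> list:
    """Real part of a heavy-ball buffer with complex coefficient `beta`,
    fed the same constant gradient `grad` at every step."""
    buf = 0j  # complex momentum buffer, starts at zero
    curve = []
    for _ in range(steps):
        buf = beta * buf + grad   # assumed recurrence: buf <- beta * buf + grad
        curve.append(buf.real)    # only the real part would reach the parameters
    return curve


# Assumption: momentum=0.8 and complex_momentum=0.3j form one coefficient.
beta = 0.8 + 0.3j
curve = complex_momentum_curve(beta)

# With |beta| < 1 the buffer settles at 1 / (1 - beta); its real part is the limit.
limit = (1 / (1 - beta)).real

print("first steps:", [round(v, 3) for v in curve[:5]])
print("peak:", round(max(curve), 3), "limit:", round(limit, 3))
```

Because |beta| is below 1, the oscillation is damped: the real part overshoots its limit early on and then settles, without any cyclic input from the model.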
Closing as the expected gain is too low.
Others have had great success with complex momentum, reporting improved performance across optimizers and better momentum stability. Additionally, others have shown momentum to be quite unstable in LLMs. Complex momentum could therefore be an attractive way to improve stability without decreasing performance.
This issue is about implementing complex momentum and testing it against the baseline.