The most significant changes are that Shampoo now uses `1 - beta` like the other optimisers, and that I shrank the core algorithm to 15 LOC. Additionally, I removed `ctx.parameter_dims`, which stored the dimension names for each allocated buffer, as it is no longer needed.