Thank you very much for this beautiful work. Since Adam is known to generalize poorly when combined with L2 regularization, it would be great if you could also provide an implementation of HD for AdamW. There is an AdamW implementation here: https://github.com/mpyrozhok/adamwr
I have also recently been experimenting with Padam and QHAdam, but could not obtain any improvement with them on the RL problem I am working on. Do you have any thoughts about them?