gbaydin / hypergradient-descent

Hypergradient descent

AdamW #5

Closed akaniklaus closed 5 years ago

akaniklaus commented 5 years ago

Thank you very much for this beautiful work. Since Adam has a generalization issue when used with L2 regularization, it would be great if you could also provide an implementation of HD for AdamW. There is an implementation here: https://github.com/mpyrozhok/adamwr

I have also recently been experimenting with Padam and QHAdam, but couldn't obtain any improvement from them on the RL problem I am working on. Do you have any thoughts about them?

The key change there is the decoupled weight-decay step:

                # Decoupled (AdamW-style) weight decay: the decay term is computed
                # directly from the weights and subtracted after the Adam step,
                # instead of being folded into the gradient as with L2 regularization.
                if group['weight_decay'] != 0:
                    decayed_weights = torch.mul(p.data, group['weight_decay'])
                    p.data.addcdiv_(-step_size, exp_avg, denom)
                    p.data.sub_(decayed_weights)
                else:
                    p.data.addcdiv_(-step_size, exp_avg, denom)
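
For completeness, the hypergradient part of Adam-HD adapts the learning rate online from the dot product of the current gradient and the previous Adam update direction, so a combined optimizer would run something like the following before the moment updates (a sketch against the usual Adam state; the variable names are illustrative and bias correction is omitted, so this is not the repo's exact code):

                # Hypergradient update of the learning rate, run before
                # exp_avg / exp_avg_sq are refreshed with the current gradient,
                # so the dot product uses the previous step's update direction.
                if state['step'] > 1:
                    prev_update = exp_avg / exp_avg_sq.sqrt().add(group['eps'])
                    h = torch.dot(grad.view(-1), prev_update.view(-1))
                    group['lr'] += group['hypergrad_lr'] * h.item()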
akaniklaus commented 5 years ago

I have done the combination and can share it upon request.
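
Roughly, a combined step could look like the following minimal sketch on a single parameter tensor (a plain function rather than an optimizer class; hypergrad_lr and the other names are illustrative assumptions, bias correction is omitted, and this is not necessarily the combination referred to above):

    import torch

    def adamw_hd_step(p, grad, state, lr, hypergrad_lr=1e-8,
                      betas=(0.9, 0.999), eps=1e-8, weight_decay=1e-2):
        # One illustrative AdamW + hypergradient-descent step on a plain
        # tensor p with gradient grad (e.g. p.data and p.grad.data). The
        # state dict carries the Adam moments between calls; bias correction
        # is omitted to keep the sketch short.
        beta1, beta2 = betas
        exp_avg = state.setdefault('exp_avg', torch.zeros_like(p))
        exp_avg_sq = state.setdefault('exp_avg_sq', torch.zeros_like(p))
        state['step'] = state.get('step', 0) + 1

        # Hypergradient update of the learning rate, using the update
        # direction from the previous step (moments not yet refreshed).
        if state['step'] > 1:
            prev_update = exp_avg / exp_avg_sq.sqrt().add(eps)
            h = torch.dot(grad.view(-1), prev_update.view(-1)).item()
            lr = lr + hypergrad_lr * h

        # Standard Adam moment updates.
        exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
        exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
        denom = exp_avg_sq.sqrt().add(eps)

        # Decoupled (AdamW-style) weight decay, computed from the weights
        # before the Adam step, as in the snippet above.
        decayed = p * weight_decay if weight_decay != 0 else None

        p.addcdiv_(exp_avg, denom, value=-lr)
        if decayed is not None:
            p.sub_(decayed)

        return lr  # adapted learning rate, fed back in on the next call

Returning the adapted learning rate and passing it back in on the next call is what makes the rate adjust online, which is the point of the hypergradient scheme.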