cliang1453 / SAGE

No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models (ICLR 2022)
MIT License

Reproduction of machine translation results #1

Open · zwhe99 opened this issue 2 years ago

zwhe99 commented 2 years ago

Hi~ Is it possible to release SAGE's code for machine translation tasks?

cliang1453 commented 2 years ago

Hi @zwhe99, I have no plans to release it at this point - unfortunately, I no longer have access to the server that stored the code.

However, it should be straightforward to implement yourself. Our implementation was based on fairseq. You could add an AdamW-SAGE optimizer class under https://github.com/facebookresearch/fairseq/tree/main/fairseq/optim, following UnstructAwareAdamW in this repo.
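For anyone attempting that port, here is a minimal sketch of the idea. This is not the authors' implementation: the class names, the `--sage-beta3` flag, and the per-parameter scaling schedule are assumptions. The sensitivity statistics follow the paper's first-order `|θ · g|` approximation with moving averages, while the exact step logic and scaling should be ported from `UnstructAwareAdamW` in this repo. The wrapper assumes the older `add_args`-style fairseq optimizer API (newer fairseq releases register optimizers via dataclass configs instead).

```python
# Sketch only: SAGE-style sensitivity-scaled AdamW plus a fairseq wrapper.
# Names, the scaling schedule, and the --sage-beta3 flag are assumptions.
import torch
from fairseq.optim import FairseqOptimizer, register_optimizer


class AdamWSAGE(torch.optim.Optimizer):
    """AdamW variant whose per-parameter step size is rescaled by
    SAGE-style sensitivity scores (a sketch, not the paper's exact code)."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
                 weight_decay=0.01, beta3=0.85):
        defaults = dict(lr=lr, betas=betas, eps=eps,
                        weight_decay=weight_decay, beta3=beta3)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            beta1, beta2 = group["betas"]
            beta3 = group["beta3"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                grad = p.grad
                state = self.state[p]
                if len(state) == 0:
                    state["step"] = 0
                    state["exp_avg"] = torch.zeros_like(p)     # Adam 1st moment
                    state["exp_avg_sq"] = torch.zeros_like(p)  # Adam 2nd moment
                    state["exp_sens"] = torch.zeros_like(p)    # EMA of sensitivity
                    state["exp_unc"] = torch.zeros_like(p)     # EMA of its variation
                state["step"] += 1
                t = state["step"]

                # First-order sensitivity approximation from the paper: |theta * grad|.
                sens = (p * grad).abs()
                state["exp_unc"].mul_(beta3).add_(
                    (sens - state["exp_sens"]).abs(), alpha=1 - beta3)
                state["exp_sens"].mul_(beta3).add_(sens, alpha=1 - beta3)
                imp = state["exp_sens"] * state["exp_unc"]
                # Stand-in schedule: low-importance ("left behind") parameters get a
                # larger effective step, normalized per tensor. Port the exact
                # schedule from UnstructAwareAdamW.
                lr_scale = 1.0 - imp / (imp.max() + 1e-12)

                # Standard AdamW update, rescaled elementwise by lr_scale.
                state["exp_avg"].mul_(beta1).add_(grad, alpha=1 - beta1)
                state["exp_avg_sq"].mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
                step_size = group["lr"] / (1 - beta1 ** t)
                denom = (state["exp_avg_sq"] / (1 - beta2 ** t)).sqrt().add_(group["eps"])
                p.mul_(1 - group["lr"] * group["weight_decay"])  # decoupled decay
                p.add_(-step_size * lr_scale * state["exp_avg"] / denom)
        return loss


@register_optimizer("adamw_sage")
class FairseqAdamWSAGE(FairseqOptimizer):
    """fairseq wrapper, assuming the older add_args-style optimizer API."""

    def __init__(self, args, params):
        super().__init__(args)
        self._optimizer = AdamWSAGE(params, **self.optimizer_config)

    @staticmethod
    def add_args(parser):
        parser.add_argument("--adam-betas", default="(0.9, 0.999)", metavar="B",
                            help="betas for the inner AdamW")
        parser.add_argument("--adam-eps", type=float, default=1e-8, metavar="D")
        parser.add_argument("--weight-decay", "--wd", type=float, default=0.01,
                            metavar="WD")
        parser.add_argument("--sage-beta3", type=float, default=0.85,
                            help="EMA factor for the sensitivity statistics")

    @property
    def optimizer_config(self):
        return {
            "lr": self.args.lr[0],
            "betas": eval(self.args.adam_betas),
            "eps": self.args.adam_eps,
            "weight_decay": self.args.weight_decay,
            "beta3": self.args.sage_beta3,
        }
```

With the file dropped into `fairseq/optim/`, training would then select it via `--optimizer adamw_sage` on the fairseq command line.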

zwhe99 commented 2 years ago

Hi @cliang1453, I found that you define different param groups with different 'params_type' and 'weight_decay' here. Did you do the same in the fairseq version?
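For context, param groups in that style can be set up as in the sketch below. The `params_type` tag and the bias/LayerNorm no-decay split here are illustrative assumptions modeled on common practice, not copied from the repo; an AdamW-SAGE optimizer would consume the extra key per group.

```python
# Hypothetical illustration of param grouping with per-group 'params_type'
# and 'weight_decay'; the split rule is an assumption, not the repo's code.
import torch
import torch.nn as nn

model = nn.TransformerEncoderLayer(d_model=512, nhead=8)  # any nn.Module

no_decay = ["bias", "norm"]  # substrings marking no-weight-decay parameters
param_groups = [
    {
        "params": [p for n, p in model.named_parameters()
                   if not any(nd in n for nd in no_decay)],
        "params_type": "decay",     # hypothetical tag read by the optimizer
        "weight_decay": 0.01,
    },
    {
        "params": [p for n, p in model.named_parameters()
                   if any(nd in n for nd in no_decay)],
        "params_type": "no_decay",  # hypothetical tag
        "weight_decay": 0.0,
    },
]
# Extra keys like 'params_type' are preserved inside each optimizer param group.
optimizer = torch.optim.AdamW(param_groups, lr=5e-4)
```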