cliang1453 / SAGE

No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models (ICLR 2022)
MIT License
29 stars 2 forks source link