cybertronai / pytorch-lamb

Implementation of https://arxiv.org/abs/1904.00962
MIT License

Set trust_ratio to 1 if weight_norm or adam_norm is 0. #6

Closed · ousou closed this 5 years ago

ousou commented 5 years ago

This fixes learning for layers whose weights start out as all zeros (for instance the bias in normalization layers). In the current version, if a parameter starts out as all zeros its weight norm is 0, so the trust ratio (the weight norm divided by the Adam update norm) is 0 and the update is zeroed out; the weights never change.

An earlier version of the implementation in this repo handled this case, but the guard is not present in the current version. The TensorFlow implementation (see https://github.com/ymcui/LAMB_Optimizer_TF/blob/a804c2f2995cda9a4f6b804ab445e19fc4a1036f/optimization.py#L264) shows that the paper authors also set the trust ratio to 1 when the numerator or denominator is zero, as sketched below.
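For reference, a minimal sketch of the guarded trust ratio computation in PyTorch. The names `lamb_trust_ratio`, `weight_norm`, and `adam_step` are illustrative and are not taken from this repo's `lamb.py`:

```python
import torch

def lamb_trust_ratio(p: torch.Tensor, adam_step: torch.Tensor) -> float:
    """Layer-wise LAMB trust ratio with the zero-norm guard from this PR.

    `p` is a parameter tensor; `adam_step` is the Adam-style update
    direction computed for it. Both names are illustrative.
    """
    weight_norm = p.norm().item()
    adam_norm = adam_step.norm().item()
    # Guard: if either norm is zero (e.g. a bias initialized to all
    # zeros), fall back to a trust ratio of 1 so the parameter can
    # still receive updates.
    if weight_norm == 0 or adam_norm == 0:
        return 1.0
    return weight_norm / adam_norm
```

The optimizer step would then scale the update as `p.data.add_(adam_step, alpha=-lr * trust_ratio)`, so a ratio of 1 reduces to a plain Adam-style step for that layer instead of freezing it.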

8enmann commented 5 years ago

Thanks!