Closed ousou closed 5 years ago
This fixes an issue with learning weights for layers that start out as all zeros (for instance, the bias in normalization layers). In the current version, if a layer starts out as all zeros, its weights will never change.
This fallback was present in an earlier implementation in this repo but is missing from the current version. The TensorFlow implementation (see https://github.com/ymcui/LAMB_Optimizer_TF/blob/a804c2f2995cda9a4f6b804ab445e19fc4a1036f/optimization.py#L264) shows that the paper authors also set the trust_ratio to 1 whenever the numerator or denominator is zero.
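A minimal sketch of the fallback logic described above, mirroring the linked TensorFlow implementation; the function name and signature here are hypothetical, not the actual code in this PR:

```python
def trust_ratio(weight_norm: float, update_norm: float) -> float:
    """LAMB layer-wise trust ratio with the zero-norm fallback.

    If either norm is zero (e.g. a bias tensor initialized to all
    zeros), fall back to 1 so the layer still receives the base
    learning rate and its weights can start moving.
    """
    if weight_norm > 0 and update_norm > 0:
        return weight_norm / update_norm
    return 1.0
```

Without the fallback, a zero `weight_norm` makes the ratio (and hence the effective step size) zero, so an all-zeros layer stays frozen forever; returning 1 reduces the update to a plain Adam-style step for that layer.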
Thanks!