Open zihangJiang opened 5 years ago
Fix weight decay in param_optimizer to agree with the original (Hugging Face) implementation.
(The current implementation seems to apply a weight decay of 0.01 to all parameters, since "n not in no_decay" is always True: n is a full parameter name, so it never exactly equals one of the short entries in no_decay.)
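
A minimal sketch of the difference, not the exact code from either repository. The parameter names and the contents of `no_decay` below are hypothetical stand-ins for what `model.named_parameters()` and the training script would provide; only the 0.01 decay value and the "n not in no_decay" test come from this description.

```python
# Hypothetical parameter names, standing in for the names yielded by
# model.named_parameters() on a BERT-style model.
param_names = [
    "encoder.layer.0.attention.self.query.weight",
    "encoder.layer.0.attention.self.query.bias",
    "encoder.layer.0.output.LayerNorm.weight",
    "encoder.layer.0.output.LayerNorm.bias",
]

# Assumed exclusion list: name fragments that should get no weight decay.
no_decay = ["bias", "LayerNorm.bias", "LayerNorm.weight"]

# Exact-membership test: a full dotted name never equals a short fragment,
# so every parameter ends up in the decayed group.
buggy_decay = [n for n in param_names if n not in no_decay]


def matches_no_decay(name):
    """Return True if any no_decay fragment occurs inside the full name."""
    return any(nd in name for nd in no_decay)


# Substring test: biases and LayerNorm parameters are excluded from decay.
fixed_decay = [n for n in param_names if not matches_no_decay(n)]
fixed_no_decay = [n for n in param_names if matches_no_decay(n)]

# Grouping in the shape an optimizer expects; names are used here in place
# of the actual parameter tensors to keep the sketch dependency-free.
optimizer_grouped_parameters = [
    {"params": fixed_decay, "weight_decay": 0.01},
    {"params": fixed_no_decay, "weight_decay": 0.0},
]
```

With the exact-membership test, `buggy_decay` contains all four names; with the substring test, only the query weight stays in the decayed group.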