adefazio closed this 11 months ago
@konstmish this is the approach I was thinking of for adding support for layer-wise scaling. I think I have to keep the scaling factor separate from the lr parameter; otherwise it will break backwards compatibility.
Supports layer-wise scaling through the parameter group property "layer_scale".
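
A rough sketch of how a per-group "layer_scale" could sit next to lr without changing existing behavior. This is not the code from this PR: `sgd_step` is a hypothetical helper, plain Python lists stand in for tensors, and it assumes the scale simply multiplies the step size. Groups that omit "layer_scale" fall back to 1.0, so older configs behave the same.

```python
def sgd_step(param_groups, grads):
    """One plain SGD step over PyTorch-style parameter groups.

    Hypothetical illustration: "params" holds floats instead of tensors,
    and "layer_scale" is assumed to multiply the step size.
    """
    for group, group_grads in zip(param_groups, grads):
        lr = group["lr"]
        # A missing "layer_scale" defaults to 1.0, so groups written
        # before this option existed are updated exactly as before.
        scale = group.get("layer_scale", 1.0)
        group["params"] = [
            p - lr * scale * g for p, g in zip(group["params"], group_grads)
        ]


param_groups = [
    {"params": [1.0, 2.0], "lr": 0.5},                      # no layer_scale set
    {"params": [1.0, 2.0], "lr": 0.5, "layer_scale": 0.5},  # half-size steps
]
grads = [[1.0, 1.0], [1.0, 1.0]]
sgd_step(param_groups, grads)
```

Keeping the factor in its own key means lr keeps its existing meaning for every group, which is the backwards-compatibility concern raised above.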