facebookresearch / dadaptation

D-Adaptation for SGD, Adam and AdaGrad
MIT License

Support layer-wise scaling #40

Closed: adefazio closed this 11 months ago

adefazio commented 11 months ago

Supports layer-wise scaling through the parameter group property "layer_scale".
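As a rough illustration of what a per-group property looks like, here is a minimal pure-Python sketch in which plain dicts stand in for optimizer parameter groups. Only the key name "layer_scale" comes from this PR. The helper `build_param_groups`, the neutral default of 1.0, and the placeholder parameter names are assumptions made for the example, not the library's code.

```python
# Hypothetical sketch: plain dicts stand in for optimizer parameter groups.
# Only the key name "layer_scale" comes from the PR; the default value of 1.0
# and the helper below are assumptions for illustration.
DEFAULTS = {"lr": 1.0, "layer_scale": 1.0}


def build_param_groups(groups, defaults=DEFAULTS):
    """Fill in any option a group does not set, similar in spirit to how
    PyTorch optimizers apply their defaults to each parameter group."""
    filled = []
    for group in groups:
        merged = dict(defaults)   # start from the defaults
        merged.update(group)      # per-group settings override them
        filled.append(merged)
    return filled


param_groups = build_param_groups([
    {"params": ["embedding.weight"]},                 # no layer_scale given
    {"params": ["head.weight"], "layer_scale": 0.1},  # scaled-down layer
])

for group in param_groups:
    print(group["params"], "layer_scale =", group["layer_scale"])
```

With a real optimizer the "params" entries would be tensors rather than strings, and the groups would be passed to the optimizer constructor.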

adefazio commented 11 months ago

@konstmish this is the approach I was thinking of for adding support for layer-wise scaling. I think I have to keep the scaling factor separate from the lr parameter; otherwise it will break backwards compatibility.
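To make the backwards-compatibility point concrete, here is a hedged sketch of a separate multiplicative factor. It assumes the per-group step size is the product of the adapted estimate `d`, the group's `lr`, and its `layer_scale`. That formula and the function name `effective_step` are assumptions for illustration, not taken from the PR diff.

```python
import math


def effective_step(d, group):
    """Assumed per-group step size: d * lr * layer_scale.

    A group that predates the new key has no "layer_scale" entry, so it
    falls back to the neutral value 1.0 and behaves exactly as before.
    """
    return d * group["lr"] * group.get("layer_scale", 1.0)


d = 0.5  # stand-in for the D-adapted estimate

old_style_group = {"lr": 1.0}                      # existing configuration
scaled_group = {"lr": 1.0, "layer_scale": 0.1}     # opts in to layer-wise scaling

# The existing configuration is unaffected by the new property.
assert math.isclose(effective_step(d, old_style_group), d * old_style_group["lr"])

print(effective_step(d, old_style_group))
print(effective_step(d, scaled_group))
```

Under these assumptions, the meaning of lr stays the same for every existing caller, and only groups that set "layer_scale" explicitly see a different step.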