RE-PARAMETERIZING YOUR OPTIMIZERS RATHER THAN ARCHITECTURES #34

Open 5g4s opened 1 year ago

5g4s commented 1 year ago

https://arxiv.org/abs/2205.15242

5g4s commented 1 year ago

We propose to incorporate the prior knowledge by modifying the gradients according to a set of model-specific hyper-parameters before updating the parameters.
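A minimal sketch of that idea (not the paper's code; the function name and the constant scales are illustrative): a plain SGD step in which each gradient is rescaled by a fixed, model-specific hyper-parameter before the update.

```python
import numpy as np

def sgd_step_with_grad_reparam(params, grads, scales, lr=0.1):
    """Plain SGD, except each gradient is multiplied by a fixed,
    model-specific constant (the prior) before the parameter update."""
    return [p - lr * s * g for p, g, s in zip(params, grads, scales)]

# Toy parameters and gradients; `scales` stands in for the
# model-specific hyper-parameters that encode the prior.
params = [np.ones((2, 2)), np.ones(3)]
grads = [np.full((2, 2), 0.5), np.full(3, 0.5)]
scales = [2.0, 1.0]
new = sgd_step_with_grad_reparam(params, grads, scales)
```

The architecture stays a plain model; only the optimizer's update rule carries the prior.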

5g4s commented 1 year ago

1) Similar to regular models like ResNet, RepVGG also adds priors into the model through well-designed structures and uses a generic optimizer; RepOpt-VGG instead adds the priors into the optimizer.

2) Though a converted RepVGG has the same inference-time structure as a RepOpt-VGG, the training-time RepVGG is much more complicated and consumes more time and memory to train. In other words, a RepOpt-VGG is a truly plain model during training, but a RepVGG is not.

3) We extend and deepen Structural Re-parameterization (Ding et al., 2021), which improves the performance of a model by changing the training dynamics via extra structures. We show that changing the training dynamics with an optimizer has a similar effect but is more efficient.


5g4s commented 1 year ago

Proposition: a CSLA (constant-scale linear addition) block trained with a regular optimizer is equivalent to a single operator trained with an optimizer that applies Gradient Re-parameterization (GR).
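The proposition can be checked numerically in a toy linear case (a sketch under my own assumptions: squared-error loss, plain SGD, two branches with constant scales a1, a2 — not the paper's experiments). For y = (a1*W1 + a2*W2)x, SGD on the two branches moves the merged weight exactly as SGD on a single operator whose gradient is multiplied by (a1^2 + a2^2):

```python
import numpy as np

rng = np.random.default_rng(0)
a1, a2, lr = 0.7, 0.3, 0.1
W1 = rng.normal(size=(3, 4))
W2 = rng.normal(size=(3, 4))
x = rng.normal(size=(4, 1))
t = rng.normal(size=(3, 1))

Wm = a1 * W1 + a2 * W2              # merged single operator, same init

for _ in range(5):
    # --- CSLA block: two scaled branches, plain SGD ---
    y = a1 * W1 @ x + a2 * W2 @ x
    g = 2 * (y - t) @ x.T           # dL/d(merged weight), L = ||y - t||^2
    W1 -= lr * a1 * g               # each branch receives its scaled gradient
    W2 -= lr * a2 * g
    # --- single operator, optimizer with GR ---
    ym = Wm @ x
    gm = 2 * (ym - t) @ x.T
    Wm -= lr * (a1**2 + a2**2) * gm  # GR: gradient scaled by sum of squared constants

# the two training trajectories coincide at every step
assert np.allclose(a1 * W1 + a2 * W2, Wm)
```

This is why RepOpt-VGG can drop the extra training-time branches: the constant scales are absorbed into the gradient multipliers.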