Open 5g4s opened 1 year ago
We propose to incorporate prior knowledge by modifying the gradients according to a set of model-specific hyper-parameters before updating the parameters.
1) Like regular models such as ResNet, RepVGG adds priors into the model through well-designed structures and uses a generic optimizer; RepOpt-VGG instead adds the priors into the optimizer.
2) Though a converted RepVGG has the same inference-time structure as a RepOpt-VGG, the training-time RepVGG is much more complicated and consumes more time and memory to train. In other words, a RepOpt-VGG is a truly plain model during training, while a RepVGG is not.
3) We extend and deepen Structural Re-parameterization (Ding et al., 2021), which improves a model's performance by changing the training dynamics through extra structures. We show that changing the training dynamics with an optimizer has a similar effect but is more efficient.
Proposition: a CSLA (constant-scale linear addition) block trained with a regular optimizer is equivalent to a single operator trained with an optimizer that uses Gradient Re-parameterization (GR).
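A minimal numerical sketch of this proposition in the scalar case, with assumed (hypothetical) values for the branch scales, learning rate, and data: two constant-scaled parallel branches trained with plain SGD follow the same trajectory as a single merged weight whose gradient is multiplied by the sum of the squared scales.

```python
# CSLA block: y = aA*wA*x + aB*wB*x with constant scales aA, aB.
# Equivalent single operator: w = aA*wA + aB*wB, with its gradient
# multiplied by (aA**2 + aB**2) (the GR hyper-parameter).
aA, aB = 0.7, 1.3          # constant branch scales (assumed values)
lr, x, t = 0.1, 2.0, 5.0   # learning rate, input, target (assumed)
wA, wB = 0.4, -0.2         # branch weights (assumed init)
w = aA * wA + aB * wB      # equivalent single weight
mult = aA**2 + aB**2       # gradient multiplier

for _ in range(5):
    # CSLA block, regular SGD on a squared-error loss L = 0.5*(y - t)**2
    y = aA * wA * x + aB * wB * x
    g = (y - t) * x        # dL/d(w'), shared by both branch gradients
    wA -= lr * aA * g      # dL/dwA = aA * g
    wB -= lr * aB * g      # dL/dwB = aB * g
    # Single operator with gradient re-parameterization
    y2 = w * x
    w -= lr * mult * (y2 - t) * x

print(aA * wA + aB * wB, w)  # the two trajectories coincide (up to float error)
```

The equivalence follows because the merged weight's SGD step collects aA times the wA update plus aB times the wB update, i.e. exactly (aA**2 + aB**2) times the single operator's gradient.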
https://arxiv.org/abs/2205.15242