This repo is the PyTorch implementation of our paper "Online Convolutional Re-parameterization", to appear in CVPR 2022, authored by Mu Hu, Junyi Feng, Jiashen Hua, Baisheng Lai, Jianqiang Huang, Xiaojin Gong and Xiansheng Hua from Zhejiang University and Alibaba Cloud.
OREPA is a two-step pipeline.
Please open a new issue for any code-related questions. For paper-related questions, feel free to contact me directly at muhu@zju.edu.cn.
Models released in this work are trained and tested on:
pip install torch torchvision
pip install numpy matplotlib Pillow
pip install scikit-image
Download our pre-trained models with OREPA:
Note that the pre-trained models do not need to be decompressed. Load the .pth.tar files directly.
A complete list of options for each script is available with
python train.py -h
python test.py -h
python convert.py -h
Train ResNets (ResNeXt and WideResNet included)
CUDA_VISIBLE_DEVICES="0,1,2,3" python train.py -a ResNet-18 -t OREPA --data [imagenet-path]
# -a for architecture (ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-18-2x, ResNeXt-50)
# -t for re-param method (base, DBB, OREPA)
Train RepVGGs
CUDA_VISIBLE_DEVICES="0,1,2,3" python train.py -a RepVGG-A0 -t OREPA_VGG --data [imagenet-path]
# -a for architecture (RepVGG-A0, RepVGG-A1, RepVGG-A2)
# -t for re-param method (base, RepVGG, OREPA_VGG)
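The architecture and method names listed above pair up by model family, so a small wrapper can catch a mismatched `-a`/`-t` combination before a long training job starts. The following is a minimal sketch only: `build_train_command` and the two family groupings are hypothetical helpers that are not part of this repo, and whether `train.py` itself rejects other combinations is not stated here.

```python
# Architecture and method names are copied from the comments above; grouping
# them into two families and the helper itself are assumptions.
RESNET_ARCHS = ("ResNet-18", "ResNet-34", "ResNet-50", "ResNet-101",
                "ResNet-18-2x", "ResNeXt-50")
REPVGG_ARCHS = ("RepVGG-A0", "RepVGG-A1", "RepVGG-A2")
METHODS = {
    "resnet": ("base", "DBB", "OREPA"),
    "repvgg": ("base", "RepVGG", "OREPA_VGG"),
}


def build_train_command(arch, method, data_path):
    """Return the train.py argument list after checking the arch/method pairing."""
    if arch in RESNET_ARCHS:
        family = "resnet"
    elif arch in REPVGG_ARCHS:
        family = "repvgg"
    else:
        raise ValueError(f"unknown architecture: {arch}")
    if method not in METHODS[family]:
        raise ValueError(
            f"{method!r} is not listed for {arch}; expected one of {METHODS[family]}"
        )
    return ["python", "train.py", "-a", arch, "-t", method, "--data", data_path]


print(" ".join(build_train_command("ResNet-18", "OREPA", "[imagenet-path]")))
```

The returned list can be handed to `subprocess.run`, with `CUDA_VISIBLE_DEVICES` set through its `env` argument.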
Use your self-trained model or our pretrained model
CUDA_VISIBLE_DEVICES="0" python test.py train [trained-model-path] -a ResNet-18 -t OREPA
Convert the training-time models into inference-time models
CUDA_VISIBLE_DEVICES="0" python convert.py [trained-model-path] [deploy-model-path-to-save] -a ResNet-18 -t OREPA
Evaluate with the converted model
CUDA_VISIBLE_DEVICES="0" python test.py deploy [deploy-model-path] -a ResNet-18 -t OREPA
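Converting a training-time model into an inference-time one is possible because convolution is linear: the sum of several scaled parallel branches equals a single convolution with the scaled sum of their kernels. The sketch below illustrates that identity on a toy 1D case in pure Python. It is not the logic of `convert.py`; the function names, kernels, and scales are all made up for illustration.

```python
def conv1d(signal, kernel):
    """Valid-mode 1D cross-correlation of two lists of floats."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def merge_branches(kernels, scales):
    """Fold scaled parallel kernels of equal length into one kernel."""
    length = len(kernels[0])
    if any(len(kern) != length for kern in kernels):
        raise ValueError("all branch kernels must have the same length")
    return [
        sum(s * kern[j] for kern, s in zip(kernels, scales))
        for j in range(length)
    ]


# Hypothetical training-time block: three parallel branches with scalar scales.
signal = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0]
kernels = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5], [0.0, 2.0, 0.0]]
scales = [0.5, 2.0, -1.0]

# Multi-branch output: run every branch, scale its output, then sum.
branch_outputs = [conv1d(signal, kern) for kern in kernels]
multi_branch = [
    sum(s * out[i] for out, s in zip(branch_outputs, scales))
    for i in range(len(branch_outputs[0]))
]

# Single-kernel output: merge the kernels once, then run one convolution.
merged_kernel = merge_branches(kernels, scales)
single_branch = conv1d(signal, merged_kernel)

# Both paths agree up to floating-point rounding.
assert all(abs(a - b) < 1e-9 for a, b in zip(multi_branch, single_branch))
```

The same identity holds for 2D convolutions, provided each branch kernel is first brought to a common shape (for example by zero-padding a smaller kernel).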
We use the mmdetection and mmsegmentation toolboxes on COCO and Cityscapes, respectively. If you use our pretrained model for downstream tasks, we strongly suggest carefully tuning the learning rate of the first stem layer, since after ImageNet training the deep linear stem layer has a very different weight distribution from the vanilla one. Contact @Sixkplus (Junyi Feng) for more details on configurations and checkpoints of the reported ResNet-50-backbone models.
For re-param models, special weight regularization strategies are required for further quantization. Meanwhile, dynamic gradient tweaking or differential searching methods might greatly boost the performance. We have not yet applied such techniques to OREPA, although they may be applied in our industrial usage in the future. To exchange and share experience on these topics, please contact @Sixkplus (Junyi Feng).
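In plain PyTorch, one way to give the stem its own learning rate is through optimizer parameter groups. The following is a minimal sketch under stated assumptions: the helper `build_param_groups`, the name prefix `"stem"`, and the 0.1 scale are hypothetical and are not taken from this repo or from the mmdetection/mmsegmentation configurations. Strings stand in for parameter tensors so the example runs without PyTorch.

```python
def build_param_groups(named_params, base_lr, stem_prefix="stem", stem_lr_scale=0.1):
    """Split (name, param) pairs into stem and non-stem groups with separate LRs.

    The returned list of dicts follows the per-parameter-group format that
    torch.optim optimizers accept ("params" plus per-group options such as "lr").
    """
    stem, rest = [], []
    for name, param in named_params:
        (stem if name.startswith(stem_prefix) else rest).append(param)
    groups = []
    if rest:
        groups.append({"params": rest, "lr": base_lr})
    if stem:
        # Assumed scale: the right value has to be found empirically.
        groups.append({"params": stem, "lr": base_lr * stem_lr_scale})
    return groups


# Placeholder parameters; with a real model, pass model.named_parameters().
named = [
    ("stem.conv.weight", "w_stem"),
    ("layer1.0.conv1.weight", "w_layer1"),
    ("fc.bias", "b_fc"),
]
groups = build_param_groups(named, base_lr=0.1)
for group in groups:
    print(len(group["params"]), "params at lr", group["lr"])
```

With a real model, the result could be passed as `torch.optim.SGD(build_param_groups(model.named_parameters(), base_lr=0.1), lr=0.1, momentum=0.9)`.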
If you use our code or method in your work, please cite the following:
@inproceedings{hu22OREPA,
title={Online Convolutional Re-parameterization},
author={Mu Hu and Junyi Feng and Jiashen Hua and Baisheng Lai and Jianqiang Huang and Xiansheng Hua and Xiaojin Gong},
booktitle={CVPR},
year={2022}
}
The code for this work is built upon Xiaohan Ding's re-param repositories "Diverse Branch Block: Building a Convolution as an Inception-like Unit" and "RepVGG: Making VGG-style ConvNets Great Again", and follows similar protocols. Xiaohan Ding is a Ph.D. from Tsinghua University and an expert in structural re-parameterization.