Closed wangjinwei94 closed 3 years ago
It seems we skip all parameters other than rbr_identity.weight and linear.weight (https://github.com/DingXiaoH/RepVGG/blob/main/train.py#L86). Is this by design?
Yes, it is by design. We do not want to apply L2 regularization twice (once as weight_decay in the optimizer and again as the L2 loss in the total loss).
Thanks! Sorry, I missed get_custom_L2.
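For readers following along, here is a minimal sketch of the idea in the answer above: parameters that are already penalized by a custom L2 term in the loss get weight_decay set to 0 in their optimizer parameter group, so they are not regularized twice. This is not the repository's code. The helper name `build_param_groups`, the substrings in `custom_l2_keys`, and the demo parameter names are all assumptions made for illustration.

```python
def build_param_groups(named_params, lr, weight_decay,
                       custom_l2_keys=("rbr_dense", "rbr_1x1")):
    """Build per-parameter optimizer groups.

    Parameters whose name contains one of custom_l2_keys are assumed to be
    regularized by a separate L2 term in the loss, so their optimizer
    weight_decay is set to 0. The default keys are hypothetical.
    """
    groups = []
    for name, value in named_params:
        covered_by_loss = any(key in name for key in custom_l2_keys)
        groups.append({
            "params": [value],
            "lr": lr,
            "weight_decay": 0.0 if covered_by_loss else weight_decay,
        })
    return groups


# Stand-in (name, value) pairs so the sketch runs without a model.
# With a real PyTorch model, pass model.named_parameters() instead.
demo_params = [
    ("stage1.0.rbr_dense.conv.weight", object()),
    ("stage1.0.rbr_1x1.conv.weight", object()),
    ("stage1.0.rbr_identity.weight", object()),
    ("linear.weight", object()),
]

groups = build_param_groups(demo_params, lr=0.1, weight_decay=1e-4)
decay_by_name = {
    name: group["weight_decay"]
    for (name, _), group in zip(demo_params, groups)
}
for name, decay in decay_by_name.items():
    print(f"{name}: weight_decay={decay}")
```

With real parameters, a list of groups in this shape can be passed to torch.optim.SGD(groups, lr, momentum=...), where the per-group weight_decay overrides the optimizer default. In this sketch only rbr_identity.weight and linear.weight keep a non-zero weight_decay, which matches the behavior described in the question.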