JUGGHM / OREPA_CVPR2022

CVPR 2022 "Online Convolutional Re-parameterization"
Apache License 2.0

Numerical Stability #11

Open jonmorton opened 1 year ago

jonmorton commented 1 year ago

Hi, I'm wondering if you've run into any issues with numerical stability or know what may be the cause.

With plain RepVGG, I get differences as high as 4e-4 when comparing outputs before and after switching to deploy mode. After changing the first conv to OREPA_LargeConv, I get errors as high as 2e-3. After changing the 1x1 conv in the RepVGG block to OREPA_1x1, I get differences as high as 0.1.

It seems that numerical stability makes it challenging to use identity + OREPA_1x1 + OREPA_3x3 blocks in a RepVGG-style model. Any thoughts on why?
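For reference, the kind of before/after comparison described above could be measured with a small helper like the sketch below. It assumes a RepVGG/OREPA-style model whose blocks expose a `switch_to_deploy()` method; the method name and input shape are illustrative assumptions, not taken from this repo's code.

```python
import copy
import torch

def max_deploy_error(model, input_shape=(1, 3, 224, 224)):
    """Max absolute difference between multi-branch and fused outputs."""
    model.eval()
    x = torch.randn(input_shape)

    with torch.no_grad():
        y_train = model(x)

        # Fuse a copy of the model so the original stays in training form.
        deploy_model = copy.deepcopy(model)
        for module in deploy_model.modules():
            if hasattr(module, "switch_to_deploy"):
                module.switch_to_deploy()
        y_deploy = deploy_model(x)

    return (y_train - y_deploy).abs().max().item()
```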

JUGGHM commented 1 year ago


Thanks for your interest, Jon! This is quite weird but interesting. After two hours of investigation, we found that the cause is that computation on the CPU and on the GPU yields slightly different results. Note that the weight re-parameterization procedure is performed on the CPU in convert.py. If you add a .cuda() to the end of line 17 in convert.py, this numerical gap shrinks.
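A rough sketch of the suggested change is below: keep the model on the GPU while the weights are re-parameterized, so the fused kernels are computed with the same device arithmetic used at inference time. The model constructor, checkpoint paths, and `switch_to_deploy()` name are placeholders; the actual contents of convert.py may differ between versions of the repo.

```python
import torch

model = build_model()  # hypothetical constructor for the training-mode model
ckpt = torch.load("train_checkpoint.pth", map_location="cpu")
model.load_state_dict(ckpt)
model = model.cuda()  # the suggested .cuda(): re-parameterization now runs on GPU

# Fuse every re-parameterizable block in place.
for module in model.modules():
    if hasattr(module, "switch_to_deploy"):
        module.switch_to_deploy()

torch.save(model.state_dict(), "deploy_checkpoint.pth")
```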

Even so, we are still not able to shrink the gap to exactly 0. I am not quite sure whether it is caused by non-deterministic factors in the different convolution implementations inside torch.
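One way to probe whether the residual gap is pure floating-point error rather than a bug in the fusion itself is to repeat the comparison in float64. The sketch below is illustrative only; the `switch_to_deploy()` name and input shape are assumptions. If the gap drops by several orders of magnitude in double precision, it points to accumulation-order differences between the multi-branch and fused computations, not to an incorrect conversion.

```python
import copy
import torch

def deploy_gap(model, dtype=torch.float32, device="cuda"):
    # Compare multi-branch (training) and fused (deploy) outputs at a given precision.
    m = copy.deepcopy(model).to(device=device, dtype=dtype).eval()
    x = torch.randn(1, 3, 224, 224, device=device, dtype=dtype)

    with torch.no_grad():
        y_train = m(x)
        for module in m.modules():
            if hasattr(module, "switch_to_deploy"):
                module.switch_to_deploy()
        # Re-cast in case the conversion created fresh float32 parameters.
        m = m.to(device=device, dtype=dtype)
        y_deploy = m(x)

    return (y_train - y_deploy).abs().max().item()

# e.g. compare deploy_gap(model, torch.float32) with deploy_gap(model, torch.float64)
```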