FrancescoSaverioZuppichini / glasses

High-quality Neural Networks for Computer Vision 😎
https://francescosaveriozuppichini.github.io/glasses-webapp/
MIT License

Size mismatch on ResNet**d models #236

Closed · marcozullich closed this issue 3 years ago

marcozullich commented 3 years ago

There is a size mismatch error when calling the forward method of ResNet**d models.

Example:

import torch
from glasses.models.AutoModel import AutoModel

net = AutoModel.from_name("resnet50d")
net(torch.rand((1, 3, 240, 240)))  # this image size is the one resulting
                                   # from the suggested transformations applied to ImageNet

Resulting exception: RuntimeError: The size of tensor a (8) must match the size of tensor b (7) at non-singleton dimension 3

I didn't go into the details, but it seems to happen during the summation of the skip connection and the conv output.

This error seems to occur only with some image sizes; for instance, when the image has size 224 x 224 no exception is thrown.
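[Editorial sketch] The size dependence can be reproduced with the standard PyTorch output-size formulas alone. The helpers below are hypothetical (not part of glasses); 15 and 14 are the spatial sizes entering the last ResNet stage for 240x240 and 224x224 inputs respectively:

```python
import math

def conv_out(n, k, s, p):
    """Spatial size after a convolution (PyTorch floor rule)."""
    return (n + 2 * p - k) // s + 1

def pool_out(n, k, s, ceil_mode=False):
    """Spatial size after an unpadded AvgPool2d."""
    r = math.ceil if ceil_mode else math.floor
    return r((n - k) / s) + 1

# 240x240 input: the feature map entering the last stage is 15x15.
main = conv_out(15, k=3, s=2, p=1)   # 3x3 stride-2 conv in the main branch
short = pool_out(15, k=2, s=2)       # 2x2 AvgPool2d in the "d" shortcut
print(main, short)                   # 8 7 -> the branches cannot be summed

# 224x224 input: the same point is 14x14, and both branches agree.
print(conv_out(14, 3, 2, 1), pool_out(14, 2, 2))  # 7 7 -> OK
```

An odd spatial size makes the floor-mode pooling lose one cell relative to the padded conv, which is why only some input sizes fail.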

FrancescoSaverioZuppichini commented 3 years ago

The following code works in the develop branch:

import numpy as np
import torch
from glasses.models.AutoTransform import AutoTransform
from glasses.models.AutoModel import AutoModel
from PIL import Image

name = 'resnet50d'

tr = AutoTransform.from_name(name)
model = AutoModel.from_name(name)

img = Image.fromarray(np.zeros((300, 300, 3), dtype=np.uint8))
x = tr(img)

with torch.no_grad():
    model(x.unsqueeze(0))

x has the correct size of 224x224. If you pass a tensor with an unusual size, the spatial dimensions may become odd before the last layer; this breaks the model, since the shortcut uses kernel_size=1 while the main block uses kernel_size=3. Please resize the tensor to the closest multiple of 32; to my understanding this also improves performance. In your case, from 240x240 to 256x256.
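[Editorial sketch] The suggested workaround could look like this; `round_to_multiple` is a hypothetical helper, not part of glasses:

```python
def round_to_multiple(size: int, base: int = 32) -> int:
    """Round a spatial size to the nearest multiple of `base`."""
    return base * max(1, round(size / base))

print(round_to_multiple(240))  # 256 -> resize 240x240 inputs to 256x256
print(round_to_multiple(224))  # 224 -> already a multiple of 32, unchanged
```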

FrancescoSaverioZuppichini commented 3 years ago

After a little bit of investigation, the problem is in ResNetShorcutD, which uses AvgPool2d instead of a stride-2 conv to downsample. In the last block the input size is (1, 1024, 15, 15); the shortcut halves the spatial dimensions and projects to the correct number of channels, giving shape (1, 2048, 7, 7). The main branch, thanks to its padding, instead produces a tensor of shape (1, 2048, 8, 8), and this is why it fails.

The fix is super easy: just add ceil_mode=True to the nn.AvgPool2d.
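[Editorial sketch] The mismatch and the fix can be checked directly on the (1, 1024, 15, 15) input described above, assuming a 2x2 AvgPool2d shortcut and a 3x3 stride-2 padded conv in the main branch:

```python
import torch
from torch import nn

x = torch.randn(1, 1024, 15, 15)

# Shortcut branch as in the "d" variant: 2x2 average pool (floor vs ceil mode).
pool_floor = nn.AvgPool2d(kernel_size=2, stride=2)                 # default ceil_mode=False
pool_ceil = nn.AvgPool2d(kernel_size=2, stride=2, ceil_mode=True)  # the fix

# Main branch: 3x3 stride-2 conv with padding 1.
conv = nn.Conv2d(1024, 2048, kernel_size=3, stride=2, padding=1)

print(pool_floor(x).shape)  # torch.Size([1, 1024, 7, 7]) -> mismatch
print(pool_ceil(x).shape)   # torch.Size([1, 1024, 8, 8]) -> matches
print(conv(x).shape)        # torch.Size([1, 2048, 8, 8])
```

With ceil_mode=True the pooled shortcut keeps the same spatial size as the padded conv, so the residual sum works for odd feature maps as well.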

Thank you!

Best regards,

Francesco