StacyYang / gluoncv-torch

PyTorch API for GluonCV Models
MIT License

ResNet produces an error with dilation on, and does not reach claimed accuracy with dilation off. #11

Open Lyken17 opened 5 years ago

Lyken17 commented 5 years ago

When I load the model with model = gcv.models.resnet50(pretrained=True) and run a forward pass, the error RuntimeError: size mismatch, m1: [1 x 991232], m2: [2048 x 1000] is raised. I think something is wrong with the stride / downsampling.

# gluoncv-torch resolution
torch.Size([1, 64, 56, 56])
torch.Size([1, 2048, 28, 28])
torch.Size([1, 2048, 22, 22])
torch.Size([1, 991232])
# Pytorch vision resolution
torch.Size([1, 64, 56, 56])
torch.Size([1, 2048, 7, 7])
torch.Size([1, 2048, 1, 1])
torch.Size([1, 2048])
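
For reference, the mismatch is consistent with the shapes above: 991232 = 2048 × 22 × 22, i.e. the un-downsampled 22×22 feature map is flattened and fed into the 2048→1000 fc layer. A minimal repro sketch (the gluoncvth import alias and the random 224×224 input are my assumptions, not copied from my actual script):

# Minimal repro sketch; assumes gluoncv-torch is importable as gluoncvth
# and that a random 224x224 input is enough to trigger the error.
import torch
import gluoncvth as gcv

model = gcv.models.resnet50(pretrained=True)   # dilation is on by default
model.eval()

x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    out = model(x)   # RuntimeError: size mismatch, m1: [1 x 991232], m2: [2048 x 1000]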

After a quick look into the code, I thought the cause might be the dilation, so I turned it off with model = gcv.models.resnet50(pretrained=True, dilated=False). This time the model forwards without error; however, it does not reach the accuracy that GluonCV claims.

➜  tmp git:(master) ✗ CUDA_VISIBLE_DEVICES=0 python main.py /ssd/dataset/imagenet/ --arch resnet50 --pretrained -e
=> using pre-trained model 'resnet50'
Test: [0/196]   Time 8.818 (8.818)  Loss 0.5173 (0.5173)    Acc@1 86.328 (86.328)   Acc@5 97.656 (97.656)
Test: [10/196]  Time 0.549 (1.302)  Loss 0.9599 (0.6264)    Acc@1 76.172 (83.416)   Acc@5 92.188 (96.058)
Test: [20/196]  Time 0.550 (0.944)  Loss 0.7655 (0.6439)    Acc@1 84.766 (83.147)   Acc@5 92.188 (95.778)
Test: [30/196]  Time 0.551 (0.867)  Loss 0.7689 (0.6200)    Acc@1 82.422 (84.085)   Acc@5 95.312 (95.892)
Test: [40/196]  Time 0.558 (0.823)  Loss 0.6093 (0.6584)    Acc@1 86.719 (82.793)   Acc@5 96.875 (95.846)
Test: [50/196]  Time 0.551 (0.776)  Loss 0.4676 (0.6536)    Acc@1 88.281 (82.598)   Acc@5 96.875 (96.025)
Test: [60/196]  Time 0.565 (0.748)  Loss 0.9166 (0.6706)    Acc@1 74.609 (82.185)   Acc@5 94.141 (96.126)
Test: [70/196]  Time 0.556 (0.724)  Loss 0.6710 (0.6548)    Acc@1 78.516 (82.543)   Acc@5 97.266 (96.259)
Test: [80/196]  Time 0.558 (0.709)  Loss 1.2860 (0.6754)    Acc@1 67.578 (82.205)   Acc@5 90.625 (95.964)
Test: [90/196]  Time 0.762 (0.698)  Loss 1.8590 (0.7210)    Acc@1 57.422 (81.186)   Acc@5 86.719 (95.497)
Test: [100/196] Time 0.563 (0.685)  Loss 1.0176 (0.7670)    Acc@1 73.438 (80.171)   Acc@5 92.969 (95.042)
Test: [110/196] Time 0.566 (0.685)  Loss 0.7850 (0.7894)    Acc@1 81.250 (79.761)   Acc@5 94.531 (94.781)
Test: [120/196] Time 0.626 (0.686)  Loss 1.1852 (0.8062)    Acc@1 72.656 (79.513)   Acc@5 90.234 (94.544)
Test: [130/196] Time 0.691 (0.679)  Loss 0.6243 (0.8364)    Acc@1 83.203 (78.739)   Acc@5 96.875 (94.212)
Test: [140/196] Time 0.563 (0.676)  Loss 0.9735 (0.8516)    Acc@1 75.781 (78.405)   Acc@5 92.969 (94.066)
Test: [150/196] Time 0.657 (0.671)  Loss 1.0451 (0.8695)    Acc@1 79.688 (78.042)   Acc@5 89.844 (93.822)
Test: [160/196] Time 0.566 (0.665)  Loss 0.6886 (0.8843)    Acc@1 84.766 (77.717)   Acc@5 94.141 (93.592)
Test: [170/196] Time 0.567 (0.663)  Loss 0.5897 (0.9007)    Acc@1 83.203 (77.273)   Acc@5 96.875 (93.416)
Test: [180/196] Time 0.567 (0.658)  Loss 1.1500 (0.9158)    Acc@1 68.359 (76.895)   Acc@5 94.531 (93.269)
Test: [190/196] Time 0.567 (0.655)  Loss 1.1336 (0.9163)    Acc@1 68.359 (76.820)   Acc@5 94.922 (93.294)
 * Acc@1 76.954 Acc@5 93.348
zhanghang1989 commented 5 years ago

The model is trained and tested using Gluon (MXNet). The main difference is the data pipeline: they use OpenCV to load and preprocess images.

Lyken17 commented 5 years ago

I think they use the same preprocessing for validation. In PyTorch, the transform is

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],  std=[0.229, 0.224, 0.225])

datasets.ImageFolder(valdir, transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    normalize,
]))

For GluonCV, the transform is

normalize = transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
crop_ratio = opt.crop_ratio if opt.crop_ratio > 0 else 0.875
resize = int(math.ceil(input_size / crop_ratio))
transform_test = transforms.Compose([
    transforms.Resize(resize, keep_ratio=True),
    transforms.CenterCrop(input_size),
    transforms.ToTensor(),
    normalize
])
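
Numerically the two pipelines agree. Assuming input_size is 224 (the same size torchvision center-crops above) and the default crop_ratio of 0.875, the GluonCV shorter-edge resize works out to the same 256 pixels as transforms.Resize(256):

# Quick arithmetic check (input_size = 224 is an assumption matching the
# torchvision CenterCrop above; crop_ratio defaults to 0.875 per the snippet).
import math

input_size = 224
crop_ratio = 0.875
resize = int(math.ceil(input_size / crop_ratio))
print(resize)   # 256 -> matches transforms.Resize(256) in the torchvision pipeline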
zhanghang1989 commented 5 years ago

When loading the same image and resizing it with bilinear interpolation, PIL and OpenCV won't give the same output.
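
A minimal sketch of that difference (the file name test.jpg and the 256×256 target size are placeholders, not from the thread):

# Compare bilinear resizing between PIL and OpenCV on the same image.
# "test.jpg" and the target size are illustrative placeholders.
import cv2
import numpy as np
from PIL import Image

img_pil = Image.open("test.jpg").convert("RGB")
img_cv = cv2.cvtColor(cv2.imread("test.jpg"), cv2.COLOR_BGR2RGB)

size = (256, 256)
resized_pil = np.asarray(img_pil.resize(size, Image.BILINEAR))
resized_cv = cv2.resize(img_cv, size, interpolation=cv2.INTER_LINEAR)

# The two backends implement bilinear resampling differently,
# so per-pixel values generally differ.
print(np.abs(resized_pil.astype(np.int32) - resized_cv.astype(np.int32)).max())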