Open CharlieCheckpt opened 2 years ago
@CharlieCheckpt Thank you for bringing this to our attention!
After doing some digging, it seems there are differences between the wide architectures in use. The wide architecture introduced in https://arxiv.org/abs/1605.07146 indeed only doubles the width of the residual layers and not conv1.
But I have seen official checkpoints that also double the width of the conv1 layer. See for example BYOL: https://github.com/chigur/byol-convert/blob/main/resnet.py#L169. This script properly loads and converts the RESNET-200 2x BYOL model.
I am not sure whether this is a confusion in the literature or a conscious decision -- if it is the latter, I have not seen it stated explicitly in what I've read.
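To make the two conventions concrete, here is a minimal sketch in plain Python. The helper `resnet50_widths` and its parameters are hypothetical (they exist in neither vissl nor torchvision), and it only tracks the conv1 width and the 3x3 conv width of each bottleneck stage. Other widths may also differ between implementations.

```python
# Hypothetical helper for illustration; not part of vissl or torchvision.
def resnet50_widths(width_multiplier=2, widen_stem=False, base_width=64):
    """Return the stem (conv1) width and the 3x3 conv width of each stage."""
    # Convention A keeps the stem at base_width; convention B scales it too.
    stem = base_width * width_multiplier if widen_stem else base_width
    # The 3x3 conv inside the bottleneck doubles in width at every stage.
    stage_widths = [base_width * width_multiplier * 2 ** i for i in range(4)]
    return {"conv1": stem, "stage_3x3": stage_widths}


# Convention A: only the residual layers are widened (Wide ResNet paper, torchvision).
paper_style = resnet50_widths(widen_stem=False)
# Convention B: conv1 is widened as well (what vissl prints for ResNet-50-w2).
stem_widened = resnet50_widths(widen_stem=True)

print(paper_style["conv1"], stem_widened["conv1"])  # 64 128
```

Under these assumptions the residual widths agree and only the stem differs, which matches the `Conv2d(3, 64, ...)` versus `Conv2d(3, 128, ...)` mismatch reported in this issue.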
@prigoyal @QuentinDuval Do you guys know anything more about this?
Agree with the above. It might be best to extend the ResNeXt code in vissl to support the different variants, e.g. "wide_resnet50_2" as in https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py#L20
Hi vissl team ! Thank you for the great package.
I got a dimension error when running the example from the documentation to train MoCo with ResNet-50-w2 (2x wider ResNet-50).
This error seems to be due to a bug in the architecture of ResNet-50-w2. Indeed, I compared it with the architecture of torchvision.models.wide_resnet50_2 and the two architectures are different.
Looking at 3. below, one can see that in vissl, the first layer of ResNet-50-w2 is:
(conv1): Conv2d(3, 128, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
In torchvision, the first layer prints:
(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False) ...
Instructions To Reproduce the 🐛 Bug:
1. what changes you made (git diff) or what code you wrote: None
2. what exact command you run: I ran the command proposed in the documentation.