HobbitLong / SupContrast

PyTorch implementation of "Supervised Contrastive Learning" (and SimCLR incidentally)
BSD 2-Clause "Simplified" License

Is this using resnet50x4 width? #132

Closed hdeng26 closed 11 months ago

hdeng26 commented 1 year ago

The paper says the CIFAR-10 and CIFAR-100 results come from a PyTorch implementation. I tried this code and my results are similar to the paper's. However, when I trained this architecture on ImageNet-2012, I found it is larger than the original ResNet-50 (OOM with 150 GB of GPU memory at bs=2048), so I don't think this ResNet is the 1x-width variant. Since the paper says the CIFAR-10 and CIFAR-100 top-1 accuracies are based on the original ResNet-50 structure, I just want to confirm whether the 96% on CIFAR-10 and 76.5% on CIFAR-100 are based on a 4x-width or a 1x-width ResNet-50.

StackChan commented 1 year ago

This has already come up in other issues, e.g. issue63, issue74, issue83. As the author admitted: "I see the problem. The first conv layer of resnet_big.py is adapted for CIFAR-10/100, by changing the kernel size to 3 and stride to 1. If resnet_big.py is directly applied to ImageNet images with 224 inputs, it will create a very large feature maps that occupy very heavy memory." And someone mentioned that "I replace the resnet50 in your resnet_big.py with the official one. Then the problem is fixed."
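Some back-of-the-envelope arithmetic illustrates why the CIFAR-style stem blows up memory on 224-input images. This is a sketch based only on what the author states (3x3 conv, stride 1) versus the standard torchvision ResNet-50 stem (7x7 conv, stride 2, then 3x3 maxpool, stride 2); the helper names and the single-tensor memory estimate are illustrative, not from the repo:

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a conv/pool layer: floor((H + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

# CIFAR-style stem from resnet_big.py: 3x3 conv, stride 1, padding 1
cifar_stem = conv_out(224, kernel=3, stride=1, padding=1)        # 224

# Standard ImageNet stem: 7x7 conv stride 2, then 3x3 maxpool stride 2
imagenet_stem = conv_out(conv_out(224, 7, 2, 3), 3, 2, 1)        # 56

def stem_activation_gib(batch, channels, spatial, bytes_per_el=4):
    """Memory of a single fp32 activation tensor after the stem, in GiB."""
    return batch * channels * spatial ** 2 * bytes_per_el / 2 ** 30

# At bs=2048 with 64 stem channels, the CIFAR stem's output alone is
# 16x larger than the ImageNet stem's, before any residual stage runs.
print(stem_activation_gib(2048, 64, cifar_stem))      # 24.5 GiB
print(stem_activation_gib(2048, 64, imagenet_stem))   # ~1.5 GiB
```

Since every downstream stage then also runs at 4x the spatial resolution in each dimension, the whole network's activation footprint scales up roughly 16x, which is consistent with the OOM at bs=2048.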

hdeng26 commented 1 year ago

> This has already come up in other issues, e.g. issue63, issue74, issue83. As the author admitted: "I see the problem. The first conv layer of resnet_big.py is adapted for CIFAR-10/100, by changing the kernel size to 3 and stride to 1. If resnet_big.py is directly applied to ImageNet images with 224 inputs, it will create a very large feature maps that occupy very heavy memory." And someone mentioned that "I replace the resnet50 in your resnet_big.py with the official one. Then the problem is fixed."

Thanks for mentioning that. Yes, the official one works for ImageNet. But I am still confused about the transfer-learning experiments: direct linear evaluation on CIFAR-10 and CIFAR-100 from an ImageNet-pretrained ResNet-50. I'm not sure whether a second pretraining step on the target dataset is needed before linear eval, but I think SimCLR v1 said they run linear evaluation on the target dataset directly. I tried upsampling 32×32 inputs to 224×224, and also changing the first convolutional layer's kernel size, but accuracy was only around 50% on CIFAR-10 and lower on CIFAR-100.
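For reference, the upsampling attempt described above can be sketched as follows. This assumes bilinear resizing via `torch.nn.functional.interpolate`; the exact resize mode and helper name are illustrative, since the original post doesn't specify them:

```python
import torch
import torch.nn.functional as F

def upsample_cifar(batch, size=224):
    """Upsample a batch of CIFAR images (N, 3, 32, 32) to ImageNet
    resolution so an ImageNet-pretrained encoder can consume them.

    Bilinear interpolation is one common choice; nearest or bicubic
    would also work here.
    """
    return F.interpolate(batch, size=(size, size), mode="bilinear",
                         align_corners=False)

x = torch.randn(8, 3, 32, 32)
y = upsample_cifar(x)
print(y.shape)  # torch.Size([8, 3, 224, 224])
```

Alternatively, the same resize can be done on the CPU data-loading side with `torchvision.transforms.Resize(224)` before normalization, which avoids holding both resolutions on the GPU.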