Hi, thank you for sharing the code! I am curious about the effect of the data augmentation, specifically the RandomResizedCrop in train_moco_ins.py. In your code, the minimum crop scale is 0.2 for most configurations but 0.08 for the full ImageNet dataset with a ResNet backbone. However, other papers such as non-parametric instance discrimination also set this parameter to 0.2 when using ResNet as the backbone. So I am curious about the choice of 0.08 (the default torchvision value). Does this smaller scale work better on full ImageNet? Have you compared the performance of 0.08 and 0.2 on ImageNet with a ResNet backbone?
Hi, @WangFeng18,
Good catch! In short, 0.08 is a more aggressive data augmentation, and if I recall correctly, it brings a marginal improvement (roughly < 0.3%) on full ImageNet. Indeed, RandomGrayscale is the key augmentation, and dropping it leads to a significant performance drop.
My code uses 0.2 because my initial baseline on ImageNet100 used this threshold, so I just wanted to be consistent. For full ImageNet, I wanted to closely match the standard data augmentation for supervised learning, and therefore I used 0.08. Does this make sense?
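For concreteness, here is a minimal sketch of the two crop-scale settings being compared. Only the scale values (0.2 vs. 0.08) and the RandomGrayscale step come from the discussion above; the rest of the pipeline (ColorJitter parameters, transform ordering, image size) is an illustrative assumption, not a verbatim copy of train_moco_ins.py.

```python
import torchvision.transforms as transforms

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

def build_train_transform(min_scale):
    # min_scale=0.2 matches the ImageNet100 baseline; min_scale=0.08 is the
    # torchvision default and matches standard supervised ImageNet training.
    # A smaller min_scale is more aggressive: a crop may cover as little as
    # min_scale (e.g. 8%) of the original image area.
    return transforms.Compose([
        transforms.RandomResizedCrop(224, scale=(min_scale, 1.0)),
        transforms.RandomGrayscale(p=0.2),  # the key augmentation per the reply above
        transforms.ColorJitter(0.4, 0.4, 0.4, 0.4),  # illustrative values
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        normalize,
    ])

# ImageNet100 setting vs. full-ImageNet setting:
aug_in100 = build_train_transform(min_scale=0.2)
aug_full = build_train_transform(min_scale=0.08)
```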
Thanks for your reply! You have resolved my confusion.