ysharma1126 opened 2 years ago
CC: @QuentinDuval
Hi @ysharma1126,
First of all, thank you for your interest in VISSL :)
Indeed, you are perfectly right: we don't use exactly the same transformations as the CLIP paper, but instead the same transformations we usually use for linear evaluations (which include `RandomResizedCrop`). We chose consistency whenever possible, knowing that there is no single correct way to evaluate models. The only exception we make for `RandomResizedCrop` is for datasets where cropping would actually be harmful, such as CLEVR/Count: that task consists in counting objects, so random cropping clearly breaks it, since we need to see all of the objects.
However, you are free to change those augmentations to reproduce the CLIP benchmark protocol, or to create a set of configurations reproducing the VTAB protocol, the CLIP protocol, etc.
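As a concrete starting point, swapping the train-time augmentations for CLIP-style fixed preprocessing could look roughly like the sketch below. This is untested and only illustrative: it assumes VISSL's usual list-of-transforms syntax, and the normalization constants are the ones published in the CLIP code release, which you may or may not want to use.

```yaml
DATA:
  TRAIN:
    TRANSFORMS:
      - name: Resize        # deterministic resize instead of RandomResizedCrop
        size: 224
      - name: CenterCrop    # no RandomHorizontalFlip either
        size: 224
      - name: ToTensor
      - name: Normalize
        mean: [0.48145466, 0.4578275, 0.40821073]  # CLIP image mean (assumed desired)
        std: [0.26862954, 0.26130258, 0.27577711]  # CLIP image std
```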
I hope this helps :)

Quentin
Thanks! Since we are discussing the configs, a few follow-up questions. One concerns the use of `RandomHorizontalFlip` during training. Overall, I think it would be helpful if comments could be added specifying (a) the reference, if any, that a given config is meant to reproduce, and (b) any intentional discrepancies from that reference, like what was discussed above with regard to CLIP. That said, I entirely understand if this might not be worth the time cost.
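For instance, a short header comment along these lines (purely illustrative; this is not actual VISSL content) would go a long way:

```yaml
# Reference: CLIP linear probe evaluation (https://github.com/openai/CLIP)
# Intentional discrepancies from that reference:
#   - TRAIN uses RandomResizedCrop + RandomHorizontalFlip for consistency
#     with other VISSL linear-eval configs; CLIP itself applies the same
#     deterministic Resize + CenterCrop preprocessing at train and test time.
```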
Here, I attach the config YAML for the VISSL implementation of a linear probe evaluation from the CLIP benchmark.
`RandomResizedCrop` and `RandomHorizontalFlip` are specified in `DATA.TRAIN.TRANSFORMS`, resulting in a discrepancy between `DATA.TRAIN.TRANSFORMS` and `DATA.TEST.TRANSFORMS`. However, in the linear probe evaluation provided in the CLIP code release, the preprocessing for train and test is the same, and is (on the whole) equivalent to what the config specifies as `DATA.TEST.TRANSFORMS`, not `DATA.TRAIN.TRANSFORMS`. If this discrepancy was intentional, could the reasoning be clarified for users?
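To make the discrepancy concrete, here is a stripped-down paraphrase of the relevant part of the config (not the verbatim attachment; `ToTensor`/`Normalize` are omitted since they match between the two):

```yaml
DATA:
  TRAIN:
    TRANSFORMS:                 # stochastic augmentations
      - name: RandomResizedCrop
        size: 224
      - name: RandomHorizontalFlip
      # ... ToTensor / Normalize
  TEST:
    TRANSFORMS:                 # deterministic; matches CLIP's own preprocessing
      - name: Resize
        size: 224
      - name: CenterCrop
        size: 224
      # ... ToTensor / Normalize
```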