Open sunrise6513 opened 10 months ago
The reported robust accuracy of the fine-tuned EVA ViT on ObjectNet is noticeably lower than that of the EVA CLIP model. Is that because the former is tested with the ImageNet-1K classes, while the latter is tested with the ObjectNet classes?