Zero-shot accuracy on ImageNet (in the CLIP setting) is lower than the number reported in the paper

alipay / Ant-Multi-Modal-Framework

Research Code for Multimodal-Cognition Team in Ant Group

Creative Commons Attribution 4.0 International


Zero-shot accuracy on ImageNet (in the CLIP setting) is lower than the number reported in the paper #6

Open shyammarjit opened 3 months ago

shyammarjit commented 3 months ago

Measured zero-shot accuracy on ImageNet (in the CLIP setting):

Top-1 accuracy: 77.15
Top-5 accuracy: 95.51

The paper reports a Top-1 accuracy of 88.5 on ImageNet.
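For reference, a minimal sketch of how Top-1 and Top-5 zero-shot accuracy are typically computed in a CLIP-style setup, assuming precomputed image embeddings and one text embedding per class. The function names and the NumPy implementation are illustrative assumptions and are not taken from this repository's evaluation script; prompt templates and image preprocessing are outside the scope of the sketch.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def zero_shot_topk_accuracy(image_emb, class_text_emb, labels, ks=(1, 5)):
    """Hypothetical helper: top-k accuracy (in percent) for zero-shot classification.

    image_emb:      (N, D) image embeddings
    class_text_emb: (C, D) one text embedding per class (e.g. averaged over prompts)
    labels:         (N,)   ground-truth class indices in [0, C)
    """
    image_emb = l2_normalize(np.asarray(image_emb, dtype=np.float64))
    class_text_emb = l2_normalize(np.asarray(class_text_emb, dtype=np.float64))
    labels = np.asarray(labels)

    # Cosine similarity between every image and every class prompt: shape (N, C).
    logits = image_emb @ class_text_emb.T

    # Indices of the max(ks) highest-scoring classes per image, best first.
    max_k = max(ks)
    topk = np.argsort(-logits, axis=1)[:, :max_k]

    # hits[i, j] is True when the j-th ranked class for image i is the true label.
    hits = topk == labels[:, None]
    return {k: 100.0 * hits[:, :k].any(axis=1).mean() for k in ks}
```

Calling `zero_shot_topk_accuracy(image_emb, class_text_emb, labels)` returns a dictionary keyed by `k`, so `result[1]` and `result[5]` correspond to the Top-1 and Top-5 figures quoted above.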

209ye commented 3 months ago

The 88.5 accuracy mentioned in the paper should be for the 10B model, which does not seem to have been released yet. For the published 0.4B model, the paper reports a Top-1 accuracy of 78.5.

shyammarjit commented 1 month ago

How do I load the 10B model? Is it open-sourced yet?