OFA-Sys / Chinese-CLIP

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
MIT License
4.01k stars 418 forks

How was the accuracy on ImageNet obtained? #276

Open xiguadong opened 3 months ago

xiguadong commented 3 months ago

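Hello, I noticed that the ImageNet top-1 accuracy is fairly low. Could you explain how it was measured? The notebook at https://github.com/openai/CLIP/blob/main/notebooks/Prompt_Engineering_for_ImageNet.ipynb mentions that 80 prompt templates are applied when encoding class names with the text encoder, which ultimately reaches 76.2% top-1 accuracy. Did cn-clip use the same trick during testing?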
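For readers unfamiliar with the trick being asked about, here is a minimal sketch of prompt ensembling for zero-shot classification: each class name is inserted into several templates, the resulting text embeddings are L2-normalized and averaged, and the average is normalized again to give one classifier vector per class. This is not cn-clip's evaluation code. `toy_encode_text` is a hypothetical stand-in for a real text encoder, and the three templates and class names are illustrative only, not the 80-template list from the notebook.

```python
import hashlib
import math

# Illustrative templates only (assumption); the notebook uses a much longer list.
TEMPLATES = ["a photo of a {}.", "a blurry photo of a {}.", "a drawing of a {}."]


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def toy_encode_text(text, dim=8):
    """Hypothetical stand-in for a text encoder: a deterministic pseudo-embedding
    derived from a hash of the text. A real setup would call the model here."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 - 0.5 for b in digest[:dim]]


def build_class_embedding(class_name, templates, encode_text):
    """Embed one class as the normalized mean of its normalized prompt embeddings."""
    embeddings = [l2_normalize(encode_text(t.format(class_name))) for t in templates]
    mean = [sum(column) / len(embeddings) for column in zip(*embeddings)]
    return l2_normalize(mean)


def classify(image_embedding, class_embeddings):
    """Return the class whose ensembled embedding has the highest cosine similarity."""
    image_embedding = l2_normalize(image_embedding)
    scores = {
        name: sum(a * b for a, b in zip(image_embedding, emb))
        for name, emb in class_embeddings.items()
    }
    return max(scores, key=scores.get)


# Build one classifier vector per (illustrative) class name.
class_names = ["cat", "dog", "car"]
class_embeddings = {
    name: build_class_embedding(name, TEMPLATES, toy_encode_text)
    for name in class_names
}
```

With a real model, `encode_text` would wrap the model's text encoder and `image_embedding` would come from its image encoder; top-1 accuracy is then the fraction of images for which `classify` returns the ground-truth label.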