How was the accuracy on the ImageNet dataset obtained?

OFA-Sys / Chinese-CLIP

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

MIT License

4.01k stars 418 forks

Open xiguadong opened 3 months ago

xiguadong commented 3 months ago

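Hello, I noticed that the reported ImageNet top-1 accuracy is fairly low. How was it measured? The notebook at https://github.com/openai/CLIP/blob/main/notebooks/Prompt_Engineering_for_ImageNet.ipynb mentions that 80 prompt templates are applied when encoding class names with the text encoder, which ultimately reaches 76.2% top-1 accuracy. Did cn-clip use the same trick in its evaluation?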
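For readers unfamiliar with the trick in question, here is a minimal sketch of prompt-template ensembling for zero-shot classification: each class name is inserted into several templates, every resulting text embedding is L2-normalized, the embeddings are averaged per class, and the mean is normalized again before being compared with the image embedding. This is not the Chinese-CLIP evaluation code. The function names, the two example templates, and `toy_encode_text` (a stand-in for a real text encoder) are all assumptions made for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def build_class_embeddings(
    class_names: Sequence[str],
    templates: Sequence[str],
    encode_text: Callable[[str], Sequence[float]],
) -> Dict[str, Vector]:
    """Average the normalized embeddings of all templates for each class."""
    class_embeddings: Dict[str, Vector] = {}
    for name in class_names:
        # One embedding per filled-in template, each normalized first.
        embs = [l2_normalize(encode_text(t.format(name))) for t in templates]
        dim = len(embs[0])
        mean = [sum(e[i] for e in embs) / len(embs) for i in range(dim)]
        # Re-normalize so the ensemble is again a unit vector.
        class_embeddings[name] = l2_normalize(mean)
    return class_embeddings


def classify(image_embedding: Sequence[float],
             class_embeddings: Dict[str, Vector]) -> str:
    """Return the class whose ensembled embedding has the highest cosine similarity."""
    img = l2_normalize(image_embedding)
    scores = {
        name: sum(a * b for a, b in zip(img, emb))
        for name, emb in class_embeddings.items()
    }
    return max(scores, key=scores.get)


def toy_encode_text(text: str) -> Vector:
    """Hypothetical stand-in for a real text encoder: an 8-bin character histogram."""
    vec = [0.0] * 8
    for ch in text:
        vec[ord(ch) % 8] += 1.0
    return vec


# Illustrative templates only; the referenced notebook uses a list of 80.
TEMPLATES = ["a photo of a {}.", "a drawing of a {}."]
CLASS_EMBEDDINGS = build_class_embeddings(["dog", "cat"], TEMPLATES, toy_encode_text)
```

With a real model, `encode_text` would wrap the model's text encoder and `image_embedding` would come from its image encoder. Whether the templates should be English or Chinese for cn-clip is exactly the open question in this issue, so the sketch leaves that choice to the caller.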