OFA-Sys / Chinese-CLIP

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
MIT License

Zero-shot classification issue #214

Open cdy-for-grad opened 11 months ago

cdy-for-grad commented 11 months ago

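Hello authors, CN-clip is great work! While reproducing zero-shot inference on voc-2007-classification, I found that the final performance does not match the reported result. My run output is below; I would appreciate it if you could take a look when you have time. Thanks.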

Params:
  context_length: 52
  datapath: **/Chinese-CLIP/content/datasets/voc-2007-classification/test
  dataset: voc-2007-classification
  img_batch_size: 64
  index:
  label_file: **/Chinese-CLIP/content/datasets/voc-2007-classification/label_cn.txt
  num_workers: 4
  precision: amp
  resume: **/Chinese-CLIP/content/pretrained_weights/clip_cn_vit-h-14.pt
  save_dir: **/Chinese-CLIP/eval_result//voc-2007-classification
  text_model: RoBERTa-wwm-ext-large-chinese
  vision_model: ViT-H-14
Loading vision model config from cn_clip/clip/model_configs/ViT-H-14.json
Loading text model config from cn_clip/clip/model_configs/RoBERTa-wwm-ext-large-chinese.json
Preparing zeroshot dataset.
224
Begin to load model checkpoint from **/Chinese-CLIP/content/pretrained_weights/clip_cn_vit-h-14.pt.
=> loaded checkpoint **/Chinese-CLIP/content/pretrained_weights/clip_cn_vit-h-14.pt (epoch 7 @ 40 000 steps)
Building zero-shot classifier
Using classifier
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:15<00:00, 1.28it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 78/78 [04:04<00:00, 3.13s/it]
torch.Size([4952, 20])
Result:
zeroshot-top1: 0.09268982229402262
Finished.

The test data is voc-2007-classification, downloaded from https://github.com/OFA-Sys/Chinese-CLIP/blob/master/zeroshot_dataset.md

The performance reported in the paper, however, is: (image)
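As a rough aid for reading the `zeroshot-top1` number in the log above, here is a minimal sketch. It assumes that top-1 accuracy means the fraction of images whose highest-scoring class equals a single ground-truth label. The function `top1_accuracy` and the toy scores are hypothetical illustrations and are not taken from the Chinese-CLIP evaluation script; only the image count, class count, and reported score come from the log.

```python
def top1_accuracy(logits, labels):
    """Fraction of rows whose highest-scoring column equals the label.

    Hypothetical helper for illustration; not Chinese-CLIP code.
    logits: list of per-image score lists, one score per class.
    labels: list of integer class indices, one per image.
    """
    correct = 0
    for row, label in zip(logits, labels):
        # Index of the largest score in this row (arg-max).
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += int(predicted == label)
    return correct / len(labels)


# Toy data (made up): 4 images, 3 classes, 3 of 4 predictions correct.
toy_logits = [
    [0.9, 0.05, 0.05],
    [0.2, 0.7, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
]
toy_labels = [0, 1, 2, 2]
toy_acc = top1_accuracy(toy_logits, toy_labels)

# Numbers taken from the log: torch.Size([4952, 20]) and the reported score.
num_images, num_classes = 4952, 20
reported_top1 = 0.09268982229402262

# Number of correct predictions implied by the reported score.
approx_correct = round(reported_top1 * num_images)

# Expected accuracy of uniform random guessing over the classes.
chance_level = 1 / num_classes

print(toy_acc, approx_correct, chance_level)
```

Under that assumption, the reported score corresponds to about 459 correct predictions out of 4952 images, compared with 0.05 expected from uniform guessing over 20 classes.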

DesertsP commented 9 months ago

Same problem here: using the same settings, the metric on the VOC dataset differs far too much from the reported result.