BatsResearch / csp

Learning to compose soft prompts for compositional zero-shot learning.
BSD 3-Clause "New" or "Revised" License

CLIP results do not match #2

Closed: xugy16 closed this issue 2 years ago

xugy16 commented 2 years ago

I really appreciate your code.

But when I run the evaluation code for the pretrained CLIP model in the closed-world setting, I get:

Testing: 100%|██████████████████████████████████| 46/46 [00:32<00:00, 1.42it/s] closed_attr_match 0.3926| closed_obj_match 0.511| closed_match 0.3288| closed_seen_match 0.0| closed_unseen_match 0.5066| closed_ca 2.0| closed_seen_ca 1.0| closed_unseen_ca 1.0| closed_ub_attr_match 0.2557| closed_ub_obj_match 0.5268| closed_ub_match 0.1695| closed_ub_seen_match 0.0792| closed_ub_unseen_match 0.2184| closed_ub_ca 2.0| closed_ub_seen_ca 1.0| closed_ub_unseen_ca 1.0| biasterm 0.1718| best_unseen 0.5066| best_seen 0.0938| AUC 0.0346| hm_unseen 0.2464| hm_seen 0.0772| best_hm 0.1176| attr_acc 0.2557| obj_acc 0.5268| done!

This does not match the paper's results. Is there anything I am missing?

nihalnayak commented 2 years ago

Thanks for posting this issue. I have updated the README with instructions to benchmark the pretrained CLIP model (see Closed World Evaluation).

If your issue still persists, it would be awesome if you could provide the following details:
(0) When running evaluate.py, did you use --experiment_name clip to benchmark your results? (A minimal example invocation is sketched below.)
(1) Dataset: mit-states, ut-zappos, or cgqa?
(2) Model name: ViT-B/32 or ViT-L/14?
(3) Did you make any changes to the prompt?
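For reference, the CLIP benchmark can be launched along these lines. This is only a sketch: --experiment_name clip is taken from the README, but the --dataset flag and its value are assumptions here, so please check the Closed World Evaluation instructions for the exact argument names.

```python
# Illustrative sketch: run evaluate.py for the pretrained CLIP baseline.
# --experiment_name clip comes from the README; the --dataset flag name is
# an assumption, so verify the exact arguments before running.
import subprocess

subprocess.run(
    ["python", "evaluate.py", "--experiment_name", "clip", "--dataset", "mit-states"],
    check=True,
)
```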

Best, Nihal

xugy16 commented 2 years ago

closed_attr_match 0.3771| closed_obj_match 0.4128| closed_match 0.3185| closed_seen_match 0.0| closed_unseen_match 0.4907| closed_ca 2.0| closed_seen_ca 1.0| closed_unseen_ca 1.0| closed_ub_attr_match 0.2169| closed_ub_obj_match 0.6132| closed_ub_match 0.1702| closed_ub_seen_match 0.129| closed_ub_unseen_match 0.1925| closed_ub_ca 2.0| closed_ub_seen_ca 1.0| closed_ub_unseen_ca 1.0| biasterm 0.203| best_unseen 0.4907| best_seen 0.1584| AUC 0.0497| hm_unseen 0.2189| hm_seen 0.1212| best_hm 0.156| attr_acc 0.2169| obj_acc 0.6132|

I really appreciate your reply. Above is my current result.

From the paper, I guess the reported numbers are best_unseen 0.4907 and best_seen 0.1584, which are calculated by adding the bias term used when computing the AUC.

But I think we should report seen and unseen without bias, right?

nihalnayak commented 2 years ago

Awesome! The results match.

We follow the evaluation protocol from https://github.com/ExplainableML/czsl, which reports the best seen and unseen accuracies with the bias term applied.
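Roughly, that protocol works like the sketch below: a scalar bias is added to the scores of the unseen pairs, the bias is swept over a range of values, and seen/unseen accuracies are recorded at each value. best_seen and best_unseen are the maxima over that sweep, and AUC is the area under the resulting seen-vs-unseen curve. This is an illustrative sketch with hypothetical array names, not the exact code from either repository.

```python
# Illustrative bias-sweep evaluation for closed-world CZSL (not the repo's exact code).
# scores:         (num_images, num_pairs) compatibility scores over all attr-obj pairs
# pair_labels:    (num_images,) index of the ground-truth pair for each image
# is_unseen_pair: (num_pairs,) bool, True for pairs not seen during training
# is_unseen_img:  (num_images,) bool, True for images labeled with an unseen pair
import numpy as np

def bias_sweep(scores, pair_labels, is_unseen_pair, is_unseen_img, biases):
    seen_accs, unseen_accs = [], []
    for bias in biases:
        # add the bias to every unseen pair's score before taking the argmax
        shifted = scores + bias * is_unseen_pair[None, :]
        correct = shifted.argmax(axis=1) == pair_labels
        seen_accs.append(correct[~is_unseen_img].mean())
        unseen_accs.append(correct[is_unseen_img].mean())
    seen_accs = np.array(seen_accs)
    unseen_accs = np.array(unseen_accs)
    best_seen = seen_accs.max()      # reached when the bias strongly favors seen pairs
    best_unseen = unseen_accs.max()  # reached when the bias strongly favors unseen pairs
    hm = 2 * seen_accs * unseen_accs / (seen_accs + unseen_accs + 1e-12)
    order = seen_accs.argsort()
    auc = np.trapz(unseen_accs[order], seen_accs[order])  # area under the seen-vs-unseen curve
    return best_seen, best_unseen, hm.max(), auc
```

Reporting seen and unseen accuracy at a single fixed bias is also possible, but following the czsl protocol keeps the numbers comparable to prior work.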

Hope this helps!

Best, Nihal

nihalnayak commented 2 years ago

Marking this issue as resolved. Feel free to reopen if you have any more questions.