Closed: xugy16 closed this issue 2 years ago
Thanks for posting this issue. I have updated the README with instructions to benchmark the pretrained CLIP (see Closed World Evaluation).
If your issue still persists, it would be awesome if you could provide the following details:
(0) When running evaluate.py, did you use --experiment_name clip to benchmark your results? (A short sketch of what this evaluation does follows this list.)
(1) Dataset: mit-states, ut-zappos, or cgqa?
(2) Model Name: ViT-B/32 or ViT-L/14?
(3) Did you make any changes to the prompt?
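For context, benchmarking the pretrained CLIP in the closed-world setting roughly amounts to scoring each image against a text prompt for every candidate attribute-object pair and taking the argmax. Below is a minimal sketch using the openai clip package; it is not the repo's actual code, and the prompt template, pair list, and image path are placeholders:

```python
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-L/14", device=device)  # or "ViT-B/32"

# Placeholder candidate compositions; the real list comes from the dataset split.
pairs = [("red", "apple"), ("sliced", "apple"), ("red", "tomato")]
# Placeholder prompt template for zero-shot CLIP.
prompts = clip.tokenize([f"a photo of {attr} {obj}" for attr, obj in pairs]).to(device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)  # placeholder path

with torch.no_grad():
    text_feats = model.encode_text(prompts)
    image_feats = model.encode_image(image)
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)
    image_feats = image_feats / image_feats.norm(dim=-1, keepdim=True)
    scores = image_feats @ text_feats.T  # [1, num_pairs] cosine similarities

pred_attr, pred_obj = pairs[scores.argmax().item()]
print(pred_attr, pred_obj)
```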
Best, Nihal
closed_attr_match 0.3771| closed_obj_match 0.4128| closed_match 0.3185| closed_seen_match 0.0| closed_unseen_match 0.4907| closed_ca 2.0| closed_seen_ca 1.0| closed_unseen_ca 1.0| closed_ub_attr_match 0.2169| closed_ub_obj_match 0.6132| closed_ub_match 0.1702| closed_ub_seen_match 0.129| closed_ub_unseen_match 0.1925| closed_ub_ca 2.0| closed_ub_seen_ca 1.0| closed_ub_unseen_ca 1.0| biasterm 0.203| best_unseen 0.4907| best_seen 0.1584| AUC 0.0497| hm_unseen 0.2189| hm_seen 0.1212| best_hm 0.156| attr_acc 0.2169| obj_acc 0.6132|
I really appreciate your reply. Above is my current result.
From the paper, I guess the reported numbers are best_unseen 0.4907 | best_seen 0.1584, which are calculated by adding the bias when computing AUC.
But I think we should report seen and unseen without bias, right?
Awesome! The results match.
We follow the evaluation from https://github.com/ExplainableML/czsl. They report the best seen and unseen with bias.
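For reference, here is a minimal sketch of that kind of bias sweep; it is not the repo's or czsl's actual code, and the array names and bias grid are assumptions. A scalar bias added to the unseen-pair scores is swept to trace the seen-unseen accuracy curve, and best seen, best unseen, best HM, and AUC are read off that curve:

```python
import numpy as np

def bias_sweep(scores, labels, unseen_pair_mask, seen_image_mask, biases):
    """Sweep a calibration bias added to unseen-pair scores (czsl-style evaluation).

    scores:           [num_images, num_pairs] compatibility scores
    labels:           [num_images] index of each image's ground-truth pair
    unseen_pair_mask: [num_pairs] bool, True for pairs not seen during training
    seen_image_mask:  [num_images] bool, True for images labelled with a seen pair
    biases:           1-D array of scalar bias values to sweep
    """
    seen_accs, unseen_accs = [], []
    for b in biases:
        shifted = scores + b * unseen_pair_mask  # shift only unseen-pair columns
        correct = shifted.argmax(axis=1) == labels
        seen_accs.append(correct[seen_image_mask].mean())
        unseen_accs.append(correct[~seen_image_mask].mean())
    seen_accs, unseen_accs = np.array(seen_accs), np.array(unseen_accs)

    hm = 2 * seen_accs * unseen_accs / np.maximum(seen_accs + unseen_accs, 1e-12)
    order = np.argsort(seen_accs)  # integrate unseen accuracy over seen accuracy
    return {
        "best_seen": seen_accs.max(),      # reached at a large negative bias
        "best_unseen": unseen_accs.max(),  # reached at a large positive bias
        "best_hm": hm.max(),
        "AUC": np.trapz(unseen_accs[order], seen_accs[order]),
    }
```

Because best seen and best unseen are read off at the two extremes of this sweep, they are not the unbiased accuracies, which is why the numbers only match the paper when the bias is included.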
Hope this helps!
Best, Nihal
Marking this issue as resolved. Feel free to reopen if you have any more questions.
I really appreciate your code. But when I run the evaluation code for the pretrained CLIP in the closed-world setting, I get the following output:
Testing: 100%|██████████████████████████████████| 46/46 [00:32<00:00, 1.42it/s] closed_attr_match 0.3926| closed_obj_match 0.511| closed_match 0.3288| closed_seen_match 0.0| closed_unseen_match 0.5066| closed_ca 2.0| closed_seen_ca 1.0| closed_unseen_ca 1.0| closed_ub_attr_match 0.2557| closed_ub_obj_match 0.5268| closed_ub_match 0.1695| closed_ub_seen_match 0.0792| closed_ub_unseen_match 0.2184| closed_ub_ca 2.0| closed_ub_seen_ca 1.0| closed_ub_unseen_ca 1.0| biasterm 0.1718| best_unseen 0.5066| best_seen 0.0938| AUC 0.0346| hm_unseen 0.2464| hm_seen 0.0772| best_hm 0.1176| attr_acc 0.2557| obj_acc 0.5268| done!
These do not match the paper's results. Is there anything I'm missing?