Closed CXNing closed 6 days ago
Hello @CXNing,
I am not sure I understand your question correctly. Could you please clarify what you mean by "in your paper, where the range of clip score is not greater than 1, but the result shown in your paper is very large"?
According to the paper https://arxiv.org/abs/2310.10325, the CLIP score is computed as follows (Appendix A, Experimental details):
For CLIP score, we compute image and text embeddings with the CLIP backbone ViT-B/32
My guess would be that the authors used https://lightning.ai/docs/torchmetrics/stable/multimodal/clip_score.html with model_name_or_path='openai/clip-vit-base-patch32'.
Hope this helps! Nikolai
Thank you very much for getting back to me so quickly. My question was why the CLIP score is greater than 1, and your reply has answered it. Thanks again!
Glad to hear that! For future reference:
The score is bound between 0 and 100 and the closer to 100 the better. (source: https://lightning.ai/docs/torchmetrics/stable/multimodal/clip_score.html)
This matches the value range of the CLIP scores reported in PerCo (e.g. Fig. 3).
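For anyone else hitting the same confusion: the two value ranges differ only by a scaling factor applied to the cosine similarity between the image and text embeddings. A minimal NumPy sketch (random vectors stand in for real CLIP embeddings from a backbone like ViT-B/32; `clip_score_from_embeddings` is a hypothetical helper for illustration, not the torchmetrics API):

```python
import numpy as np

def clip_score_from_embeddings(img_emb, txt_emb, scale=100.0):
    """Illustration: CLIP score as scaled, clipped cosine similarity.

    img_emb / txt_emb stand in for CLIP image/text embeddings; in
    practice they would come from a CLIP backbone such as ViT-B/32.
    """
    img = img_emb / np.linalg.norm(img_emb)
    txt = txt_emb / np.linalg.norm(txt_emb)
    cos = float(img @ txt)           # cosine similarity, in [-1, 1]
    return max(scale * cos, 0.0)     # torchmetrics convention: max(100 * cos, 0)

rng = np.random.default_rng(0)
img_emb = rng.normal(size=512)
txt_emb = img_emb + 0.5 * rng.normal(size=512)  # a loosely "matching" pair

score_100 = clip_score_from_embeddings(img_emb, txt_emb)       # in [0, 100]
score_raw = clip_score_from_embeddings(img_emb, txt_emb, 1.0)  # in [0, 1]
print(score_100, score_raw)
```

So a value like 30 from the torchmetrics implementation corresponds to a raw cosine similarity of 0.3 in an unscaled implementation, which is why the paper's numbers look "very large" compared to the 0–1 range.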
Very good work! I have a question about how the CLIP score is calculated while reproducing your work. Following the reference in your paper, I used https://github.com/jmhessel/clipscore/tree/main, where the CLIP score is not greater than 1, yet the values reported in your paper are much larger. Could you explain how you computed it?