Closed CXNing closed 6 days ago
Hello @CXNing,
I am not sure I understand your question correctly. Could you please clarify what you mean by "in your paper, where the range of clip score is not greater than 1, but the result shown in your paper is very large"?
According to the paper https://arxiv.org/abs/2310.10325, the CLIP score is computed as follows (Appendix A, Experimental details):
For CLIP score, we compute image and text embeddings with the CLIP backbone ViT-B/32
My guess would be that the authors used https://lightning.ai/docs/torchmetrics/stable/multimodal/clip_score.html with model_name_or_path='openai/clip-vit-base-patch32'.
Hope this helps! Nikolai
Thank you very much for getting back to me so quickly. My question was why the CLIP score is greater than 1, and your reply has answered it. Thanks again!
Glad to hear that! For future reference:
The score is bound between 0 and 100 and the closer to 100 the better. (source: https://lightning.ai/docs/torchmetrics/stable/multimodal/clip_score.html)
This matches the value range of the CLIP scores reported in PerCo (e.g. Fig. 3).
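For anyone else hitting the same confusion: the two value ranges differ only by a scaling factor applied to the cosine similarity between the image and text embeddings. A minimal NumPy sketch (random vectors stand in for real CLIP embeddings from a backbone like ViT-B/32; `clip_score_from_embeddings` is a hypothetical helper for illustration, not the torchmetrics API):

```python
import numpy as np

def clip_score_from_embeddings(img_emb, txt_emb, scale=100.0):
    """Illustration: CLIP score as scaled, clipped cosine similarity.

    img_emb / txt_emb stand in for CLIP image/text embeddings; in
    practice they would come from a CLIP backbone such as ViT-B/32.
    """
    img = img_emb / np.linalg.norm(img_emb)
    txt = txt_emb / np.linalg.norm(txt_emb)
    cos = float(img @ txt)           # cosine similarity, in [-1, 1]
    return max(scale * cos, 0.0)     # torchmetrics convention: max(100 * cos, 0)

rng = np.random.default_rng(0)
img_emb = rng.normal(size=512)
txt_emb = img_emb + 0.5 * rng.normal(size=512)  # a loosely "matching" pair

score_100 = clip_score_from_embeddings(img_emb, txt_emb)       # in [0, 100]
score_raw = clip_score_from_embeddings(img_emb, txt_emb, 1.0)  # in [0, 1]
print(score_100, score_raw)
```

So a value like 30 from the torchmetrics implementation corresponds to a raw cosine similarity of 0.3 in an unscaled implementation, which is why the paper's numbers look "very large" compared to the 0–1 range.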
Very good work! I have a question about how the CLIP score is calculated while reproducing your work. Following the reference in your paper, I used https://github.com/jmhessel/clipscore/tree/main, where the CLIP score is not greater than 1, yet the values reported in your paper are much larger. Could you explain how you computed it?