ChenDelong1999 / RemoteCLIP

🛰️ Official repository of the paper "RemoteCLIP: A Vision Language Foundation Model for Remote Sensing" (IEEE TGRS)
https://arxiv.org/abs/2306.11029
Apache License 2.0

For Zero-shot inference #18

Closed · yuanpanlifly closed 6 months ago

yuanpanlifly commented 6 months ago

Regarding zero-shot inference: the paper reports 68.62% and 77.96% on the AID dataset for raw CLIP and RemoteCLIP respectively, with the ViT-B-32 backbone. Using the same template-based prompting as you ("a satellite photo of {class name}"), my result with raw CLIP is only 0.195%. That is a big difference from your results, so I would like to ask: is the CLIP evaluated in your paper the continually pretrained CLIP, or the original model released on the OpenAI website?
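For context, here is a minimal sketch of the standard CLIP zero-shot classification pipeline this thread is discussing, assuming the open_clip library; the class names and image path below are placeholders, not the actual AID evaluation code:

```python
import torch
import open_clip
from PIL import Image

# Load the original OpenAI ViT-B-32 weights via open_clip
# (pretrained="openai" fetches the unmodified OpenAI checkpoint).
model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

# Hypothetical subset; replace with the 30 AID scene categories.
classnames = ["airport", "bare land", "baseball field", "beach"]
text = tokenizer([f"a satellite photo of {c}" for c in classnames])

with torch.no_grad():
    text_features = model.encode_text(text)
    text_features /= text_features.norm(dim=-1, keepdim=True)  # L2-normalize

    # Hypothetical image path; preprocess handles resize/crop/normalization.
    image = preprocess(Image.open("aid_example.jpg")).unsqueeze(0)
    image_features = model.encode_image(image)
    image_features /= image_features.norm(dim=-1, keepdim=True)

    # Cosine similarity between image and each class prompt;
    # the argmax over classes is the top-1 prediction.
    logits = 100.0 * image_features @ text_features.T
    pred = logits.argmax(dim=-1).item()

print(f"predicted class: {classnames[pred]}")
```

Two things worth checking when raw CLIP scores far below chance (random guessing over AID's 30 classes is about 3.3%, so 0.195% suggests a systematic mismatch rather than a weak model): that both feature tensors are L2-normalized before the dot product, and that the order of `classnames` matches the order of the ground-truth label indices.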

ChenDelong1999 commented 6 months ago

We used the standard CLIP weights from OpenAI for evaluation.

yuanpanlifly commented 6 months ago

> We used the standard CLIP weights from OpenAI for evaluation.

Thanks for the reply, may I ask if the accuracy is acc@1 or acc@5 for image classification zero-shot inference in the paper?

ChenDelong1999 commented 6 months ago

We reported the standard top-1 accuracy.
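That is, acc@1: a prediction counts as correct only when the single highest-scoring class equals the ground-truth label. A minimal sketch of the metric (the helper name is illustrative, not from the repository):

```python
import torch

def topk_accuracy(logits: torch.Tensor, labels: torch.Tensor, k: int = 1) -> float:
    """Fraction of samples whose true label appears among the k highest-scoring classes."""
    topk = logits.topk(k, dim=-1).indices             # (N, k) indices of the k best classes
    hit = (topk == labels.unsqueeze(-1)).any(dim=-1)  # (N,) True if the label is among them
    return hit.float().mean().item()

# top-1 (what the paper reports) vs. top-5:
# acc1 = topk_accuracy(logits, labels, k=1)
# acc5 = topk_accuracy(logits, labels, k=5)
```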

yuanpanlifly commented 6 months ago

> We reported the standard top-1 accuracy.

Can you share your code for zero-shot inference?

yuanpanlifly commented 6 months ago

Could you also disclose the prompt template you used for image classification, please?