arampacha / CLIP-rsicd


Measure generalization capabilities of CLIP-RSICD model #31

Open sujitpal opened 3 years ago

sujitpal commented 3 years ago

We want to measure the model's ability to generalize beyond the 30 classes it was trained on. The idea is to take aerial images of subjects not covered by those 30 classes, measure the performance of our CLIP-RSICD model on them, and compare against the baseline CLIP model. The evaluation metric can be similar to the one used in our original evaluation, i.e., the rank of the synthetic caption containing the correct class, averaged across all test images.

FMoW may be a good source of aerial images covering classes outside the RSICD training set.
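A minimal sketch of how this evaluation could look, using the HuggingFace `transformers` CLIP API. The checkpoint id, the list of unseen classes, and the caption template are assumptions for illustration, not the project's actual setup:

```python
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

# Assumed checkpoint id for the fine-tuned model; swap in
# "openai/clip-vit-base-patch32" to score the baseline CLIP model.
MODEL_ID = "flax-community/clip-rsicd-v2"

# Hypothetical classes not in the RSICD training set (e.g. drawn from FMoW).
CLASSES = ["wind farm", "oil refinery", "race track"]

model = CLIPModel.from_pretrained(MODEL_ID)
processor = CLIPProcessor.from_pretrained(MODEL_ID)

def rank_of_correct_class(image: Image.Image, true_class: str) -> int:
    """Rank (1 = best) of the synthetic caption containing the true class."""
    captions = [f"an aerial photograph of {c}" for c in CLASSES]
    inputs = processor(text=captions, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        # logits_per_image[0] holds the image-text similarity for each caption.
        logits = model(**inputs).logits_per_image[0]
    order = logits.argsort(descending=True).tolist()
    return order.index(CLASSES.index(true_class)) + 1

# Average rank_of_correct_class over all test images for both the
# fine-tuned and baseline models, then compare the two averages.
```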

INF800 commented 2 years ago

Hi, I was wondering whether recent advances in multimodal retrieval have produced any improvements over CLIP-based architectures. Are there any?