encord-team / text-to-image-eval

Evaluate custom and HuggingFace text-to-image/zero-shot-image-classification models like CLIP, SigLIP, DFN5B, and EVA-CLIP. Metrics include zero-shot accuracy, linear probing, image retrieval, and KNN accuracy.
https://encord.com
Apache License 2.0

feat: add image retrieval model evaluation metric #50

Closed · eloy-encord closed this 6 months ago

eloy-encord commented 7 months ago

I modified the evaluation setup to allow model evaluation metrics other than classification metrics. As a proof of concept, I added an image retrieval evaluation metric that, for each class text prompt, calculates how many of the top 100 most relevant images belong to that class.
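
A minimal sketch of how such a metric could be computed, assuming precomputed, L2-normalized embeddings. The names `retrieval_hits_at_k`, `class_prompt_embeddings`, `image_embeddings`, and `image_labels` are illustrative, not the repository's actual API:

```python
import numpy as np


def retrieval_hits_at_k(
    class_prompt_embeddings: np.ndarray,  # (num_classes, dim), L2-normalized
    image_embeddings: np.ndarray,         # (num_images, dim), L2-normalized
    image_labels: np.ndarray,             # (num_images,) integer class ids
    k: int = 100,
) -> np.ndarray:
    """For each class prompt, count how many of the top-k most
    similar images actually belong to that class."""
    # On normalized vectors, cosine similarity is a plain dot product.
    similarities = class_prompt_embeddings @ image_embeddings.T  # (C, N)
    # Indices of the k most similar images per class prompt.
    top_k = np.argsort(-similarities, axis=1)[:, :k]             # (C, k)
    # Compare the labels of the retrieved images against the query class id.
    class_ids = np.arange(class_prompt_embeddings.shape[0])[:, None]
    hits = (image_labels[top_k] == class_ids).sum(axis=1)        # (C,)
    return hits
```

Dividing `hits` by `k` gives a per-class precision@100, which can then be averaged across classes for a single score.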

On the side, I created a class interface that implements the evaluation model's title behaviour, because using the EvaluationModel class in type hints raised an unfilled-title warning. This wasn't a problem earlier because the classification model instances were created from input dicts, which hid potential errors. New evaluation models will fill the title via the super().__init__ call in their corresponding __init__ methods. If an evaluation model fails to do so, an error will be triggered as soon as it is tested.
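
A sketch of the described pattern, assuming a base class that validates the title at construction time; the method names and the `ImageRetrievalEvaluation` subclass below are hypothetical, not the repository's actual classes:

```python
from abc import ABC, abstractmethod


class EvaluationModel(ABC):
    """Base interface: subclasses fill the title via super().__init__."""

    def __init__(self, title: str) -> None:
        # Reject empty or missing titles up front instead of warning later.
        if not title:
            raise ValueError(f"{type(self).__name__} must provide a non-empty title")
        self.title = title

    @abstractmethod
    def evaluate(self) -> float:
        """Run the metric and return a score."""


class ImageRetrievalEvaluation(EvaluationModel):
    def __init__(self) -> None:
        # Title is filled via the super().__init__ call, as described above.
        super().__init__(title="image_retrieval")

    def evaluate(self) -> float:
        return 0.0  # metric computation elided in this sketch
```

With this shape, an empty title fails at construction, and a subclass that skips super().__init__ entirely raises AttributeError the first time self.title is read, so either mistake surfaces as soon as the model is tested.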