Evaluate custom and HuggingFace text-to-image/zero-shot-image-classification models like CLIP, SigLIP, DFN5B, and EVA-CLIP. Metrics include Zero-shot accuracy, Linear Probe, Image retrieval, and KNN accuracy.
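As a reference point, here is a minimal zero-shot accuracy sketch using a HuggingFace CLIP checkpoint; the checkpoint name, prompt template, and helper function are illustrative assumptions, not the project's exact setup:

```python
import torch
from transformers import CLIPModel, CLIPProcessor

# Illustrative checkpoint; any CLIP-like zero-shot model could be swapped in.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def zero_shot_accuracy(images, labels, class_names):
    """Fraction of images whose highest-scoring class prompt matches the label."""
    prompts = [f"a photo of a {name}" for name in class_names]
    inputs = processor(text=prompts, images=images, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # (num_images, num_classes)
    preds = logits.argmax(dim=-1)
    return (preds == torch.tensor(labels)).float().mean().item()
```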
I modified the evaluation setup to allow model evaluation metrics other than classification metrics.
As a proof of concept, I added an image retrieval evaluation metric that calculates, for each class text prompt, how many of the top 100 most relevant images belong to that class.
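A minimal sketch of how such a retrieval metric could be computed, assuming image and text embeddings are already available as tensors; the function name `retrieval_at_k` and the tensor layout are assumptions, only the per-class top-100 counting mirrors the description:

```python
import torch

def retrieval_at_k(image_embeds, image_labels, text_embeds, k=100):
    """For each class text prompt, measure how many of the top-k most
    similar images actually belong to that class.

    image_embeds: (num_images, dim) image embeddings
    image_labels: (num_images,) integer class labels
    text_embeds:  (num_classes, dim) one embedding per class prompt
    """
    image_embeds = torch.nn.functional.normalize(image_embeds, dim=-1)
    text_embeds = torch.nn.functional.normalize(text_embeds, dim=-1)
    sims = text_embeds @ image_embeds.T             # (num_classes, num_images)
    topk = sims.topk(k, dim=-1).indices             # indices of the k closest images per class
    hits = image_labels[topk] == torch.arange(len(text_embeds)).unsqueeze(1)
    return hits.float().mean(dim=-1)                # per-class fraction of correctly retrieved images
```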
On the side, I created a class interface that handles the evaluation model's title, because using the EvaluationModel class in type hints produced an unfilled title warning. This wasn't a problem earlier because classification model instances were created from input dicts, which hid potential errors. New evaluation models will fill the title via the super().__init__ call in their corresponding init method; if an evaluation model fails to do so, an error is triggered as soon as it is tested.
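A rough sketch of what that interface could look like; the EvaluationModel name comes from the description above, while the attribute layout and the ImageRetrievalModel subclass are illustrative assumptions:

```python
from abc import ABC

class EvaluationModel(ABC):
    """Base interface: every evaluation model must provide a title."""

    def __init__(self, title: str):
        if not title:
            raise ValueError("Evaluation models must set a non-empty title")
        self.title = title

class ImageRetrievalModel(EvaluationModel):  # hypothetical subclass for illustration
    def __init__(self):
        # Fill the title via super().__init__, as described above;
        # omitting this call would surface an error as soon as the model is tested.
        super().__init__(title="image-retrieval")
```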