Evaluate custom and HuggingFace text-to-image/zero-shot-image-classification models like CLIP, SigLIP, DFN5B, and EVA-CLIP. Metrics include Zero-shot accuracy, Linear Probe, Image retrieval, and KNN accuracy.
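As a reference point, here is a minimal zero-shot accuracy sketch using a HuggingFace CLIP checkpoint; the checkpoint name, prompt template, and helper function are illustrative assumptions, not the project's exact setup:

```python
import torch
from transformers import CLIPModel, CLIPProcessor

# Illustrative checkpoint; any CLIP-like zero-shot model could be swapped in.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def zero_shot_accuracy(images, labels, class_names):
    """Fraction of images whose highest-scoring class prompt matches the label."""
    prompts = [f"a photo of a {name}" for name in class_names]
    inputs = processor(text=prompts, images=images, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # (num_images, num_classes)
    preds = logits.argmax(dim=-1)
    return (preds == torch.tensor(labels)).float().mean().item()
```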
I modified the evaluation setup to allow model evaluation metrics other than classification metrics.
As a proof of concept, I added an image retrieval evaluation metric that calculates, for each class text prompt, how many of the top 100 most relevant images belong to that class.
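A minimal sketch of how such a retrieval metric could be computed, assuming image and text embeddings are already available as tensors; the function name `retrieval_at_k` and the tensor layout are assumptions, only the per-class top-100 counting mirrors the description:

```python
import torch

def retrieval_at_k(image_embeds, image_labels, text_embeds, k=100):
    """For each class text prompt, measure how many of the top-k most
    similar images actually belong to that class.

    image_embeds: (num_images, dim) image embeddings
    image_labels: (num_images,) integer class labels
    text_embeds:  (num_classes, dim) one embedding per class prompt
    """
    image_embeds = torch.nn.functional.normalize(image_embeds, dim=-1)
    text_embeds = torch.nn.functional.normalize(text_embeds, dim=-1)
    sims = text_embeds @ image_embeds.T             # (num_classes, num_images)
    topk = sims.topk(k, dim=-1).indices             # indices of the k closest images per class
    hits = image_labels[topk] == torch.arange(len(text_embeds)).unsqueeze(1)
    return hits.float().mean(dim=-1)                # per-class fraction of correctly retrieved images
```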
On the side, I created a class interface that handles the evaluation model's title, because using the EvaluationModel class in type hints produced an unfilled title warning. This wasn't a problem earlier because classification model instances were created from input dicts, which hid potential errors. New evaluation models will fill the title via the super().__init__ call in their corresponding init method; if an evaluation model fails to do so, an error is triggered as soon as it is tested.
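A rough sketch of what that interface could look like; the EvaluationModel name comes from the description above, while the attribute layout and the ImageRetrievalModel subclass are illustrative assumptions:

```python
from abc import ABC

class EvaluationModel(ABC):
    """Base interface: every evaluation model must provide a title."""

    def __init__(self, title: str):
        if not title:
            raise ValueError("Evaluation models must set a non-empty title")
        self.title = title

class ImageRetrievalModel(EvaluationModel):  # hypothetical subclass for illustration
    def __init__(self):
        # Fill the title via super().__init__, as described above;
        # omitting this call would surface an error as soon as the model is tested.
        super().__init__(title="image-retrieval")
```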