embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark
https://arxiv.org/abs/2210.07316
Apache License 2.0

Custom Models do not receive prompt_type or task_name #1491

Open ArthurSBianchessi opened 4 hours ago

ArthurSBianchessi commented 4 hours ago

Specifically, `SentenceTransformerWrapper` (in `mteb/models/sentence_transformer_wrapper.py`) does not forward these arguments to the custom model's `encode` method.

Samoed commented 3 hours ago

Can you provide code to check?

ArthurSBianchessi commented 1 hour ago

Sure. The code in the README.md raises an error due to the missing `task_name` argument:

from mteb.encoder_interface import PromptType
import mteb
from mteb import MTEB
import numpy as np

class CustomModel:
    def encode(
        self,
        sentences: list[str],
        task_name: str,
        prompt_type: PromptType | None = None,
        **kwargs,
    ) -> np.ndarray:
        """Encodes the given sentences using the encoder.

        Args:
            sentences: The sentences to encode.
            task_name: The name of the task.
            prompt_type: The prompt type to use.
            **kwargs: Additional arguments to pass to the encoder.

        Returns:
            The encoded sentences.
        """
        pass

model = CustomModel()
tasks = [mteb.get_task("Banking77Classification")]  # the README doesn't wrap this in a list; without the list, run() raises an error when get_task is used
evaluation = MTEB(tasks=tasks)
evaluation.run(model)

It raises:

TypeError: CustomModel.encode() missing 1 required positional argument: 'task_name'
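As a stopgap, giving `task_name` and `prompt_type` defaults keeps the custom model callable even when the wrapper drops those arguments, while still receiving them whenever they are forwarded. A minimal sketch (the zero-vector return is a placeholder, not a real encoder):

```python
import numpy as np

class CustomModel:
    def encode(self, sentences, task_name=None, prompt_type=None, **kwargs):
        # Defaulting task_name to None avoids the TypeError when the
        # wrapper calls encode() without it; when the value is passed,
        # it is still received normally.
        return np.zeros((len(sentences), 8), dtype=np.float32)

embeddings = CustomModel().encode(["a", "b"])
```

This only hides the symptom; the model still never learns which task it is encoding for.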

Printing the inputs of encode with:

class CustomModel:
    def encode(self, sentences, *inputs, **kwargs):
        print(f'inputs: {inputs}')
        print(f'kwargs: {kwargs}')

outputs:

inputs: ()
kwargs: {'prompt_name': None, 'batch_size': 32}

There might be other variables that are not being forwarded as well.
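For reference, a hedged sketch of what a fix could look like: a wrapper whose `encode` explicitly passes `task_name` and `prompt_type` through to the wrapped model. The class and model below are illustrative stand-ins, not the actual mteb wrapper internals:

```python
from typing import Any

class ForwardingWrapper:
    """Hypothetical wrapper that forwards task metadata to the model."""

    def __init__(self, model: Any) -> None:
        self.model = model

    def encode(self, sentences, *, task_name=None, prompt_type=None, **kwargs):
        # Forward both arguments so a custom model that declares them
        # as parameters actually receives them.
        return self.model.encode(
            sentences, task_name=task_name, prompt_type=prompt_type, **kwargs
        )

class RecordingModel:
    """Toy model that records what its encode() was called with."""

    def encode(self, sentences, task_name=None, prompt_type=None, **kwargs):
        self.seen = {"task_name": task_name, "prompt_type": prompt_type}
        return [[0.0] for _ in sentences]

model = RecordingModel()
ForwardingWrapper(model).encode(["hello"], task_name="Banking77Classification")
```

With this shape of forwarding, `model.seen["task_name"]` would carry the task name instead of it being silently dropped.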