embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark
https://arxiv.org/abs/2210.07316
Apache License 2.0

ERROR:mteb.evaluation.MTEB:Error while evaluating SummEvalFr: 'batch_size' is an invalid keyword argument for encode() #236

Closed. LeMoussel closed this issue 6 months ago.

LeMoussel commented 6 months ago

I want to evaluate some French embedding models on the MTEB Semantic Textual Similarity (STS) tasks. To do this, I took inspiration from the code in run_mteb_french.py:

    import logging

    from sentence_transformers import SentenceTransformer
    from mteb import MTEB

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("main")

    # https://huggingface.co/dangvantuan/sentence-camembert-large
    model_name = "dangvantuan/sentence-camembert-large"
    model = SentenceTransformer(model_name)

    TASK_LIST_STS = [
        "SummEvalFr",
        "STSBenchmarkMultilingualSTS",
        "STS22",
        "SICKFr",
    ]

    for task in TASK_LIST_STS:
        logger.info(f"Running task: {task}")
        evaluation = MTEB(tasks=[task], task_langs=["fr"])
        evaluation.run(model_name, output_folder=f"fr_results/{model_name}")

But I got this error:

    Summarization

    - SummEvalFr, p2p

    ERROR:mteb.evaluation.MTEB:Error while evaluating SummEvalFr: 'batch_size' is an invalid keyword argument for encode()


    TypeError                                 Traceback (most recent call last)
    <ipython-input> in <module>()
         18 logger.info(f"Running task: {task}")
         19 evaluation = MTEB(tasks=[task], task_langs=["fr"])
    ---> 20 evaluation.run(model_name, output_folder=f"fr_results/{model_name}")

    4 frames
    /usr/local/lib/python3.10/dist-packages/mteb/evaluation/evaluators/SummarizationEvaluator.py in __call__(self, model)
         51
         52 logger.info(f"Encoding {sum(human_lens)} human summaries...")
    ---> 53 embs_human_summaries_all = model.encode(
         54     [summary for human_summaries in self.human_summaries for summary in human_summaries],
         55     batch_size=self.batch_size,

    TypeError: 'batch_size' is an invalid keyword argument for encode()

What should I do?

Muennighoff commented 6 months ago

In your code you call `evaluation.run(model_name, output_folder=f"fr_results/{model_name}")`; shouldn't it be `evaluation.run(model, output_folder=f"fr_results/{model_name}")`? The first argument should be the model object, not the model name string.

LeMoussel commented 6 months ago

You're right! Thanks for your help.