I want to evaluate some French embedding models on the MTEB Semantic Textual Similarity (STS) tasks.
To do this, I took inspiration from this code, run_mteb_french.py:
import logging

from sentence_transformers import SentenceTransformer
from mteb import MTEB

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("main")

# https://huggingface.co/dangvantuan/sentence-camembert-large
model_name = "dangvantuan/sentence-camembert-large"
model = SentenceTransformer(model_name)

TASK_LIST_STS = [
    "SummEvalFr",
    "STSBenchmarkMultilingualSTS",
    "STS22",
    "SICKFr",
]

for task in TASK_LIST_STS:
    logger.info(f"Running task: {task}")
    evaluation = MTEB(tasks=[task], task_langs=["fr"])
    evaluation.run(model_name, output_folder=f"fr_results/{model_name}")
But I got this error:
Summarization
- SummEvalFr, p2p
ERROR:mteb.evaluation.MTEB:Error while evaluating SummEvalFr: 'batch_size' is an invalid keyword argument for encode()
TypeError                                 Traceback (most recent call last)
in ()
     18     logger.info(f"Running task: {task}")
     19     evaluation = MTEB(tasks=[task], task_langs=["fr"])
---> 20     evaluation.run(model_name, output_folder=f"fr_results/{model_name}")
4 frames
/usr/local/lib/python3.10/dist-packages/mteb/evaluation/evaluators/SummarizationEvaluator.py in __call__(self, model)
     51
     52         logger.info(f"Encoding {sum(human_lens)} human summaries...")
---> 53         embs_human_summaries_all = model.encode(
     54             [summary for human_summaries in self.human_summaries for summary in human_summaries],
     55             batch_size=self.batch_size,
TypeError: 'batch_size' is an invalid keyword argument for encode()
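In your code you call evaluation.run(model_name, output_folder=f"fr_results/{model_name}"). Shouldn't it be evaluation.run(model, output_folder=f"fr_results/{model_name}")? As written, MTEB receives the string model_name rather than the SentenceTransformer object, so model.encode(...) resolves to str.encode, which does not accept batch_size.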
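To illustrate why passing the name instead of the model fails, here is a minimal sketch that reproduces the root cause without mteb installed. The helper call_like_evaluator and the class StubModel are hypothetical names made up for this example; they only imitate the way the evaluator in the traceback calls model.encode(..., batch_size=...), and are not part of MTEB or sentence-transformers.

```python
# Sketch only: reproduces the root cause without installing mteb.
model_name = "dangvantuan/sentence-camembert-large"


def call_like_evaluator(model, sentences, batch_size=32):
    # Hypothetical helper mirroring the evaluator's
    # model.encode(sentences, batch_size=...) call from the traceback.
    return model.encode(sentences, batch_size=batch_size)


class StubModel:
    # Hypothetical stand-in for SentenceTransformer, not the real class.
    def encode(self, sentences, batch_size=32):
        # Return one fake 1-dimensional "embedding" per sentence.
        return [[float(len(s))] for s in sentences]


sentences = ["Bonjour", "Salut"]

# Passing the name: model.encode is str.encode, which has no batch_size parameter.
try:
    call_like_evaluator(model_name, sentences)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print("with model_name:", error_message)

# Passing an object whose encode() accepts batch_size works.
embeddings = call_like_evaluator(StubModel(), sentences)
print("with model object:", embeddings)

# In the original script the corresponding change would be:
#     evaluation.run(model, output_folder=f"fr_results/{model_name}")
```

The exact wording of the TypeError depends on the Python version, but it names batch_size in either case. Keeping model_name in the output_folder path is fine, since that argument is only used as a string.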
What should I do?