embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark
https://arxiv.org/abs/2210.07316
Apache License 2.0
1.63k stars, 212 forks

FaithDial & StatcanDialogueDatasetRetrieval break with mteb models #1041

Closed Muennighoff closed 3 hours ago

Muennighoff commented 4 days ago

While working on https://github.com/embeddings-benchmark/mteb/pull/1038:

INFO:mteb.cli:Running with parameters: Namespace(model='GritLM/GritLM-7B', task_types=None, categories=None, tasks=['FaithDial'], languages=None, device=None, output_folder='/data/niklas/results', verbosity=2, co2_tracker=True, eval_splits=None, model_revision=None, batch_size=None, overwrite=False, func=<function run at 0x7f50ef6a7a30>)
/env/lib/conda/gritkto/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards:  33%|███▎      | 1/3 [00:11<00:22, 11.39s/it]
Loading checkpoint shards:  67%|██████▋   | 2/3 [00:23<00:12, 12.05s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:33<00:00, 10.98s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:33<00:00, 11.20s/it]
INFO:mteb.evaluation.MTEB:

## Evaluating 1 tasks:
Created GritLM: torch.bfloat16 dtype, mean pool, embedding mode, bbcc attn
----------Using 8 data-parallel GPUs----------
─────────────────────────────── Selected tasks  ────────────────────────────────
Retrieval
    - FaithDial, s2p

INFO:mteb.evaluation.MTEB:

********************** Evaluating FaithDial **********************
INFO:mteb.evaluation.MTEB:Loading dataset for FaithDial
/env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/load.py:1491: FutureWarning: The repository for McGill-NLP/FaithDial contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/McGill-NLP/FaithDial
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.
  warnings.warn(
INFO:mteb.evaluation.evaluators.RetrievalEvaluator:The custom encode_queries and encode_corpus functions of the model will be used
INFO:mteb.abstasks.AbsTaskRetrieval:Subset: default
INFO:mteb.evaluation.evaluators.RetrievalEvaluator:Encoding Queries.
ERROR:mteb.evaluation.MTEB:Error while evaluating FaithDial: 'GritLMWrapper' object has no attribute 'encode_conversations'
debug(xwinxu) cache_dir: None data_dir: None data_files: None features: None download_config: None download_mode: DownloadMode.REUSE_DATASET_IF_EXISTS revision: 7a414e80725eac766f2602676dc8b39f80b061e4 token: None storage_options: None trust_remote_code: None config_kwargs: {}
Traceback (most recent call last):
  File "/env/lib/conda/gritkto/bin/mteb", line 8, in <module>
    sys.exit(main())
  File "/data/niklas/mteb/mteb/cli.py", line 381, in main
    args.func(args)
  File "/data/niklas/mteb/mteb/cli.py", line 122, in run
    eval.run(
  File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 421, in run
    raise e
  File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 369, in run
    results, tick, tock = self._run_eval(
  File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 259, in _run_eval
    results = task.evaluate(
  File "/data/niklas/mteb/mteb/abstasks/AbsTaskRetrieval.py", line 286, in evaluate
    scores[hf_subset] = self._evaluate_subset(
  File "/data/niklas/mteb/mteb/abstasks/AbsTaskRetrieval.py", line 295, in _evaluate_subset
    results = retriever(corpus, queries)
  File "/data/niklas/mteb/mteb/evaluation/evaluators/RetrievalEvaluator.py", line 512, in __call__
    return self.retriever.search(
  File "/data/niklas/mteb/mteb/evaluation/evaluators/RetrievalEvaluator.py", line 102, in search
    query_embeddings = self.model.encode_conversations(
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'GritLMWrapper' object has no attribute 'encode_conversations'

KennethEnevoldsen commented 4 days ago

@vaibhavad we should probably add some documentation on encode_conversations
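
Until that documentation exists, here is a rough sketch of what a wrapper might need in order to satisfy the call shown in the traceback. The traceback only confirms that the retrieval evaluator calls `self.model.encode_conversations(`; everything else below is an assumption for illustration: that conversations arrive as a list of lists of strings, that joining the turns with `"; "` is an acceptable fallback, and the names `ConversationFallbackMixin` and `DummyWrapper`, which are hypothetical and not part of mteb or GritLM.

```python
from typing import Any


class ConversationFallbackMixin:
    """Hypothetical mixin: flattens each conversation and reuses encode_queries."""

    # Assumed separator between turns; not taken from mteb.
    conversation_separator = "; "

    def encode_conversations(self, conversations: list[list[str]], **kwargs: Any):
        # Join the turns of each conversation into one query string.
        queries = [self.conversation_separator.join(turns) for turns in conversations]
        # Delegate to the query encoder the wrapper already provides.
        return self.encode_queries(queries, **kwargs)


class DummyWrapper(ConversationFallbackMixin):
    """Stand-in for a real model wrapper; returns string lengths, not embeddings."""

    def encode_queries(self, queries: list[str], **kwargs: Any) -> list[list[float]]:
        return [[float(len(q))] for q in queries]


model = DummyWrapper()
embeddings = model.encode_conversations([["hi", "how are you?"], ["bye"]])
```

With a real wrapper, `encode_queries` would return actual embeddings. A model that handles multi-turn input natively would probably want to override `encode_conversations` rather than flatten the turns like this.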