embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark
https://arxiv.org/abs/2210.07316
Apache License 2.0

New Model: OpenAI text embedding v3 #144

Open linghongli opened 11 months ago

Muennighoff commented 11 months ago

Yes; OpenAI text-embedding-ada-002 is on the leaderboard

akashAD98 commented 6 months ago

@Muennighoff How can I test the new embedding v3 model? I want to reproduce the results.

Muennighoff commented 6 months ago

There are scripts for the previous openai embedding models here: https://github.com/embeddings-benchmark/mtebscripts

If you modify it for v3, it would be cool if you could add it via a PR!
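For anyone adapting those scripts, here is a minimal sketch of the kind of wrapper that could work. It assumes MTEB only needs an object with an `encode(sentences, batch_size=..., **kwargs)` method that returns one vector per sentence, and that the `openai>=1.0` Python client is installed. The class name `ApiEmbeddingModel`, the helper `make_openai_embed_fn`, and the choice of `text-embedding-3-small` are illustrative assumptions, not code from the mtebscripts repository.

```python
from typing import Callable, List, Sequence

import numpy as np

# A function that turns a batch of texts into one vector per text.
EmbedFn = Callable[[List[str]], List[List[float]]]


def make_openai_embed_fn(model_name: str = "text-embedding-3-small") -> EmbedFn:
    """Build an embed function backed by the OpenAI embeddings endpoint.

    The import is lazy so the wrapper below can be exercised without the
    `openai` package. The client reads OPENAI_API_KEY from the environment.
    """
    from openai import OpenAI  # assumes the openai>=1.0 client

    client = OpenAI()

    def embed(texts: List[str]) -> List[List[float]]:
        response = client.embeddings.create(model=model_name, input=texts)
        return [item.embedding for item in response.data]

    return embed


class ApiEmbeddingModel:
    """Hypothetical wrapper exposing the `encode` method that MTEB calls."""

    def __init__(self, embed_fn: EmbedFn, max_batch_size: int = 64):
        self.embed_fn = embed_fn
        self.max_batch_size = max_batch_size

    def encode(self, sentences: Sequence[str], batch_size: int = 32, **kwargs) -> np.ndarray:
        # Never send more texts per request than max_batch_size allows.
        step = max(1, min(batch_size, self.max_batch_size))
        vectors: List[List[float]] = []
        for start in range(0, len(sentences), step):
            batch = list(sentences[start:start + step])
            vectors.extend(self.embed_fn(batch))
        return np.asarray(vectors, dtype=np.float32)


def run_example() -> None:
    """Not called on import: needs `mteb`, `openai`, an API key, and network access."""
    from mteb import MTEB  # assumes the mteb 1.x-style API

    model = ApiEmbeddingModel(make_openai_embed_fn("text-embedding-3-small"))
    evaluation = MTEB(tasks=["AmazonCounterfactualClassification"])
    evaluation.run(model, output_folder="results/text-embedding-3-small")
```

Passing the embed function in separately keeps the batching logic testable offline with a stub, and swapping the model name is then a one-line change.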

akashAD98 commented 6 months ago

@Muennighoff I think passing the model name directly will work, but I'm not able to run your default older code. What requirements.txt do I need for this evaluation project, and which Python version is best?

Muennighoff commented 6 months ago

Hmm, I recommend using Python >= 3.9, and maybe try upgrading mteb / datasets.
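A quick way to see what is actually installed before upgrading is sketched below. It uses only the standard library; the helper names and the package list (`mteb`, `datasets`, `openai`) are assumptions chosen for this thread, not part of MTEB.

```python
import sys
from importlib import metadata
from typing import Dict, Optional, Sequence, Tuple


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def python_is_supported(
    version_info: Sequence[int] = tuple(sys.version_info[:2]),
    minimum: Tuple[int, int] = (3, 9),
) -> bool:
    """Check the interpreter against the Python >= 3.9 recommendation."""
    return tuple(version_info[:2]) >= minimum


def environment_report(
    packages: Sequence[str] = ("mteb", "datasets", "openai"),
) -> Dict[str, Optional[str]]:
    """Collect the Python version and the versions of the given packages."""
    report: Dict[str, Optional[str]] = {
        "python": ".".join(str(part) for part in sys.version_info[:3])
    }
    for package in packages:
        report[package] = installed_version(package)
    return report


if __name__ == "__main__":
    for name, version in environment_report().items():
        print(f"{name}: {version or 'not installed'}")
    if not python_is_supported():
        print("Python is older than 3.9; consider a newer interpreter.")
```

If the report shows old versions, `pip install -U mteb datasets` inside the same environment is the usual way to upgrade.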

akashAD98 commented 6 months ago

@Muennighoff I'm using Python 3.9.18, but why am I getting this error with OpenAI?

Can you give me some info on how to run it? The README isn't clear.

INFO:mteb.evaluation.MTEB:

## Evaluating 1 tasks:
───────────────────────────────────────────────────────────────────────────────────────────── Selected tasks  ──────────────────────────────────────────────────────────────────────────────────────────────
Classification
    - AmazonCounterfactualClassification, s2s, multilingual 1 / 4 langs

INFO:mteb.evaluation.MTEB:

********************** Evaluating AmazonCounterfactualClassification **********************
INFO:mteb.evaluation.MTEB:Loading dataset for AmazonCounterfactualClassification
/home/akash/anaconda3/envs/newenv/lib/python3.9/site-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
  warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
ERROR:mteb.evaluation.MTEB:Error while evaluating AmazonCounterfactualClassification: Loading a dataset cached in a LocalFileSystem is not supported.
Traceback (most recent call last):
  File "/home/akash/LANCEDB/mtebscripts-main/run_array_openaiv2.py", line 226, in <module>
    main(args)
  File "/home/akash/LANCEDB/mtebscripts-main/run_array_openaiv2.py", line 222, in main
    evaluation.run(model, output_folder=f"results/{model_name}", batch_size=args.batchsize, eval_splits=eval_splits, corpus_chunk_size=10000)
  File "/home/akash/anaconda3/envs/newenv/lib/python3.9/site-packages/mteb/evaluation/MTEB.py", line 289, in run
    raise e
  File "/home/akash/anaconda3/envs/newenv/lib/python3.9/site-packages/mteb/evaluation/MTEB.py", line 261, in run
    task.load_data(eval_splits=task_eval_splits)
  File "/home/akash/anaconda3/envs/newenv/lib/python3.9/site-packages/mteb/abstasks/MultilingualTask.py", line 25, in load_data
    self.dataset[lang] = datasets.load_dataset(
  File "/home/akash/anaconda3/envs/newenv/lib/python3.9/site-packages/datasets/load.py", line 2149, in load_dataset
    ds = builder_instance.as_dataset(split=split, verification_mode=verification_mode, in_memory=keep_in_memory)
  File "/home/akash/anaconda3/envs/newenv/lib/python3.9/site-packages/datasets/builder.py", line 1173, in as_dataset
    raise NotImplementedError(f"Loading a dataset cached in a {type(self._fs).__name__} is not supported.")
NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported.

Muennighoff commented 6 months ago

This looks like an issue with datasets rather than MTEB; maybe try https://stackoverflow.com/questions/77433096/notimplementederror-loading-a-dataset-cached-in-a-localfilesystem-is-not-suppor
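As a rough check for whether an environment is likely to hit this error, the sketch below compares the installed `datasets` and `fsspec` versions. The version boundary (`datasets` older than 2.14.6 together with `fsspec` 2023.10.0 or newer) is an assumption taken from public reports of this error, not something confirmed in this thread, and the helper names are hypothetical.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Keep the leading numeric components, e.g. '2.14.5' becomes (2, 14, 5)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def likely_affected(datasets_version: str, fsspec_version: str) -> bool:
    """Assumed trigger: datasets < 2.14.6 combined with fsspec >= 2023.10.0."""
    return (
        parse_version(datasets_version) < (2, 14, 6)
        and parse_version(fsspec_version) >= (2023, 10, 0)
    )


def check_installed() -> Optional[bool]:
    """Return None when either package is missing, else the likely_affected result."""
    try:
        return likely_affected(metadata.version("datasets"), metadata.version("fsspec"))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    result = check_installed()
    if result is None:
        print("datasets or fsspec is not installed in this environment.")
    elif result:
        print("Likely affected: try upgrading datasets (pip install -U datasets).")
    else:
        print("This version combination is not the assumed trigger.")
```

If the check reports the assumed trigger, upgrading `datasets` in the same environment and rerunning the script is the first thing to try.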

akashAD98 commented 6 months ago

I'm getting the same error. I want to run it in online mode, not offline.