Open linghongli opened 11 months ago
@Muennighoff How can I test the new embedding v3 models? I want to reproduce the results.
There are scripts for the previous openai embedding models here: https://github.com/embeddings-benchmark/mtebscripts
If you modify it for v3, it would be cool if you could add it via a PR!
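For anyone attempting that adaptation, here is a minimal sketch of a wrapper that exposes an `encode` method of the kind MTEB calls on a model. It is not the code from mtebscripts: the names `EmbeddingAPIModel` and `make_openai_embed_fn` are hypothetical, `text-embedding-3-small` is just one of the v3 model names, and the call to `client.embeddings.create` assumes the `openai` Python package at version 1.0 or later with `OPENAI_API_KEY` set in the environment.

```python
from typing import Callable, List, Sequence

# Hypothetical type alias: any function that maps a batch of texts to vectors.
EmbedFn = Callable[[List[str]], List[List[float]]]


def make_openai_embed_fn(model_name: str = "text-embedding-3-small") -> EmbedFn:
    """Build an embed function backed by the OpenAI embeddings endpoint.

    Assumes openai>=1.0 and OPENAI_API_KEY in the environment.
    """
    from openai import OpenAI  # imported lazily so the wrapper works offline

    client = OpenAI()

    def embed(texts: List[str]) -> List[List[float]]:
        response = client.embeddings.create(model=model_name, input=texts)
        # Sort by index so the output order matches the input order.
        ordered = sorted(response.data, key=lambda item: item.index)
        return [item.embedding for item in ordered]

    return embed


class EmbeddingAPIModel:
    """Hypothetical wrapper exposing encode(sentences, batch_size=...)."""

    def __init__(self, embed_fn: EmbedFn):
        self.embed_fn = embed_fn

    def encode(self, sentences: Sequence[str], batch_size: int = 32, **kwargs) -> List[List[float]]:
        sentences = list(sentences)
        embeddings: List[List[float]] = []
        for start in range(0, len(sentences), batch_size):
            # As a precaution, replace empty strings with a single space
            # before sending the batch to the API.
            batch = [s if s.strip() else " " for s in sentences[start:start + batch_size]]
            embeddings.extend(self.embed_fn(batch))
        return embeddings
```

With this in place, something like `evaluation.run(EmbeddingAPIModel(make_openai_embed_fn()), output_folder="results/text-embedding-3-small")` should be the only change needed on the MTEB side. Depending on the mteb version, the result of `encode` may need to be wrapped in `numpy.asarray`.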
@Muennighoff I think passing the model name directly will work, but I'm not able to run your default old-version code. May I know what requirements.txt I need for this evaluation project, and which Python version is best?
error
Hmm, I recommend you use Python >= 3.9 and maybe try upgrading mteb / datasets
@Muennighoff I'm using Python 3.9.18, but why am I getting this error for OpenAI?
Can you provide some info on how to run it? It's not clear from the README.
INFO:mteb.evaluation.MTEB:
## Evaluating 1 tasks:
───────────────────────────────────────────────────────────────────────────────────────────── Selected tasks ──────────────────────────────────────────────────────────────────────────────────────────────
Classification
- AmazonCounterfactualClassification, s2s, multilingual 1 / 4 langs
INFO:mteb.evaluation.MTEB:
********************** Evaluating AmazonCounterfactualClassification **********************
INFO:mteb.evaluation.MTEB:Loading dataset for AmazonCounterfactualClassification
/home/akash/anaconda3/envs/newenv/lib/python3.9/site-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
ERROR:mteb.evaluation.MTEB:Error while evaluating AmazonCounterfactualClassification: Loading a dataset cached in a LocalFileSystem is not supported.
Traceback (most recent call last):
  File "/home/akash/LANCEDB/mtebscripts-main/run_array_openaiv2.py", line 226, in <module>
    main(args)
  File "/home/akash/LANCEDB/mtebscripts-main/run_array_openaiv2.py", line 222, in main
    evaluation.run(model, output_folder=f"results/{model_name}", batch_size=args.batchsize, eval_splits=eval_splits, corpus_chunk_size=10000)
  File "/home/akash/anaconda3/envs/newenv/lib/python3.9/site-packages/mteb/evaluation/MTEB.py", line 289, in run
    raise e
  File "/home/akash/anaconda3/envs/newenv/lib/python3.9/site-packages/mteb/evaluation/MTEB.py", line 261, in run
    task.load_data(eval_splits=task_eval_splits)
  File "/home/akash/anaconda3/envs/newenv/lib/python3.9/site-packages/mteb/abstasks/MultilingualTask.py", line 25, in load_data
    self.dataset[lang] = datasets.load_dataset(
  File "/home/akash/anaconda3/envs/newenv/lib/python3.9/site-packages/datasets/load.py", line 2149, in load_dataset
    ds = builder_instance.as_dataset(split=split, verification_mode=verification_mode, in_memory=keep_in_memory)
  File "/home/akash/anaconda3/envs/newenv/lib/python3.9/site-packages/datasets/builder.py", line 1173, in as_dataset
    raise NotImplementedError(f"Loading a dataset cached in a {type(self._fs).__name__} is not supported.")
NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported.
This looks like an issue not with MTEB but datasets, maybe try https://stackoverflow.com/questions/77433096/notimplementederror-loading-a-dataset-cached-in-a-localfilesystem-is-not-suppor
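To check whether the environment matches the situation usually reported for this error, a small version check along these lines may help. It is a sketch under an assumption: community reports attribute the error to an older `datasets` release (before 2.14.6) running against a newer `fsspec` (2023.10.0 or later), and those thresholds are taken from such reports, not confirmed in this thread. The helper names are hypothetical.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric components of a version string."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def likely_affected(datasets_version: str, fsspec_version: str) -> bool:
    # Assumed thresholds from community reports; not verified in this thread.
    old_datasets = parse_version(datasets_version) < (2, 14, 6)
    new_fsspec = parse_version(fsspec_version) >= (2023, 10, 0)
    return old_datasets and new_fsspec


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    ds = installed_version("datasets")
    fs = installed_version("fsspec")
    if ds is None or fs is None:
        print(f"datasets={ds}, fsspec={fs}: one of the packages is not installed")
    elif likely_affected(ds, fs):
        print(f"datasets={ds}, fsspec={fs}: try 'pip install -U datasets'")
    else:
        print(f"datasets={ds}, fsspec={fs}: versions do not match the reported pattern")
```

If the check flags the environment, upgrading `datasets` is the usual suggestion. The stale cache directory may also need to be cleared before rerunning.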
I'm getting the same error. I want to run it in online mode, not offline.
Yes; OpenAI text-embedding-ada-002 is on the leaderboard