embeddings-benchmark / mtebpaper

Resources & scripts for the paper "MTEB: Massive Text Embedding Benchmark"

Error while evaluating STS22: Empty items should be cleaned prior to running #1

Closed afurkank closed 1 year ago

afurkank commented 1 year ago

Hi,

I'm trying to run the "run_array_openai.py" script to evaluate OpenAI's embedding model "text-embedding-ada-002" on the STS22 dataset in Turkish.

The changes I made to the original script are:

- Set TASK_LIST = TASK_LIST_STS
- Modified the function "parse_args()" like this:

def parse_args():
    """Parse command line arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--startid", type=int)
    parser.add_argument("--endid", type=int)
    parser.add_argument("--engine", type=str, default="text-embedding-ada-002")
    parser.add_argument("--lang", type=str, default="tr")
    parser.add_argument("--taskname", type=str, default=None)
    parser.add_argument("--batchsize", type=int, default=2048)
    args = parser.parse_args()
    return args

Here is my output which also includes the error:

PS C:\Users\furkan> & C:/Users/furkan/anaconda3/python.exe "c:/Users/furkan/Desktop/Embedding Comparison/run_array_openai.py"
INFO:numexpr.utils:Note: NumExpr detected 12 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
Running task: STS22
INFO:mteb.evaluation.MTEB:

Evaluating 1 tasks:
──────────────────────────── Selected tasks ────────────────────────────
STS

INFO:mteb.evaluation.MTEB:

** Evaluating STS22 **
INFO:mteb.evaluation.MTEB:Loading dataset for STS22
Downloading and preparing dataset sts22-crosslingual-sts/tr to C:/Users/furkan/.cache/huggingface/datasets/mtebsts22-crosslingual-sts/tr/1.0.0/563d7d9067b4162f5e964eb988aaa492b59e7ed47a03f16ec94e19b0e60ee8c1...
Dataset sts22-crosslingual-sts downloaded and prepared to C:/Users/furkan/.cache/huggingface/datasets/mtebsts22-crosslingual-sts/tr/1.0.0/563d7d9067b4162f5e964eb988aaa492b59e7ed47a03f16ec94e19b0e60ee8c1. Subsequent calls will reuse this data.

Task: STS22, split: test, language: tr. Running...
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 211 sentences1...
Token indices sequence length is longer than the specified maximum sequence length for this model (1084 > 1024). Running this sequence through the model will result in indexing errors
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 211 sentences2...
ERROR:mteb.evaluation.MTEB:Error while evaluating STS22: Empty items should be cleaned prior to running
ERROR:mteb.evaluation.MTEB:Please check all the error logs at: error_logs.txt
PS C:\Users\furkan>



Is there anything I can do to fix this?
Muennighoff commented 1 year ago

Can you try using https://github.com/embeddings-benchmark/mtebscripts/blob/main/run_array_openaiv2.py, i.e. the v2 version, with

openai==0.26.4
tiktoken==0.2.0

It automatically replaces empty strings with a single space, which is a hacky way to get around the issue of empty items. Also, if you get the results, feel free to share & I can upload them to the leaderboard :)
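For reference, the workaround looks roughly like this (a minimal sketch, not the script's exact code; the function name and batching are assumptions, and it targets the old openai==0.26.x API pinned above):

import openai

def embed_batch(texts, engine="text-embedding-ada-002"):
    # The OpenAI embeddings endpoint rejects empty strings in the input list,
    # so replace any empty or whitespace-only item with a single space first.
    cleaned = [t if t.strip() else " " for t in texts]
    response = openai.Embedding.create(input=cleaned, engine=engine)
    return [item["embedding"] for item in response["data"]]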

afurkank commented 1 year ago

Thanks for the quick response.

I tried "run_array_openaiv2.py" and made the same changes to the script.

Here is the output:

PS C:\Users\furkan> & C:/Users/furkan/anaconda3/python.exe "c:/Users/furkan/Desktop/Embedding Comparison/run_array_openaiv2.py"
Running task:  STS22
INFO:mteb.evaluation.MTEB:

## Evaluating 1 tasks:
────────────────────────────────── Selected tasks ──────────────────────────────────
STS
    - STS22, p2p, crosslingual 18 pairs

INFO:mteb.evaluation.MTEB:

********************** Evaluating STS22 **********************
INFO:mteb.evaluation.MTEB:Loading dataset for STS22

Task: STS22, split: test, language: tr. Running...
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 211 sentences1...
ERROR:mteb.evaluation.MTEB:Error while evaluating STS22: [Errno 2] No such file or directory: 'embeddings/text-embedding-ada-002//STS22_Sizlere da_/ 3. Sayfa.pickle'
ERROR:mteb.evaluation.MTEB:Please check all the error logs at: error_logs.txt

I don't really understand the meaning of this error.

Muennighoff commented 1 year ago

Sorry, you can fix it by changing

model = OpenAIEmbedder(args.engine, task_name=task, batch_size=args.batchsize, save_emb=True)

to

model = OpenAIEmbedder(args.engine, task_name=task, batch_size=args.batchsize, save_emb=False)
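If you do want to keep save_emb=True, the error path suggests the cache filename is built from the raw sentence text, which here contains characters such as "/" that are invalid in a Windows path, and the target directory may not exist yet. A hedged sketch of the kind of sanitization one could apply (not the script's actual code; save_embedding and the naming scheme are assumptions):

import os
import pickle
import re

def save_embedding(embedding, text, engine, save_dir="embeddings"):
    # Keep only filesystem-safe characters from the text and cap its length,
    # so sentences containing "/" or other reserved characters don't break the path.
    safe_name = re.sub(r"[^A-Za-z0-9_-]", "_", text)[:50] or "empty"
    target_dir = os.path.join(save_dir, engine)
    os.makedirs(target_dir, exist_ok=True)  # create the directory if it is missing
    with open(os.path.join(target_dir, safe_name + ".pickle"), "wb") as f:
        pickle.dump(embedding, f)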

afurkank commented 1 year ago

Yes, that fixed the problem. However, there is another problem now.

I'm pasting the output as-is:

PS C:\Users\furkan> & C:/Users/furkan/anaconda3/python.exe "c:/Users/furkan/Desktop/Embedding Comparison/run_array_openaiv2.py"
Running task:  STS22
INFO:mteb.evaluation.MTEB:

## Evaluating 1 tasks:
────────────────────────────────── Selected tasks ──────────────────────────────────
STS
    - STS22, p2p, crosslingual 18 pairs

INFO:mteb.evaluation.MTEB:

********************** Evaluating STS22 **********************
INFO:mteb.evaluation.MTEB:Loading dataset for STS22

Task: STS22, split: test, language: tr. Running...
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 211 sentences1...
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 211 sentences2...
Detected empty item, which is not allowed by the OpenAI API - Replacing with empty space
INFO:openai:error_code=None error_message="'$.input' is invalid. Please check the API reference: https://platform.openai.com/docs/api-reference." error_param=None error_type=invalid_request_error message='OpenAI API error received' stream_error=False
ERROR:mteb.evaluation.MTEB:Error while evaluating STS22: '$.input' is invalid. Please check the API reference: https://platform.openai.com/docs/api-reference.
ERROR:mteb.evaluation.MTEB:Please check all the error logs at: error_logs.txt

Is there something wrong with the way the script calls the model to embed the sentences? I looked around where openai.Embedding is called for anything matching the error message, but didn't see anything wrong with it.

Muennighoff commented 1 year ago

Hmm, can you print out the inputs to see which input it's failing on? Also make sure you have the right env versions (https://github.com/embeddings-benchmark/mtebscripts/issues/1#issuecomment-1636731990).
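A minimal sketch of the kind of check meant here, assuming you add it just before the openai.Embedding.create call in the script (the variable name texts is an assumption):

# Flag items about to be sent to the embeddings endpoint that are empty
# or whitespace-only, which the API rejects.
for i, text in enumerate(texts):
    if not text.strip():
        print(f"Suspicious input at index {i}: {text!r}")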

afurkank commented 1 year ago

So I did what you asked and the first sentences seem okay. However, among the second sentences, the 56th one is empty.

It is like this:

{"id": "...", "score": 3.0, "sentence1": "...", "sentence2": ""}

I thought the script was handling the empty sentences. Is there something I should change for this sentence?
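For reference, this is roughly how the split can be checked for empty items (a sketch; the Hub dataset name mteb/sts22-crosslingual-sts is an assumption based on the cache path in the logs):

from datasets import load_dataset

# Load the Turkish test split of STS22 and report rows whose sentence1 or sentence2 is empty.
ds = load_dataset("mteb/sts22-crosslingual-sts", "tr", split="test")
for i, row in enumerate(ds):
    if not row["sentence1"].strip() or not row["sentence2"].strip():
        print(i, row["id"], repr(row["sentence1"]), repr(row["sentence2"]))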

Muennighoff commented 1 year ago

Just made a fix to the script (https://github.com/embeddings-benchmark/mtebscripts/blob/main/run_array_openaiv2.py); can you try again?

afurkank commented 1 year ago

It's working now.

The output is as follows:

INFO:mteb.evaluation.evaluators.STSEvaluator:Evaluating...
INFO:mteb.evaluation.MTEB:Evaluation for STS22 on test took 129.21 seconds
INFO:mteb.evaluation.MTEB:Scores: {'tr': {'cos_sim': {'pearson': 0.563986094925504, 'spearman': 0.64498311459473}, 'manhattan': {'pearson': 0.6068273748824454, 'spearman': 0.6441179836604591}, 'euclidean': {'pearson': 0.6084112755525339, 'spearman': 0.64498311459473}}, 'evaluation_time': 129.21}

Oh, and by the way, I had to change save_emb to False again. Just letting you know.

Thanks a lot for the help :) Great project!

Muennighoff commented 1 year ago

Amazing, sorry for the problems! The scores are now also on the leaderboard, seems like it's the best model on STS22 tr! If you run more datasets, feel free to add the results here / let me know :)