Evaluation of MMLU tasks using the OpenAI API

Laplace888 commented 1 month ago

Hello, I'm trying to evaluate the GPT-4o model on the MMLU dataset, but I'm encountering an error. Could you advise me on how to proceed?

This is the command I used:

lm_eval --model openai-chat-completions \
    --model_args model=gpt-4o \
    --tasks mmlu_anatomy \
    --apply_chat_template \
    --output_path ./result \
    --log_samples \
    --show_config \
    --limit 10

Is there anything I should add or change?

eyuansu62 commented 1 month ago

What's your error?

Laplace888 commented 1 month ago

Traceback (most recent call last):
  File "C:\Users\KT\miniconda3\envs\gpt_test\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\KT\miniconda3\envs\gpt_test\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\KT\miniconda3\envs\gpt_test\Scripts\lm_eval.exe\__main__.py", line 7, in <module>
  File "C:\Users\KT\Desktop\conda_GPT\lm-evaluation-harness\lm_eval\__main__.py", line 382, in cli_evaluate
    results = evaluator.simple_evaluate(
  File "C:\Users\KT\Desktop\conda_GPT\lm-evaluation-harness\lm_eval\utils.py", line 397, in _wrapper
    return fn(*args, **kwargs)
  File "C:\Users\KT\Desktop\conda_GPT\lm-evaluation-harness\lm_eval\evaluator.py", line 301, in simple_evaluate
    results = evaluate(
  File "C:\Users\KT\Desktop\conda_GPT\lm-evaluation-harness\lm_eval\utils.py", line 397, in _wrapper
    return fn(*args, **kwargs)
  File "C:\Users\KT\Desktop\conda_GPT\lm-evaluation-harness\lm_eval\evaluator.py", line 476, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)
  File "C:\Users\KT\Desktop\conda_GPT\lm-evaluation-harness\lm_eval\models\openai_completions.py", line 168, in loglikelihood
    raise NotImplementedError(
NotImplementedError: Loglikelihood is not supported for chat completions. Consider using the completions API instead.

eyuansu62 commented 1 month ago

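The solution is in the error message itself: "NotImplementedError: Loglikelihood is not supported for chat completions. Consider using the completions API instead." As the traceback shows, mmlu_anatomy issues loglikelihood requests to score the answer choices, and the openai-chat-completions model type does not implement them.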
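
To make the difference concrete, here is a rough Python sketch of the two ways a multiple-choice question can be scored. It is not the harness's code: `pick_by_loglikelihood`, `pick_by_generation`, and `toy_loglikelihood` are hypothetical names made up for illustration, and the toy function returns fixed numbers instead of calling a model. The first function needs a log-probability for each answer continuation, which is what the failing `loglikelihood` call asks for. The second only needs the generated text, which is all a chat endpoint is assumed to return here.

```python
import math
import re
from typing import Callable, Optional, Sequence

CHOICES = ("A", "B", "C", "D")


def pick_by_loglikelihood(
    prompt: str,
    continuations: Sequence[str],
    loglikelihood: Callable[[str, str], float],
) -> int:
    """Return the index of the continuation with the highest log-probability."""
    scores = [loglikelihood(prompt, c) for c in continuations]
    return max(range(len(scores)), key=scores.__getitem__)


def pick_by_generation(reply: str) -> Optional[int]:
    """Return the index of the first standalone letter A-D in a free-text reply.

    Deliberately naive: a reply that starts with the article "A" would be
    misread as choice A, so a real parser needs a stricter pattern.
    """
    match = re.search(r"\b([ABCD])\b", reply)
    return CHOICES.index(match.group(1)) if match else None


def toy_loglikelihood(prompt: str, continuation: str) -> float:
    """Hypothetical stand-in for a model call; the probabilities are made up."""
    table = {" A": 0.1, " B": 0.6, " C": 0.2, " D": 0.1}
    return math.log(table[continuation])


ll_choice = pick_by_loglikelihood(
    "Question text... Answer:", [" A", " B", " C", " D"], toy_loglikelihood
)
gen_choice = pick_by_generation("The answer is B.")
print(CHOICES[ll_choice], CHOICES[gen_choice])  # prints "B B"
```

So the options are either a model type that can return log-probabilities, as the error message suggests, or a task variant that is scored from generated text.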

EleutherAI / lm-evaluation-harness

Evaluation of MMLU tasks using the OpenAI API #2318