Hey @Laplace888, multiple-choice tasks depend on logprobs, and OpenAI does not provide those for any of the newer models (only for davinci-002, iirc). I'll add a more detailed error message.
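Roughly, this is what loglikelihood-based multiple-choice scoring looks like (a conceptual sketch only; the loglikelihood callable is hypothetical and stands in for whatever the model backend provides):

```python
# Conceptual sketch: each answer choice is scored by the total logprob the
# model assigns to its tokens given the question; the highest-scoring wins.
def pick_choice(loglikelihood, question: str, choices: list[str]) -> int:
    # `loglikelihood(context, continuation)` is a hypothetical callable that
    # returns the summed logprob of `continuation` conditioned on `context`.
    scores = [loglikelihood(question, choice) for choice in choices]
    return max(range(len(choices)), key=scores.__getitem__)
```

This is why the harness needs logprobs over the input tokens, not over whatever the model decides to generate.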
Hey @baberabb, I am facing the same issue, but OpenAI does provide logprobs: https://platform.openai.com/docs/api-reference/chat/create#:~:text=the%20relevant%20token.-,logprobs,-boolean%20or%20null
Can you please elaborate on why it is not possible to evaluate GPT-4o on MCQ questions using the eval harness?
Thanks.
Sorry, I should have said prompt logprobs. For multiple-choice we only need the logprobs of the input tokens, and the chat completions API only returns logprobs for the tokens the model generates, not for the prompt. See https://github.com/EleutherAI/lm-evaluation-harness/issues/942#issuecomment-1777836312
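A minimal sketch of the difference, assuming the official openai Python client (v1+) with OPENAI_API_KEY set in the environment:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Chat completions: logprobs=True only covers the *generated* tokens, so
# there is no way to score the prompt (question + answer choice) itself.
chat = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "2+2="}],
    logprobs=True,
    max_tokens=1,
)
print(chat.choices[0].logprobs.content)  # per-token logprobs of the output only

# Legacy completions: echo=True plus logprobs returns logprobs for the
# *prompt* tokens as well, which is what loglikelihood-based multiple-choice
# scoring needs. Only older base models (e.g. davinci-002) are served here.
legacy = client.completions.create(
    model="davinci-002",
    prompt="Question: 2+2=\nAnswer: 4",
    max_tokens=0,  # generate nothing; we only want the prompt scored
    echo=True,     # echo the prompt back, with per-token logprobs
    logprobs=1,
)
print(legacy.choices[0].logprobs.token_logprobs)  # includes the prompt tokens
```

The echo=True, max_tokens=0 trick on the legacy endpoint is what makes prompt scoring possible, and that endpoint does not serve GPT-4o.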
I am using the GPT-4o model with the openai-chat-completions API. While evaluating various datasets in a task, I encountered an error when the output_type is set to multiple_choice.
I tried using the openai-completions API to resolve the issue, but it appears that GPT-4o is not supported there.
Is there any other solution to this problem?
[screenshot] Error when using openai-chat-completions
[screenshot] Error when using openai-completions
[screenshot] Error when using openai-completions without --apply_chat_template
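For reference, a sketch of the kind of invocation involved (assuming a recent harness version; the task names are illustrative: gsm8k is generation-based and should work with the chat endpoint, while a multiple_choice task such as mmlu triggers the error above):

```
# Generation-based task: works with the chat completions endpoint
lm_eval --model openai-chat-completions \
    --model_args model=gpt-4o \
    --tasks gsm8k \
    --apply_chat_template

# multiple_choice task: fails, since it requires prompt logprobs
lm_eval --model openai-chat-completions \
    --model_args model=gpt-4o \
    --tasks mmlu \
    --apply_chat_template
```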