-
I am using the GPT-4o model with the openai-chat-completions API. While evaluating various datasets in a task, I encountered an error when the output_type is set to multiple_choice.
I tried using t…
-
The results mention the scores of GPT-3.5 but I don't see how I can evaluate GPT using the code as it doesn't have that model.
-
"Hello, I'm trying to evaluate the GPT-4o model using the MMLU dataset, but I'm encountering an error. Could you advise me on how to proceed?"
"This is the command I used:
lm_eval --model openai…
-
Hi,
I want to evaluate gpt4o and other GPT-related models on the SWE-bench dataset. Can someone guide me through the steps to follow?
I have an Azure OpenAI access key, but it is not getting configured. So,…
-
**Summary**
Right now we don't know which LLMs work best with OpenHands. It'd be good if we could do an evaluation to better understand this.
**Technical Design**
We will want to test popular L…
-
Hi, thanks again for releasing the interesting dataset!
Could you please also share the raw outputs from LLM/VQA models, which are the input to the GPT-4o answer extraction model?
Since the GPT-…
-
Hi,
I have followed the command `bash scripts/adaptpruning/llama_2_7b_alpaca_gpt4.sh` with the alpaca_gpt data downloaded from `https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/blob/main/da…
-
Apart from the **'example'** constraint, is only gpt4 used to evaluate the other constraint_type values? Or are other models evaluated using both rule_based and gpt?
I'd like to ask: apart from example, the other constraint…
-
From the [OpenAI o1 System Card](https://assets.ctfassets.net/kftzwdyauwt9/67qJD51Aur3eIc96iOfeOP/71551c3d223cd97e591aa89567306912/o1_system_card.pdf):
>"we translated MMLU’s[39] test set into 14 l…
-
### Required prerequisites
- [X] I have read the documentation.
- [X] I have searched the [Issue Tracker](https://github.com/PKU-Alignment/safe-rlhf/issues) and [Discussions](https://github.com/PKU-…