gpt-evaluation Search Results

EleutherAI/lm-evaluation-harness #2326

Support for Using Multiple Choice Datasets with GPT-4o Model…

I am using the GPT-4o model with the openai-chat-completions API. While evaluating various datasets in a task, I encountered an error when the output_type is set to multiple_choice. I tried using t…

Laplace888 updated 2 days ago

THUDM/LongBench #69

Code for evaluation with GPT-3.5?

The results mention the scores of GPT-3.5 but I don't see how I can evaluate GPT using the code as it doesn't have that model.

RuskinManku updated 2 months ago

EleutherAI/lm-evaluation-harness #2318

Evaluation of MMLU tasks using the OpenAI API

"Hello, I'm trying to evaluate the GPT-4o model using the MMLU dataset, but I'm encountering an error. Could you advise me on how to proceed?" "This is the command I used: lm_eval --model openai…

Laplace888 updated 2 days ago

EleutherAI/lm-evaluation-harness #2302

Configuring Azure OPENAI

Hi, I want to evaluate gpt4o and other GPT-related models on the SWE bench dataset. Can someone tell me what steps to follow? I have an Azure OpenAI access key, but it is not getting configured. So,…

sudhanshu-myl updated 3 days ago

All-Hands-AI/OpenHands #3737

[Evaluation]: Evaluate various LLMs with OpenHands

**Summary** Right now we don't know which LLMs work the best with OpenHands. It'd be good if we could do an evaluation to better understand this. **Technical Design** We will want to test popular L…

neubig updated 1 week ago

mayubo2333/MMLongBench-Doc #1

Request for raw model predictions for reproducibility.

Hi, thanks again for releasing the interesting dataset! Could you please also share the raw outputs from LLM/VQA models, which are the input to the GPT-4o answer extraction model? Since the GPT-…

j-min updated 2 weeks ago

ROIM1998/APT #2

How to get the results shown in Table 3?

Hi, I have followed the command `bash scripts/adaptpruning/llama_2_7b_alpaca_gpt4.sh` with the alpaca_gpt data downloaded from `https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/blob/main/da…

au-revoir updated 1 month ago

YJiangcm/FollowBench #10

about evaluation

Apart from the **‘example’** constraint, is only gpt4 used to evaluate the other constraint_type values? Or are other models evaluated using both rule_based and gpt? I'd like to ask whether, apart from example, the other constraint…

Violettttee updated 2 weeks ago

EleutherAI/lm-evaluation-harness #2305

New Task: `mmlu` professionally translated by OpenAI as part …

From the [OpenAI o1 System Card](https://assets.ctfassets.net/kftzwdyauwt9/67qJD51Aur3eIc96iOfeOP/71551c3d223cd97e591aa89567306912/o1_system_card.pdf): >"we translated MMLU’s[39] test set into 14 l…

giuliolovisotto updated 5 days ago

PKU-Alignment/safe-rlhf #161

[Question] GPT-4 and Human Evaluation

### Required prerequisites - [X] I have read the documentation . - [X] I have searched the [Issue Tracker](https://github.com/PKU-Alignment/safe-rlhf/issues) and [Discussions](https://github.com/PKU-…

gao-xiao-bai updated 3 months ago

1000+ results for gpt-evaluation
