THUDM / AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
https://llmbench.ai
Apache License 2.0

What temperature and max_new_tokens should be used? #20

Closed tju01 closed 1 year ago

tju01 commented 1 year ago

I am trying to make AgentBench work with some other models. However, it's not clear to me what temperature should be used for the agents. I can see that the fastchat agents use a temperature of 0:

https://github.com/THUDM/AgentBench/blob/d7dd9aefd28c40a1b4562dbd6f6e659a81cb7a94/configs/agents/fastchat_client.yaml#L7

However, other agents, such as the OpenAI agents, don't seem to set the temperature at all, so it would just default to 1:

https://github.com/THUDM/AgentBench/blob/d7dd9aefd28c40a1b4562dbd6f6e659a81cb7a94/configs/agents/api_agents/gpt-3.5-turbo.yaml#L4

I saw that in your paper you wrote that you used a temperature of 0 for all tasks, but I can't actually find this in your code.

The same is true for max_new_tokens: it is set to 128 for the fastchat models, while no value is specified for the OpenAI chat models. Some other models do specify a value, but it is 256 rather than 128, which confuses me.

Longin-Yu commented 1 year ago

Sorry for the confusion. When we ran the evaluation, we started a local server for all API-based models and used src.agents.HTTPAgent as the client. The configs in src.agents.api_agents were created for convenience just before we released the code.

tju01 commented 1 year ago

I see. I understand that the temperature should be 0 then. What would be the correct value for max_new_tokens: 128, 256, or something else?

zhc7 commented 1 year ago

For the card game task, use 512; for everything else, 128.
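
Putting the answers in this thread together, an agent config could look like the sketch below. This is only an illustration: the key names (`temperature`, `max_new_tokens`) follow the fastchat_client.yaml linked above, and whether the OpenAI-style configs accept the same keys is an assumption, not something confirmed in this thread.

```yaml
# Hypothetical agent config reflecting the values discussed in this issue.
# Key names assumed to match configs/agents/fastchat_client.yaml.

# Default for all tasks except the card game:
default_agent:
  parameters:
    temperature: 0       # greedy decoding, as stated in the paper
    max_new_tokens: 128

# Card game task only:
card_game_agent:
  parameters:
    temperature: 0
    max_new_tokens: 512  # longer generations needed for this task
```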