THUDM / AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
https://llmbench.ai
Apache License 2.0

How do you deal with the cases when the input is longer than the context length? #48

Closed: leoozy closed this issue 8 months ago

leoozy commented 10 months ago

Hello, and thank you for your code. I have some questions about your framework. In some tasks, such as WebShop, the observation/history can be very long, even longer than the context length of 4096. How do you deal with this? Thank you!

Longin-Yu commented 9 months ago

Thanks for your interest. We use two strategies to handle the context limit.

  1. Omit earlier messages before inference. To allow as many models as possible to be evaluated, we truncate the history when it is too long, keeping the first message and the most recent ones. You can refer to this piece of code. After this step, the prompt stays within roughly 4k tokens (see the sketch after this list).
  2. A special check for HTTPAgent. Because earlier models (e.g. OpenAI davinci) support only 2k tokens, we add a dedicated truncation function for them.
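For readers who want the idea without digging through the repo, here is a minimal sketch of the "keep the first and the latest messages" truncation described in item 1. The token counter, the helper names, and the 4k default are illustrative assumptions, not AgentBench's actual implementation:

```python
from typing import Dict, List

def count_tokens(message: Dict[str, str]) -> int:
    # Placeholder token counter for illustration; a real setup would use
    # the target model's tokenizer (e.g. tiktoken for OpenAI models).
    return len(message["content"].split())

def truncate_history(messages: List[Dict[str, str]],
                     max_tokens: int = 4096) -> List[Dict[str, str]]:
    """Keep the first message (usually the task instruction) plus as many
    of the most recent messages as fit in max_tokens, dropping the middle."""
    if not messages:
        return []
    first = messages[0]
    budget = max_tokens - count_tokens(first)
    kept: List[Dict[str, str]] = []
    # Walk backwards from the newest message, keeping whatever still fits.
    for msg in reversed(messages[1:]):
        cost = count_tokens(msg)
        if cost > budget:
            break
        kept.append(msg)
        budget -= cost
    return [first] + list(reversed(kept))
```

Under this sketch, the 2k-token special case in item 2 would amount to calling the same function with a smaller budget, e.g. `truncate_history(messages, max_tokens=2048)`.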
Xiao9905 commented 8 months ago

@leoozy Hi, thanks for your interest in AgentBench! Has your problem been solved successfully? Please feel free to reopen the issue for help if you need it.