THUDM / AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
https://llmbench.ai
Apache License 2.0
2.03k stars, 138 forks

Running in Colab #51

Closed by olivarb 8 months ago

olivarb commented 9 months ago

Is there any support to run this in Colab?

Longin-Yu commented 9 months ago

That may be difficult because this project is not a single-process program. But if you find a way to deploy the task servers, the evaluation process can be executed in Colab.
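If the task servers are deployed on a separate machine, a notebook could first confirm that it can reach them before starting an evaluation. The following is only a minimal sketch using the Python standard library: the function name `is_reachable` is hypothetical and not part of AgentBench, and the host and port of a real deployment are whatever that deployment exposes.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds.

    This only checks that something is listening on the port. It does not
    verify that the listener is actually an AgentBench task server.
    """
    try:
        # create_connection resolves the host and opens a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

In a Colab cell this could be called as `is_reachable("<your-server-host>", <port>)` with the address of the externally hosted servers. A `False` result would suggest fixing networking or firewall rules before launching the evaluation.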

olivarb commented 9 months ago

@Longin-Yu Would there be any support for this in the future? It would make it easier to profile models.

olivarb commented 8 months ago

@zhc7 Would it be possible to run without Docker? This would likely make it viable to run in Colab

zhc7 commented 8 months ago

Hi, running different tasks in the same environment can be really complicated, because the requirements of different tasks may conflict with each other. If you insist, you can remove the docker field from the configs and try to set up the environment manually. I haven't tried it myself, so I don't know if it's feasible.
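A minimal sketch of the "remove the docker field" step, assuming a task config has already been loaded into plain Python dicts and lists (for example with a YAML parser). The helper name `strip_key` and the example layout are hypothetical and do not reflect the actual AgentBench config schema. The only detail taken from this thread is the field name `docker`.

```python
from typing import Any


def strip_key(node: Any, key: str = "docker") -> Any:
    """Return a copy of a nested dict/list structure with every `key` entry removed."""
    if isinstance(node, dict):
        # Drop the matching key at this level and recurse into the remaining values.
        return {k: strip_key(v, key) for k, v in node.items() if k != key}
    if isinstance(node, list):
        return [strip_key(item, key) for item in node]
    # Scalars (strings, numbers, None, ...) are returned unchanged.
    return node


# Hypothetical config layout, for illustration only.
example_config = {
    "default": {
        "docker": {"image": "example/image"},
        "parameters": {"concurrency": 1},
    },
    "tasks": [
        {"name": "example-task", "docker": {"image": "example/other-image"}},
    ],
}

cleaned = strip_key(example_config)
```

The function builds a new structure instead of mutating the input, so the original config stays available for comparison. Installing each task's dependencies by hand would still be a separate step.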

olivarb commented 8 months ago

@zhc7 Will there be any official support for this use case? It is easy to spin up multiple Colab sessions.

zhc7 commented 8 months ago

Thanks for your advice. But this is a rather complicated problem, and we don't plan to support it in the near future. If you manage to solve this problem, we're always open to contributions.

dsubunow commented 2 months ago

Hi, is it possible to get a requirements.txt file for individual tasks? Or a Dockerfile (as opposed to a Docker image)? @zhc7 @Longin-Yu

MarcCote commented 2 months ago

@zhc7 @Longin-Yu Any update on this?