THUDM / AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
https://llmbench.ai
Apache License 2.0
2.15k stars, 150 forks

how to run the webshop task #41

Closed: Z-ZHHH closed this issue 1 year ago

Z-ZHHH commented 1 year ago

I want to run the webshop task, and I have run the following commands:

pip install --upgrade pip
pip install -r requirements.txt
bash scripts/build_docker.sh

However, some third-party libraries are still not installed, e.g., faiss. The tutorial does not seem to mention them. Have I missed something?
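A quick way to see which packages are missing from the active environment is to ask Python whether each module can be found. This is only a minimal sketch: the helper name `missing_modules` is hypothetical, and the list contains just the library named above, so extend it with whatever else the task needs.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "faiss" is the library reported as missing above; add others as needed.
    missing = missing_modules(["faiss"])
    print("missing:", ", ".join(missing) if missing else "none")
```

Running this inside the same conda environment (or inside the container) shows whether the import problem is in the environment actually used for evaluation.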

zhc7 commented 1 year ago

Hi, we recommend using our prebuilt image. Setting up an environment for a task can be frustrating and tedious. If you insist on building it yourself, refer to src/tasks/webshop/README.md.

Z-ZHHH commented 1 year ago

Thanks for your reply! I have run all the installation commands in the tutorial and activated my conda environment. Are the next steps as follows?

  1. python create_assignment.py --assignment configs/assignments/my_example.yaml
  2. bash .assignment/xxxx.sh

Should I start the Docker container and run the commands inside it? I am a bit confused. Looking forward to your reply, thanks!

Here is my my_example.yaml:

default:
    agent: configs/agents/api_agents/llama2-7b.yaml
    task:
        parameters:
            workers: 15
assignments:
    - task:
        from: "configs/tasks/webshop/dev.yaml"
        parameters:
            workers: 6
    - task: "configs/tasks/card_game/dev.yaml"

lzwqjh commented 1 year ago

Hi, I ran into a similar problem. I ran:

sudo docker run -it localhost/task:webshop 

to start the webshop image, but I don't know how to use it the way os_interaction or dbbench are used. For example, how should I run the following command?

python eval.py --task configs/tasks/webshop/dev.yaml --agent configs/agents/do_nothing.yaml --workers 30

I am a bit confused. Looking forward to your reply, thanks!

zhc7 commented 1 year ago

You should execute the script outside the container. If you look into the script, you'll find that what it actually does is start a Docker container to execute the actual evaluation command.
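To illustrate the idea, here is a rough sketch of a host-side wrapper that builds a `docker run` command around the evaluation command. It is not the repository's script: the helper name `docker_eval_command` is hypothetical, the image name and eval.py arguments are taken from the comment above, and the real script may also mount directories or pass extra options.

```python
import shlex


def docker_eval_command(image, eval_args):
    """Build (but do not run) a `docker run` command that executes eval.py in `image`."""
    # Quote each argument so the inner command survives `bash -c` unchanged.
    inner = "python eval.py " + " ".join(shlex.quote(arg) for arg in eval_args)
    return ["docker", "run", "--rm", image, "bash", "-c", inner]


cmd = docker_eval_command(
    "localhost/task:webshop",
    [
        "--task", "configs/tasks/webshop/dev.yaml",
        "--agent", "configs/agents/do_nothing.yaml",
        "--workers", "30",
    ],
)
# Print the command for inspection instead of executing it.
print(shlex.join(cmd))
```

The point is only that the evaluation command is launched from the host, and the container is where it actually runs.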

zhc7 commented 1 year ago

@lzwqjh If you just want to run it, I recommend using create_assignment.py. If you want to understand how it works, I suggest taking a look at script/eval_utils.sh.

Z-ZHHH commented 1 year ago

Thanks a lot. The error was caused by fastchat inside the Docker image, which needs accelerate and other packages.