YifeiZhou02 / ArCHer

Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"
https://yifeizhou02.github.io/archer.io/

WebShop Experiment #6

Closed by symoon11 4 months ago

symoon11 commented 5 months ago

Thank you for the great work! I have some questions regarding the WebShop experiment.

  1. Does the policy trained on 100 environments transfer well to held-out environments?
  2. Have you attempted to train a policy on more than 100 environments? I understand that the speed of environment interaction might be slower, but I am curious about the results.

Thanks.

YifeiZhou02 commented 4 months ago

Thanks for your question. We did not actually try held-out environments, but we did try running on 1000 WebShop environments. The agent does indeed seem to be improving. However, for some reason we could not identify, the WebShop server becomes extremely slow when there are 1000 possible instructions, so we did not report this in the paper. Hope it helps!

symoon11 commented 4 months ago

Thanks for your reply! I resolved the speed issue yesterday. Currently, I am running PPO and your algorithm on 1000 environments and it seems there is still room for improvement. Can I email you to learn more about the WebShop experiment?

YifeiZhou02 commented 4 months ago

Of course.

moghis commented 3 months ago

Hello @symoon11, can you please share your adjusted code? I would like to know how you resolved the issue. Thank you very much.

symoon11 commented 3 months ago

Just comment out lines 237 to 250 in https://github.com/princeton-nlp/WebShop/blob/a557a208d03b93c83f4075e66e8746922606e60f/web_agent_site/app.py to speed things up!
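
For anyone who would rather script that edit than do it by hand, here is a minimal sketch. It assumes a local checkout of WebShop at the exact commit linked above, since the line numbers only match that revision. The helper name `comment_out_lines` is hypothetical and not part of WebShop or ArCHer; it only prefixes the chosen lines with `# ` and does not know what those lines contain.

```python
from pathlib import Path


def comment_out_lines(path, start, end):
    """Prefix lines start..end (1-indexed, inclusive) of a file with '# '.

    Hypothetical helper for illustration; not part of WebShop.
    """
    file_path = Path(path)
    lines = file_path.read_text(encoding="utf-8").splitlines(keepends=True)
    if not (1 <= start <= end <= len(lines)):
        raise ValueError(
            f"line range {start}-{end} is outside the file (1-{len(lines)})"
        )
    for i in range(start - 1, end):
        stripped = lines[i].lstrip()
        # Leave blank lines and already-commented lines untouched.
        if stripped and not stripped.startswith("#"):
            lines[i] = "# " + lines[i]
    file_path.write_text("".join(lines), encoding="utf-8")
```

From the root of the WebShop checkout, the call would be `comment_out_lines("web_agent_site/app.py", 237, 250)`. Review the diff afterwards: if the commented range happens to be the entire body of a block, Python will need a `pass` statement there, and on any other commit the line numbers may point at different code.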