shuyanzhou opened 3 months ago
Hi, we directly adopted the evaluation code from AgentTuning :)
Thank you for the response. I am wondering whether you use multi-turn prompting to obtain a single action.
During inference, we directly use the JSON output format, or whatever output format the system prompt requests. The chat-format data is used only for training.
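To make the single-call setup concrete: each step needs one model reply, which is parsed directly, with no extra conversational turns. The sketch below is illustrative only. It assumes a hypothetical JSON object with an `action` key, and the helper `extract_action` is a made-up name. The real key names depend on the system prompt, and this is not the authors' or AgentTuning's parsing code.

```python
import json
import re
from typing import Optional


def extract_action(response: str) -> Optional[str]:
    """Return the action from a single model reply, or None if none is found.

    Hypothetical format: a JSON object with an "action" key. First try the
    whole reply as JSON; if that fails, try the outermost {...} span.
    """
    candidates = [response.strip()]
    match = re.search(r"\{.*\}", response, re.DOTALL)
    if match:
        candidates.append(match.group(0))
    for text in candidates:
        try:
            obj = json.loads(text)
        except json.JSONDecodeError:
            continue  # not valid JSON; try the next candidate
        if isinstance(obj, dict) and "action" in obj:
            return obj["action"]
    return None
```

For example, `extract_action('Sure. {"thought": "search first", "action": "type [12] [shoes]"}')` returns `"type [12] [shoes]"`, because the JSON object embedded in the reply is parsed on the second attempt.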
Thank you very much for the info. We attempted to reproduce the result with the default prompt, but the success rate (SR) is only 0.61%. Would you mind sharing the recorded trajectories so that we can compare them and identify what may be going wrong on our end?
Hello, our project was evaluated in January 2024, so you may need to switch to an earlier official commit of WebArena: https://github.com/web-arena-x/webarena/commit/14f91d90e60d79e829396d6429fc5e24de6c3fda. The website Docker images we used were downloaded from the official source: https://github.com/web-arena-x/webarena/tree/main/environment_docker#wikipedia-website. Unfortunately, our task machines were recycled after the project was completed, so the log files were lost.
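Pinning WebArena to the commit mentioned above can be scripted. The following is a minimal sketch that assumes `git` is on the PATH. The clone URL is derived from the GitHub link in the thread, and the helper names `checkout_commands` and `pin_webarena` are hypothetical. It covers only the code checkout; the website Docker images still come from the `environment_docker` instructions linked above.

```python
import subprocess
from pathlib import Path

# Commit referenced in the thread (WebArena as of the January 2024 evaluation).
WEBARENA_REPO = "https://github.com/web-arena-x/webarena.git"
WEBARENA_COMMIT = "14f91d90e60d79e829396d6429fc5e24de6c3fda"


def checkout_commands(dest: str, already_cloned: bool) -> list[list[str]]:
    """Build the git commands that pin a WebArena checkout to WEBARENA_COMMIT."""
    cmds: list[list[str]] = []
    if already_cloned:
        # Existing clone: fetch so the pinned commit is available locally.
        cmds.append(["git", "-C", dest, "fetch", "origin"])
    else:
        cmds.append(["git", "clone", WEBARENA_REPO, dest])
    cmds.append(["git", "-C", dest, "checkout", WEBARENA_COMMIT])
    return cmds


def pin_webarena(dest: str = "webarena") -> None:
    """Run the commands, stopping with an error if any git step fails."""
    for cmd in checkout_commands(dest, already_cloned=Path(dest).exists()):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    pin_webarena()
```

Building the command lists separately from running them keeps the steps easy to inspect (or print) before anything touches the network.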
Hi,
Thanks for the great work. Do you plan to release the code for running WebArena?