THUDM / AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
https://llmbench.ai
Apache License 2.0
2.03k stars 138 forks

webshop stuck at 78/80 #49

Closed harshraj172 closed 8 months ago

harshraj172 commented 9 months ago

Warning: 4 messages are omitted.
Warning: 4 messages are omitted.
98%|████████████████████████████████████████▋ | 78/80 [26:04<00:16, 8.11s/it]

The WebShop evaluation is stuck at iteration 78/80. It has been 2 hours and it is not proceeding. Any help is deeply appreciated. Thanks

Longin-Yu commented 9 months ago

We fixed this in v0.2 (added a timeout and support for resuming evaluation). You are welcome to give it a try!
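For readers wondering what "timeout + resume" can look like in general, here is a minimal sketch. It is not AgentBench's actual v0.2 code: `evaluate`, `run_with_timeout`, `load_done`, the JSONL log layout, and the `run_sample` callback are all hypothetical names chosen for illustration. The idea is to give each sample a time limit so a hung sample cannot block the run, and to append every outcome to a log so a rerun skips samples that are already recorded.

```python
import json
import os
import threading
import time


def load_done(log_path):
    """Return {index: record} for samples already recorded in the JSONL log."""
    done = {}
    if os.path.exists(log_path):
        with open(log_path) as f:
            for line in f:
                line = line.strip()
                if line:
                    record = json.loads(line)
                    done[record["index"]] = record
    return done


def run_with_timeout(fn, arg, timeout_s):
    """Run fn(arg) in a daemon thread and wait at most timeout_s seconds."""
    box = {}

    def target():
        try:
            box["result"] = fn(arg)
        except Exception as exc:  # record the failure instead of crashing the run
            box["error"] = repr(exc)

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        # The thread is abandoned, not killed; a subprocess would be needed
        # to terminate a truly hung sample.
        return "timeout", None
    if "error" in box:
        return "error", box["error"]
    return "ok", box["result"]


def evaluate(samples, run_sample, log_path, timeout_s=60.0):
    """Evaluate samples in order, skipping any index already in the log."""
    done = load_done(log_path)
    with open(log_path, "a") as log:
        for index, sample in enumerate(samples):
            if index in done:
                continue  # resume: this sample was handled in an earlier run
            status, result = run_with_timeout(run_sample, sample, timeout_s)
            record = {"index": index, "status": status, "result": result}
            log.write(json.dumps(record) + "\n")
            log.flush()  # persist progress before moving on
            done[index] = record
    return [done[i] for i in range(len(samples))]
```

Calling `evaluate` again with the same `log_path` only runs samples that have no record yet. In this sketch a timed-out sample counts as recorded; to retry it, its line would have to be removed from the log first. Results must be JSON-serializable for the log to work.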

Xiao9905 commented 8 months ago

@harshraj172 Hi, how is it going with your testing on webshop? Please feel free to reopen the issue if you need our help.