THUDM / AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
https://llmbench.ai
Apache License 2.0
2.01k stars 136 forks

Fix Execution Permission Issue and Adjust LTP Task Rounds #123

Closed Taishi-N324 closed 4 months ago

Taishi-N324 commented 4 months ago

Execution Permission Issue for the Card Game Task: During debugging, I found that the card_game task was hanging because src/server/tasks/card_game/logic/bin/main lacked execution permissions. This prevented the try block in judger.py from executing, so task initialization failed and the task hung. To resolve this, I ran the git update-index --chmod=+x src/server/tasks/card_game/logic/bin/main command to grant execution permissions to the file.
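A minimal sketch of how the missing permission could be detected and repaired locally before the task starts, assuming a POSIX system. The helper name ensure_executable is hypothetical and not part of AgentBench; only the path is taken from the description above.

```python
import os
import stat


def ensure_executable(path: str) -> bool:
    """Add execute bits to `path` if the owner execute bit is missing.

    Returns True if the mode was changed, False if the file was already
    executable by its owner. Hypothetical helper, not part of AgentBench.
    """
    mode = os.stat(path).st_mode
    if mode & stat.S_IXUSR:
        return False
    # Similar in effect to `chmod a+x`: add execute for user, group, others.
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return True


if __name__ == "__main__":
    binary = "src/server/tasks/card_game/logic/bin/main"
    if os.path.exists(binary):
        print("changed" if ensure_executable(binary) else "already executable")
```

This only changes the local working copy. The git update-index --chmod=+x command records the mode in the index so that the change is committed for other users.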

Incorrect Rounds Configuration for the LTP Task: The number of rounds for the LTP task was inconsistent. ltp.yaml specifies 25 rounds, but the implementation in task.py erroneously used 50. This mismatch caused errors because sequences grew longer than expected. I have set the rounds to 25 to match the configuration and, most likely, the evaluation settings described in the project's associated paper.
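One way to catch this kind of mismatch early is a guard that compares the configured value with the one the implementation uses. The sketch below is an assumption for illustration: check_rounds is a hypothetical helper, and how ltp.yaml and task.py actually expose their round counts is not shown in this thread.

```python
def check_rounds(configured: int, implemented: int) -> int:
    """Return the round count if config and implementation agree.

    Hypothetical guard, not part of AgentBench. Raises ValueError on a
    mismatch so the task fails fast instead of producing overly long
    sequences later.
    """
    if configured != implemented:
        raise ValueError(
            f"rounds mismatch: config says {configured}, "
            f"implementation uses {implemented}"
        )
    return configured


if __name__ == "__main__":
    # Values from the description above: 25 in the config, 50 in the code.
    try:
        check_rounds(configured=25, implemented=50)
    except ValueError as err:
        print(err)
```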

matinaghaei commented 1 week ago

@Taishi-N324 Hi, it seems that the execution permission issue is not addressed in this commit (at least I can't see it in the differences). Can you tell us exactly what needs to change in the code? Where should git update-index --chmod=+x src/server/tasks/card_game/logic/bin/main be applied? Thanks!
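A mode-only change is easy to miss because git reports it as a change from mode 100644 to 100755 rather than as modified lines. As a rough sketch, the mode recorded in the index can be read with git ls-files -s, run from inside a clone of the repository. The function names below are hypothetical; the output format assumed is the standard "mode object stage, tab, path" line.

```python
import subprocess


def parse_index_mode(ls_files_line: str) -> str:
    """Extract the file mode from one line of `git ls-files -s` output.

    The expected shape is "<mode> <object> <stage>\t<path>".
    """
    meta, _, _path = ls_files_line.partition("\t")
    return meta.split()[0]


def is_executable_in_index(path: str) -> bool:
    """Return True if git's index records `path` with mode 100755.

    Hypothetical helper; must be run inside a git working tree.
    """
    result = subprocess.run(
        ["git", "ls-files", "-s", "--", path],
        capture_output=True,
        text=True,
        check=True,
    )
    lines = result.stdout.splitlines()
    if not lines:
        raise FileNotFoundError(f"{path} is not tracked by git")
    return parse_index_mode(lines[0]) == "100755"
```

If is_executable_in_index returns False for src/server/tasks/card_game/logic/bin/main, the git update-index --chmod=+x command from the description has not yet been applied in that clone.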

Taishi-N324 commented 1 week ago

https://github.com/THUDM/AgentBench/pull/123/files

Screenshot 2024-06-27 at 2 37 06

@matinaghaei