THUDM / AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
https://llmbench.ai
Apache License 2.0
2.03k stars, 138 forks

[Assistance] Number of problems in the OS dataset #72

Open deema-A opened 8 months ago

deema-A commented 8 months ago

Hi, I counted the data samples (problems) in the 'os_interaction' folder and got a total of 191. However, the statistics table reports a different number, 170 samples. I'm not sure whether I was looking at the correct folder. I'd appreciate your help. Thanks!

Longin-Yu commented 8 months ago

Thank you for your interest.

  1. Could you provide details on the split whose count is wrong?
  2. The file data/os_interaction/data/6-backup.json is deprecated and is not included in our dataset. Details are shown in main/configs/tasks/os.yaml.

deema-A commented 8 months ago

Thank you @Longin-Yu. Apart from 6-backup, the total is 182: 156 test and 26 dev. But the statistics here show the test split with 144 samples?

[Screenshot 2023-11-09 at 7:58:18 AM]
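
For anyone who wants to reproduce a per-file count like the ones discussed above, here is a minimal Python sketch. It is not part of AgentBench: the function name `count_problems` is hypothetical, and it assumes each JSON file under the data folder holds either an array of problems or a single problem object, which may not match the actual layout. The only details taken from this thread are the folder data/os_interaction/data and the deprecated file 6-backup.json.

```python
import json
from pathlib import Path


def count_problems(data_dir, exclude=()):
    """Return {relative file path: problem count} for every JSON file under data_dir.

    Hypothetical helper, not an AgentBench API.
    """
    data_dir = Path(data_dir)
    excluded = set(exclude)
    counts = {}
    for path in sorted(data_dir.rglob("*.json")):
        if path.name in excluded:
            continue  # skip files that are not part of the dataset, e.g. deprecated ones
        with path.open(encoding="utf-8") as f:
            payload = json.load(f)
        # Assumption: a file is either a JSON array of problems or one problem object.
        n = len(payload) if isinstance(payload, list) else 1
        counts[path.relative_to(data_dir).as_posix()] = n
    return counts
```

Calling `count_problems("data/os_interaction/data", exclude={"6-backup.json"})` and summing the values gives a total that leaves out the deprecated file. Printing the per-file counts makes it easier to see which files account for a difference from the published table.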