THUDM / AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
https://llmbench.ai
Apache License 2.0

Stuck when running webshop evaluation #32

Closed zwhe99 closed 1 year ago

zwhe99 commented 1 year ago

I followed https://github.com/THUDM/AgentBench/blob/main/docs/tutorial.md#how-to-run-all-tasks-in-agentbench to set up my environment.

I ran the WebShop evaluation and it got stuck at 0/80:

Evaluating in docker localhost/task:webshop, Parameters: --task outputs/2023-09-01-22-06-37/Do-Nothing-Agent/WebShop-dev/task.yaml --agent outputs/2023-09-01-22-06-37/Do-Nothing-Agent/WebShop-dev/agent.yaml --output outputs/2023-09-01-22-06-37/Do-Nothing-Agent/WebShop-dev --workers 1
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
> [Warning] FastChat agent not available
{'module': 'src.tasks.WebShop', 'parameters': {'end': 280, 'name': 'WebShop-dev', 'num_envs': 3, 'start': 200, 'worker_limit': 3, 'workers': 1}}
{'module': 'src.agents.DoNothingAgent', 'parameters': {'name': 'Do-Nothing-Agent', 'sleep': 0.01}}
[Evaluation] Loading Agent ...
[Evaluation] Successfully loaded Agent.
[Evaluation] Loading Task ...
> [Warning] OSInteraction task not available
> [Warning] ALFWorld task not available
> [Warning] DBBench task not available
Warning: Gym version v0.24.0 has a number of critical issues with `gym.make` such that the `reset` and `step` functions are called before returning the environment. It is recommend to downgrading to v0.23.1 or upgrading to v0.25.1
/root/miniconda3/envs/webshop/lib/python3.8/site-packages/pkg_resources/__init__.py:121: DeprecationWarning: pkg_resources is deprecated as an API
  warnings.warn("pkg_resources is deprecated as an API", DeprecationWarning)
/root/miniconda3/envs/webshop/lib/python3.8/site-packages/pkg_resources/__init__.py:2870: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('mpl_toolkits')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
  declare_namespace(pkg)
/root/miniconda3/envs/webshop/lib/python3.8/site-packages/faiss/loader.py:28: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if LooseVersion(numpy.__version__) >= "1.19":
/root/miniconda3/envs/webshop/lib/python3.8/site-packages/setuptools/_distutils/version.py:345: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  other = LooseVersion(other)
/root/miniconda3/envs/webshop/lib/python3.8/site-packages/thefuzz/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
  warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')
/root/miniconda3/envs/webshop/lib/python3.8/site-packages/gym/envs/registration.py:516: UserWarning: WARN: Overriding environment WebAgentSiteEnv-v0
  logger.warn(f"Overriding environment {spec.id}")
/root/miniconda3/envs/webshop/lib/python3.8/site-packages/gym/envs/registration.py:516: UserWarning: WARN: Overriding environment WebAgentTextEnv-v0
  logger.warn(f"Overriding environment {spec.id}")
> [Warning] Mind2Web task not available
> [Warning] KnowledgeGraph task not available
[Evaluation] Successfully loaded Task.
Evaluating task 'WebShop-dev' ...
Start Predicting All ...
  0%|                                                                                                                                                                                      | 0/80 [00:00<?, ?it/s]> [Warning] FastChat agent not available
> [Warning] OSInteraction task not available
> [Warning] ALFWorld task not available
> [Warning] DBBench task not available
Warning: Gym version v0.24.0 has a number of critical issues with `gym.make` such that the `reset` and `step` functions are called before returning the environment. It is recommend to downgrading to v0.23.1 or upgrading to v0.25.1
/root/miniconda3/envs/webshop/lib/python3.8/site-packages/pkg_resources/__init__.py:121: DeprecationWarning: pkg_resources is deprecated as an API
  warnings.warn("pkg_resources is deprecated as an API", DeprecationWarning)
/root/miniconda3/envs/webshop/lib/python3.8/site-packages/pkg_resources/__init__.py:2870: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('mpl_toolkits')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
  declare_namespace(pkg)
/root/miniconda3/envs/webshop/lib/python3.8/site-packages/faiss/loader.py:28: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if LooseVersion(numpy.__version__) >= "1.19":
/root/miniconda3/envs/webshop/lib/python3.8/site-packages/setuptools/_distutils/version.py:345: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  other = LooseVersion(other)
/root/miniconda3/envs/webshop/lib/python3.8/site-packages/thefuzz/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
  warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')
/root/miniconda3/envs/webshop/lib/python3.8/site-packages/gym/envs/registration.py:516: UserWarning: WARN: Overriding environment WebAgentSiteEnv-v0
  logger.warn(f"Overriding environment {spec.id}")
/root/miniconda3/envs/webshop/lib/python3.8/site-packages/gym/envs/registration.py:516: UserWarning: WARN: Overriding environment WebAgentTextEnv-v0
  logger.warn(f"Overriding environment {spec.id}")
> [Warning] Mind2Web task not available
> [Warning] KnowledgeGraph task not available

zhc7 commented 1 year ago

Everything looks normal so far. Check whether memory usage is gradually climbing to about 15GB. Loading the data takes roughly 30 to 90 seconds, so give it some time before concluding that it is stuck.
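One way to check whether memory is still climbing is to poll it for the 90 seconds mentioned above. The following is a minimal sketch, not part of AgentBench: the helper names `used_gib`, `still_loading`, and `watch` are hypothetical, and it assumes a Linux environment where `/proc/meminfo` is readable. The figure it reports is for the whole machine (or VM), not for the WebShop process alone.

```python
import time


def used_gib(meminfo_text: str) -> float:
    """Return used memory in GiB from /proc/meminfo-style text (values in kB)."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    # Used = total minus what the kernel reports as available.
    return (fields["MemTotal"] - fields["MemAvailable"]) / (1024 ** 2)


def still_loading(samples, min_growth_gib=0.1):
    """True if the latest sample grew by at least min_growth_gib over the previous one."""
    return len(samples) >= 2 and samples[-1] - samples[-2] >= min_growth_gib


def watch(interval_s=5.0, duration_s=90.0, path="/proc/meminfo"):
    """Print used memory every interval_s seconds for duration_s seconds (assumed defaults)."""
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        with open(path) as f:
            samples.append(used_gib(f.read()))
        state = "growing" if still_loading(samples) else "flat"
        print(f"used: {samples[-1]:.2f} GiB ({state})")
        time.sleep(interval_s)
    return samples
```

Calling `watch()` in a second terminal while the evaluation runs should show usage rising while the data loads. If it stays flat for the whole window and the progress bar has not moved, the run is more likely to be genuinely stuck.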