All-Hands-AI / OpenHands

🙌 OpenHands: Code Less, Make More
https://all-hands.dev
MIT License

[Bug]: Error when running AgentBench evaluation with CodeActAgent #3204

Closed jatinganhotra closed 1 week ago

jatinganhotra commented 1 month ago

Is there an existing issue for the same bug?

Describe the bug

From Section 4.4.3 (AgentBench) of the paper: "We selected the code-grounded operating system (OS) subset with 144 tasks."

I am trying to run the evaluation on the OSBench subset of AgentBench using CodeActAgent, but when I run

./evaluation/agent_bench/scripts/run_infer.sh llm_config agentbench_branch CodeActAgent

I get the error:

ERROR:root:<class 'requests.exceptions.ReadTimeout'>: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)

Full trace:

ERROR:root:  File "/data/workspace/jatinganhotra/AgentBench_OpenDevin/evaluation/agent_bench/run_infer.py", line 253, in <module>
    run_evaluation(
  File "/data/workspace/jatinganhotra/AgentBench_OpenDevin/evaluation/utils/shared.py", line 194, in run_evaluation
    process_instance_func(instance, metadata)
  File "/data/workspace/jatinganhotra/AgentBench_OpenDevin/evaluation/agent_bench/run_infer.py", line 102, in process_instance
    sandbox = DockerSSHBox(
              ^^^^^^^^^^^^^
  File "/data/workspace/jatinganhotra/AgentBench_OpenDevin/opendevin/runtime/docker/ssh_box.py", line 202, in __init__
    self.close()
  File "/data/workspace/jatinganhotra/AgentBench_OpenDevin/opendevin/runtime/docker/ssh_box.py", line 623, in close
    containers = self.docker_client.containers.list(all=True)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/workspace/jatinganhotra/poetry_cache/virtualenvs/opendevin-mH2JxuRx-py3.11/lib/python3.11/site-packages/docker/models/containers.py", line 1018, in list
    containers.append(self.get(r['Id']))
                      ^^^^^^^^^^^^^^^^^
  File "/data/workspace/jatinganhotra/poetry_cache/virtualenvs/opendevin-mH2JxuRx-py3.11/lib/python3.11/site-packages/docker/models/containers.py", line 954, in get
    resp = self.client.api.inspect_container(container_id)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/workspace/jatinganhotra/poetry_cache/virtualenvs/opendevin-mH2JxuRx-py3.11/lib/python3.11/site-packages/docker/utils/decorators.py", line 19, in wrapped
    return f(self, resource_id, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/workspace/jatinganhotra/poetry_cache/virtualenvs/opendevin-mH2JxuRx-py3.11/lib/python3.11/site-packages/docker/api/container.py", line 794, in inspect_container
    self._get(self._url("/containers/{0}/json", container)), True
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/workspace/jatinganhotra/poetry_cache/virtualenvs/opendevin-mH2JxuRx-py3.11/lib/python3.11/site-packages/docker/utils/decorators.py", line 44, in inner
    return f(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/workspace/jatinganhotra/poetry_cache/virtualenvs/opendevin-mH2JxuRx-py3.11/lib/python3.11/site-packages/docker/api/client.py", line 246, in _get
    return self.get(url, **self._set_request_timeout(kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/workspace/jatinganhotra/poetry_cache/virtualenvs/opendevin-mH2JxuRx-py3.11/lib/python3.11/site-packages/requests/sessions.py", line 602, in get
    return self.request("GET", url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/workspace/jatinganhotra/poetry_cache/virtualenvs/opendevin-mH2JxuRx-py3.11/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/workspace/jatinganhotra/poetry_cache/virtualenvs/opendevin-mH2JxuRx-py3.11/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/workspace/jatinganhotra/poetry_cache/virtualenvs/opendevin-mH2JxuRx-py3.11/lib/python3.11/site-packages/requests/adapters.py", line 713, in send
    raise ReadTimeout(e, request=request)

ERROR:root:<class 'requests.exceptions.ReadTimeout'>: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)
  0%|                                                                                                                                              | 0/144 [01:31<?, ?it/s]

I've followed the reinstallation guide and also restarted the server.

Prior to server restart, I was getting this error - https://docs.all-hands.dev/modules/usage/troubleshooting#unable-to-connect-to-ssh-box

pexpect.pxssh.ExceptionPxssh: Could not establish connection to host

but now I am getting the error above. Both errors come from the generic DockerSSHBox, not from SWEBenchSSHBox, the specialized DockerSSHBox used for SWE-Bench.

Note - SWE-Bench evaluation runs fine on this server.

Current OpenDevin version

`1.8` running on commit:

commit 84a6e90dc2e8b1e096af33f7545fa1969853c7d4 (origin/main, origin/HEAD, main)
Author: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>
Date:   Tue Jul 30 00:40:33 2024 +0530

### Installation and Configuration

```toml
# config.toml

[core]
workspace_base="./workspace"
max_iterations = 100
cache_dir = "/tmp/cache"
sandbox_container_image = "ghcr.io/opendevin/sandbox:latest"
# sandbox_type = "ssh"
ssh_hostname = "localhost"
sandbox_timeout = 120
run_as_devin = false

[sandbox]
use_host_network = true
box_type = "ssh"

# SWEBench eval specific
enable_auto_lint = true
max_budget_per_task = 4 # 4 USD

[llm.llm_config]
base_url="MODEL_URL"
model="MODEL_NAME"
api_key="pak-fake"
temperature = 0.0
```

### Model and Agent

- Model: Llama
- Agent: CodeActAgent

### Operating System

cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Reproduction Steps

  1. Check out OpenDevin
  2. Set up the development environment as described in Development.md
  3. Run the evaluation command provided above

Logs, Errors, Screenshots, and Additional Context

No response

xingyaoww commented 1 month ago

Sorry to hear that! How many processes are you using (N_PROCESS)? Maybe you can try lowering that to 1 to see if the issue persists?

Here's the pointer I got from Claude 3.5 Sonnet:

This error suggests that the Docker daemon is not responding within the expected time frame (60 seconds). This could be due to several reasons:

- The Docker daemon is overloaded or not responding properly.
- There might be network issues if Docker is running on a remote machine.
- The system running Docker might be under heavy load, causing slow responses.
- There could be a large number of containers or images, causing the listing operation to take too long.
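One way to tell these causes apart is to time the Docker CLI directly, outside of the evaluation harness. The sketch below is only an illustration, not part of OpenDevin: `timed_run`, `probe_docker`, and `report` are hypothetical helper names, and it assumes the `docker` CLI is on the PATH. It times `docker info` and `docker ps -a -q` against the same 60-second limit that appears in the traceback.

```python
import subprocess
import sys
import time


def timed_run(cmd, timeout=60.0):
    """Run cmd and return (ok, seconds_elapsed).

    ok is False if the command times out, exits non-zero,
    or the executable cannot be found.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        ok = result.returncode == 0
    except (subprocess.TimeoutExpired, FileNotFoundError):
        ok = False
    return ok, time.monotonic() - start


def probe_docker(timeout=60.0):
    """Time two daemon round-trips; the labels are illustrative only."""
    checks = {
        "daemon info": ["docker", "info"],
        # Roughly the CLI counterpart of containers.list(all=True) in the trace.
        "list all containers": ["docker", "ps", "-a", "-q"],
    }
    return {label: timed_run(cmd, timeout) for label, cmd in checks.items()}


def report(results, stream=sys.stdout):
    """Print one line per check with its status and duration."""
    for label, (ok, seconds) in results.items():
        status = "ok" if ok else "FAILED or timed out"
        print(f"{label}: {status} after {seconds:.1f}s", file=stream)
```

Calling `report(probe_docker())` on the affected server would show whether the daemon itself is slow (both checks take long) or only the container listing is.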

You could also check how many containers are running on your system with `docker ps`. If there are too many stale containers, you can stop them first to free up system resources (e.g., `docker ps --format '{{.Names}}' | grep opendevin | xargs docker stop`).
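The same cleanup can be scripted if you want to inspect the list before stopping anything. This is a minimal sketch under stated assumptions: the helper names (`select_stale`, `running_container_names`, `stop_containers`) are hypothetical, the `docker` CLI is assumed to be on the PATH, and the substring match on `opendevin` simply mirrors the `grep` in the one-liner above.

```python
import subprocess


def select_stale(names, keyword="opendevin"):
    """Return the names that contain keyword (same effect as `grep opendevin`)."""
    return [name for name in names if keyword in name]


def running_container_names(timeout=60.0):
    """Return names of running containers via `docker ps --format '{{.Names}}'`."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def stop_containers(names):
    """Stop the given containers with `docker stop`; do nothing for an empty list."""
    if names:
        subprocess.run(["docker", "stop", *names], check=True)
```

For example, `stale = select_stale(running_container_names())` lets you print and review `stale` first, and `stop_containers(stale)` then stops only those containers.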

BTW, over the coming weeks we are working on a new Runtime for eval (#2404) that completely gets rid of the SSHBox, which can sometimes be unstable. Hopefully this will work better for these evals.

github-actions[bot] commented 2 weeks ago

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 1 week ago

This issue was closed because it has been stalled for over 30 days with no activity.