All-Hands-AI / OpenHands


[Bug]: SWE-Bench inference - Failed to establish a new connection: [Errno 111] Connection refused #4260

Status: Closed (jatinganhotra closed this issue 1 week ago)

jatinganhotra commented 1 month ago

Is there an existing issue for the same bug?

Describe the bug

Hi team,

When I try to run inference for SWE-Bench Lite with more than 1 worker, I get the error below. Inference runs fine with only 1 worker, which is the default value.

./evaluation/swe_bench/scripts/run_infer.sh MODEL_CONFIG with the default CodeActAgent

The error:

Instance django__django-10914 - 2024-10-07 15:21:14,902 - ERROR - Error during action execution: HTTPConnectionPool(host='localhost', port=34090): Max retries exceeded with url: /execute_action (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fdfffde2390>: Failed to establish a new connection: [Errno 111] Connection refused'))
Instance astropy__astropy-12907 - 2024-10-07 15:21:19,293 - ERROR - Error during action execution: HTTPConnectionPool(host='localhost', port=30607): Max retries exceeded with url: /execute_action (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fdfffb425d0>: Failed to establish a new connection: [Errno 111] Connection refused'))
Instance astropy__astropy-14365 - 2024-10-07 15:21:24,839 - ERROR - Error during action execution: HTTPConnectionPool(host='localhost', port=32191): Max retries exceeded with url: /execute_action (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fdfffda1610>: Failed to establish a new connection: [Errno 111] Connection refused'))
Instance astropy__astropy-7746 - 2024-10-07 15:21:25,875 - ERROR - Error during action execution: HTTPConnectionPool(host='localhost', port=37017): Max retries exceeded with url: /execute_action (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fdfffd07510>: Failed to establish a new connection: [Errno 111] Conn
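
As a quick sanity check that nothing is actually listening on the logged ports, here is a minimal sketch (standard library only; `is_listening` is just an illustrative helper, not OpenHands code):

```python
import socket

def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP server accepts connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # ConnectionRefusedError ([Errno 111]) is a subclass
        return False

# Port taken from the first error line above; False means nothing is bound there.
print(is_listening("localhost", 34090))
```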

Stack trace:

----------[The above error occurred. Retrying... (attempt 3 of 5)]----------

Instance django__django-11001 - 2024-10-07 15:16:53,257 - WARNING - Action, ErrorObservation loop detected
Instance django__django-11001 - 2024-10-07 15:16:53,259 - ERROR - Error during action execution: HTTPConnectionPool(host='localhost', port=38197): Max retries exceeded with url: /execute_action (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fdfb94fcd10>: Failed to establish a new connection: [Errno 111] Connection refused'))
Instance django__django-11001 - 2024-10-07 15:16:53,261 - ERROR - Error during action execution: HTTPConnectionPool(host='localhost', port=38197): Max retries exceeded with url: /execute_action (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fdfffcdc610>: Failed to establish a new connection: [Errno 111] Connection refused'))
Instance django__django-11001 - 2024-10-07 15:16:53,261 - ERROR - ----------
Error in instance [django__django-11001]: 'ErrorObservation' object has no attribute 'exit_code'. Stacktrace:
Traceback (most recent call last):
  File "/data/workspace/jatinganhotra/OpenDevin/evaluation/utils/shared.py", line 268, in _process_instance_wrapper
    result = process_instance_func(instance, metadata, use_mp)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/workspace/jatinganhotra/OpenDevin/evaluation/swe_bench/run_infer.py", line 367, in process_instance
    return_val = complete_runtime(runtime, instance)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/workspace/jatinganhotra/OpenDevin/evaluation/swe_bench/run_infer.py", line 287, in complete_runtime
    assert obs.exit_code == 0
           ^^^^^^^^^^^^^
AttributeError: 'ErrorObservation' object has no attribute 'exit_code'

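The secondary failure is easier to read than the connection errors: `complete_runtime` gets an `ErrorObservation` back (because the runtime container is unreachable) and then asserts on `exit_code`, which only command-output observations have. A minimal sketch of a defensive guard around that assert (illustrative only; the import path and the `runtime.run_action` call are assumptions based on the traceback, not the actual fix in the repo):

```python
# Illustrative guard for the assert at evaluation/swe_bench/run_infer.py:287.
# The import path is an assumption and may differ across OpenHands versions.
from openhands.events.observation import CmdOutputObservation, ErrorObservation

def checked_run(runtime, action):
    """Run an action and fail loudly if the runtime is unreachable."""
    obs = runtime.run_action(action)  # assumed call, mirroring the failing flow
    if isinstance(obs, ErrorObservation):
        # The runtime container is gone (e.g. [Errno 111]); raise with context
        # instead of hitting AttributeError on the missing exit_code attribute.
        raise RuntimeError(f'Runtime unreachable: {obs.content}')
    assert isinstance(obs, CmdOutputObservation) and obs.exit_code == 0
    return obs
```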

STDOUT logs from the beginning of the run:

Number of workers not specified, use default 16
Commit hash not specified, use current git commit
Agent not specified, use default CodeActAgent
MAX_ITER not specified, use default 30
USE_INSTANCE_IMAGE not specified, use default true
DATASET not specified, use default princeton-nlp/SWE-bench_Lite
SPLIT not specified, use default test
USE_INSTANCE_IMAGE: true
AGENT: CodeActAgent
AGENT_VERSION: v1.9
MODEL_CONFIG: eval_vllm_vela_mistral_large_2
DATASET: princeton-nlp/SWE-bench_Lite
SPLIT: test
USE_HINT_TEXT: false
EVAL_NOTE: v1.9-no-hint
14:48:49 - openhands:INFO: run_infer.py:93 - Using docker image prefix: docker.io/xingyaoww/
14:48:56 - openhands:INFO: run_infer.py:441 - Loaded dataset princeton-nlp/SWE-bench_Lite with split test
14:48:56 - openhands:INFO: utils.py:258 - Loading llm config from eval_vllm_vela_mistral_large_2
14:48:56 - openhands:INFO: shared.py:165 - Using evaluation output directory: evaluation/evaluation_outputs/outputs/swe-bench-lite/CodeActAgent/mistral-large-instruct-2407_maxiter_30_N_v1.9-no-hint
14:48:56 - openhands:INFO: shared.py:181 - Metadata: {"agent_class": "CodeActAgent", "llm_config": {"model": "openai/mistral-large-instruct-2407", "api_key": "******", "base_url": "BASE_URL", "api_version": null, "embedding_model": "local", "embedding_base_url": null, "embedding_deployment_name": null, "aws_access_key_id": null, "aws_secret_access_key": null, "aws_region_name": null, "openrouter_site_url": "https://docs.all-hands.dev/", "openrouter_app_name": "OpenHands", "num_retries": 8, "retry_multiplier": 2, "retry_min_wait": 15, "retry_max_wait": 120, "timeout": null, "max_message_chars": 10000, "temperature": 0.0, "top_p": 1.0, "custom_llm_provider": null, "max_input_tokens": null, "max_output_tokens": null, "input_cost_per_token": null, "output_cost_per_token": null, "ollama_base_url": null, "drop_params": true, "disable_vision": null, "caching_prompt": true, "log_completions": false}, "max_iterations": 30, "eval_output_dir": "evaluation/evaluation_outputs/outputs/swe-bench-lite/CodeActAgent/mistral-large-instruct-2407_maxiter_30_N_v1.9-no-hint", "start_time": "2024-10-07 14:48:56", "git_commit": "dd228c07e05b6908bc1d15dde8f8025284a9ef47", "dataset": "swe-bench-lite", "data_split": null, "details": {}}
14:48:56 - openhands:INFO: shared.py:199 - Writing evaluation output to evaluation/evaluation_outputs/outputs/swe-bench-lite/CodeActAgent/mistral-large-instruct-2407_maxiter_30_N_v1.9-no-hint/output.jsonl
14:48:56 - openhands:INFO: shared.py:232 - Finished instances: 0, Remaining instances: 300

Current OpenHands version

Commit - dd228c07e05b6908bc1d15dde8f8025284a9ef47

Installation and Configuration

> ./evaluation/swe_bench/scripts/run_infer.sh MODEL_CONFIG
Number of workers not specified, use default 16
Commit hash not specified, use current git commit
Agent not specified, use default CodeActAgent
MAX_ITER not specified, use default 30
USE_INSTANCE_IMAGE not specified, use default true
DATASET not specified, use default princeton-nlp/SWE-bench_Lite
SPLIT not specified, use default test
USE_INSTANCE_IMAGE: true
AGENT: CodeActAgent
AGENT_VERSION: v1.9
MODEL_CONFIG: MODEL_CONFIG
DATASET: princeton-nlp/SWE-bench_Lite
SPLIT: test
USE_HINT_TEXT: false
EVAL_NOTE: v1.9-no-hint
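
For reference, the worker count is a positional argument to the script. Based on the README at this commit (the exact argument order may vary by version, so check the script), a single-worker run would look something like:

> ./evaluation/swe_bench/scripts/run_infer.sh MODEL_CONFIG HEAD CodeActAgent 300 30 1

where the trailing 1 is num_workers (300 is the eval limit, 30 the max iterations).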


Model and Agent

_No response_

Operating System

_No response_

Reproduction Steps

_No response_

Logs, Errors, Screenshots, and Additional Context

_No response_

xingyaoww commented 1 month ago

Yes, I think that's somewhat expected behavior: Docker acts weirdly when you try to run multiple images at once.

You can consider joining our eval channel #remote-runtime-limited-beta to get access to our new infra for running evals in parallel: https://www.all-hands.dev/blog/evaluation-of-llms-as-coding-agents-on-swe-bench-at-30x-speed
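
If you still want some parallelism on plain Docker in the meantime, one hypothetical mitigation (a generic Python sketch, not an OpenHands API) is to serialize just the container-startup window across workers, since that is where the races tend to happen:

```python
# Illustrative sketch: keep worker parallelism but start containers one at a
# time. start_runtime is a hypothetical stand-in, not an OpenHands function.
import multiprocessing as mp
import time

startup_lock = mp.Lock()  # created before forking so all workers share it

def start_runtime(instance: str) -> str:
    time.sleep(1)  # stand-in for building/starting the instance image
    return f"runtime-for-{instance}"

def process_instance(instance: str) -> str:
    with startup_lock:  # serialize only the container-startup window
        runtime = start_runtime(instance)
    return f"ran {instance} on {runtime}"  # fully parallel from here on

if __name__ == "__main__":
    instances = ["django__django-10914", "astropy__astropy-12907"]
    with mp.Pool(processes=2) as pool:  # lock is inherited via fork on Linux
        print(pool.map(process_instance, instances))
```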

mamoodi commented 1 month ago

@xingyaoww just to clarify, when you say this is expected behavior, do you mean it will likely not be fixed? The README (https://github.com/All-Hands-AI/OpenHands/tree/main/evaluation/swe_bench) specifically allows you to set the number of workers.

xingyaoww commented 1 month ago

Yeah, I think so. Maybe we should make this clearer in the README there.

github-actions[bot] commented 1 week ago

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.