[Bug]: SWE-bench reset_swe_env.py timeout

JessChud commented 5 months ago

Is there an existing issue for the same bug?

[X] I have checked the troubleshooting document at https://opendevin.github.io/OpenDevin/modules/usage/troubleshooting
[X] I have checked the existing issues.

Describe the bug

In my config file I have this (changed from the default GPT-4 setting):

[eval_gpt3.5_0125_preview]
model = "gpt-3.5-turbo-0125"

I run the inference command: /Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/scripts/run_infer.sh eval_gpt3.5_0125_preview CodeActAgent 1

and get the following output (the full log is pasted under "Installation and Configuration" below). Could you help me resolve this? Thanks!

Current OpenDevin version

Docker Desktop 4.30.0 (149282)
Not sure which OpenDevin version; I will come back and update.

Installation and Configuration

JessicaComputer:swe_bench Jessica$ /Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/scripts/run_infer.sh eval_gpt3.5_0125_preview CodeActAgent 1
AGENT: CodeActAgent
AGENT_VERSION: v1.4
MODEL_CONFIG: eval_gpt3.5_0125_preview
EVAL_LIMIT: 1
09:34:30 - opendevin.core.config:INFO: config.py:431 - Loading llm config from eval_gpt3.5_0125_preview
09:34:30 - opendevin:INFO: run_infer.py:330 - Config for evaluation: AppConfig(llm=LLMConfig(model='gpt-3.5-turbo', api_key='******', base_url=None, api_version=None, embedding_model='local', embedding_base_url=None, embedding_deployment_name=None, aws_access_key_id='******', aws_secret_access_key='******', aws_region_name=None, num_retries=5, retry_min_wait=3, retry_max_wait=60, timeout=None, max_chars=5000000, temperature=0, top_p=0.5, custom_llm_provider=None, max_input_tokens=None, max_output_tokens=None), agent=AgentConfig(name='CodeActAgent', memory_enabled=False, memory_max_threads=2), runtime='server', file_store='memory', file_store_path='/tmp/file_store', workspace_base='/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/workspace', workspace_mount_path='/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/workspace', workspace_mount_path_in_sandbox='/workspace', workspace_mount_rewrite=None, cache_dir='/tmp/cache', sandbox_container_image='ghcr.io/opendevin/sandbox:latest', run_as_devin=False, max_iterations=100, e2b_api_key='******', sandbox_type='ssh', use_host_network=False, ssh_hostname='localhost', disable_color=False, sandbox_user_id=502, sandbox_timeout=120, github_token='******', jwt_secret='f5e1d9d83dd94b1098b59118fcf43d93', debug=False, enable_auto_lint=True
09:34:30 - opendevin:INFO: run_infer.py:353 - Using evaluation output directory: evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo_maxiter_50_N_v1.4
09:34:30 - opendevin:INFO: run_infer.py:366 - Metadata: {'agent_class': 'CodeActAgent', 'model_name': 'gpt-3.5-turbo', 'max_iterations': 50, 'eval_output_dir': 'evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo_maxiter_50_N_v1.4', 'start_time': '2024-05-30 09:34:30', 'git_commit': '6ff50ed369163592041fdda5a7e9702ce79a17cc'}
09:34:30 - opendevin:INFO: run_infer.py:374 - Limiting evaluation to first 1 instances.
09:34:30 - opendevin:INFO: run_infer.py:378 - Writing evaluation output to evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo_maxiter_50_N_v1.4/output.jsonl
09:34:30 - opendevin:WARNING: run_infer.py:385 - Output file evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo_maxiter_50_N_v1.4/output.jsonl already exists. Loaded 0 finished instances.
09:34:30 - opendevin:INFO: run_infer.py:390 - Evaluation started with Agent CodeActAgent, model gpt-3.5-turbo, max iterations 50.
09:34:30 - opendevin:INFO: run_infer.py:406 - Finished instances: 0, Remaining instances: 1
  0%|                                                                                                       | 0/1 [00:00<?, ?it/s]09:34:30 - opendevin:INFO: run_infer.py:427 - Using 8 workers for evaluation.
09:34:30 - opendevin:INFO: run_infer.py:431 - Skipping workspace mount: True
09:34:46 - opendevin:INFO: run_infer.py:214 - Starting evaluation for instance django__django-15202.
Hint: run "tail -f evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo_maxiter_50_N_v1.4/logs/instance_django__django-15202.log" to see live logs in a seperate shell
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [01:00<00:00, 60.04s/it]ERROR:concurrent.futures:exception calling callback for <Future at 0x139ffc050 state=finished raised ValueError>
concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/process.py", line 261, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 259, in process_instance
    state: State = asyncio.run(
                   ^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/Jessica/Downloads/OpenDevin/opendevin/core/main.py", line 67, in main
    raise ValueError(f'Invalid toml file, cannot read {args.llm_config}')
ValueError: Invalid toml file, cannot read eval_gpt3.5_0125_preview
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 340, in _invoke_callbacks
    callback(self)
  File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 416, in update_progress
    output = future.result()
             ^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 452, in <module>
    future.result()
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
ValueError: Invalid toml file, cannot read eval_gpt3.5_0125_preview
ERROR:root:  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 340, in _invoke_callbacks
    callback(self)
  File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 416, in update_progress
    output = future.result()
             ^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 452, in <module>
    future.result()
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception

ERROR:root:<class 'ValueError'>: Invalid toml file, cannot read eval_gpt3.5_0125_preview
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [01:03<00:00, 63.63s/it]
Exception ignored in: <function _ExecutorManagerThread.__init__.<locals>.weakref_cb at 0x139fb1800>
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/process.py", line 308, in weakref_cb
AttributeError: 'NoneType' object has no attribute 'util'

Model and Agent

No response

Operating System

No response

Reproduction Steps

No response

Logs, Errors, Screenshots, and Additional Context

No response

SmartManoj commented 5 months ago

[eval_gpt3.5_0125_preview]
test=1

parsed as {'eval_gpt3': {'5_0125_preview': {'test': 1}}}

The dot is the issue. Remove the dot and check.

JessChud commented 5 months ago

Thanks! I made the recommended change (the section is now named eval_gpt35_0125_preview), and here's the error trace I'm getting now:

JessicaComputer:OpenDevin Jessica$ evaluation/swe_bench/scripts/run_infer.sh eval_gpt35_0125_preview CodeActAgent 1

AGENT: CodeActAgent
AGENT_VERSION: v1.4
MODEL_CONFIG: eval_gpt35_0125_preview
EVAL_LIMIT: 1
10:19:16 - opendevin.core.config:INFO: config.py:431 - Loading llm config from eval_gpt35_0125_preview
10:19:16 - opendevin.core.config:ERROR: config.py:438 - Config file not found: [Errno 2] No such file or directory: 'config.toml'
10:19:16 - opendevin:INFO: run_infer.py:330 - Config for evaluation: AppConfig(llm=LLMConfig(model='gpt-3.5-turbo', api_key='**', base_url=None, api_version=None, embedding_model='local', embedding_base_url=None, embedding_deployment_name=None, aws_access_key_id='**', aws_secret_access_key='**', aws_region_name=None, num_retries=5, retry_min_wait=3, retry_max_wait=60, timeout=None, max_chars=5000000, temperature=0, top_p=0.5, custom_llm_provider=None, max_input_tokens=None, max_output_tokens=None), agent=AgentConfig(name='CodeActAgent', memory_enabled=False, memory_max_threads=2), runtime='server', file_store='memory', file_store_path='/tmp/file_store', workspace_base='/Users/Jessica/Downloads/OpenDevin/workspace', workspace_mount_path='/Users/Jessica/Downloads/OpenDevin/workspace', workspace_mount_path_in_sandbox='/workspace', workspace_mount_rewrite=None, cache_dir='/tmp/cache', sandbox_container_image='ghcr.io/opendevin/sandbox:main', run_as_devin=True, max_iterations=100, e2b_api_key='**', sandbox_type='ssh', use_host_network=False, ssh_hostname='localhost', disable_color=False, sandbox_user_id=502, sandbox_timeout=120, github_token='**', jwt_secret='a5b4c9b586bc4f8fab1d120354beb167', debug=False, enable_auto_lint=False
10:19:16 - opendevin:INFO: run_infer.py:353 - Using evaluation output directory: evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo_maxiter_50_N_v1.4
10:19:16 - opendevin:INFO: run_infer.py:366 - Metadata: {'agent_class': 'CodeActAgent', 'model_name': 'gpt-3.5-turbo', 'max_iterations': 50, 'eval_output_dir': 'evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo_maxiter_50_N_v1.4', 'start_time': '2024-05-30 10:19:16', 'git_commit': '6ff50ed369163592041fdda5a7e9702ce79a17cc'}
10:19:16 - opendevin:INFO: run_infer.py:374 - Limiting evaluation to first 1 instances.
10:19:16 - opendevin:INFO: run_infer.py:378 - Writing evaluation output to evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo_maxiter_50_N_v1.4/output.jsonl
10:19:16 - opendevin:WARNING: run_infer.py:385 - Output file evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo_maxiter_50_N_v1.4/output.jsonl already exists. Loaded 0 finished instances.
10:19:16 - opendevin:INFO: run_infer.py:390 - Evaluation started with Agent CodeActAgent, model gpt-3.5-turbo, max iterations 50.
10:19:16 - opendevin:INFO: run_infer.py:406 - Finished instances: 0, Remaining instances: 1
  0%| | 0/1 [00:00<?, ?it/s]10:19:16 - opendevin:INFO: run_infer.py:427 - Using 8 workers for evaluation.
10:19:16 - opendevin:INFO: run_infer.py:431 - Skipping workspace mount: True
10:19:33 - opendevin:INFO: run_infer.py:214 - Starting evaluation for instance django__django-15202.
Hint: run "tail -f evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo_maxiter_50_N_v1.4/logs/instance_django__django-15202.log" to see live logs in a seperate shell
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:23<00:00, 23.26s/it]ERROR:concurrent.futures:exception calling callback for <Future at 0x130eebed0 state=finished raised EOF>
concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/process.py", line 261, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 234, in process_instance
    sandbox = SWEBenchSSHBox.get_box_for_instance(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/swe_env_box.py", line 96, in get_box_for_instance
    sandbox = cls(
              ^^^^
  File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/swe_env_box.py", line 61, in __init__
    exit_code, output = self.execute('source /swe_util/swe_entry.sh', timeout=600)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Jessica/Downloads/OpenDevin/opendevin/runtime/docker/ssh_box.py", line 440, in execute
    success = self.ssh.prompt(timeout=timeout)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Jessica/Library/Caches/pypoetry/virtualenvs/opendevin-Ilj8wfey-py3.11/lib/python3.11/site-packages/pexpect/pxssh.py", line 506, in prompt
    i = self.expect([self.PROMPT, TIMEOUT], timeout=timeout)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Jessica/Library/Caches/pypoetry/virtualenvs/opendevin-Ilj8wfey-py3.11/lib/python3.11/site-packages/pexpect/spawnbase.py", line 354, in expect
    return self.expect_list(compiled_pattern_list,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Jessica/Library/Caches/pypoetry/virtualenvs/opendevin-Ilj8wfey-py3.11/lib/python3.11/site-packages/pexpect/spawnbase.py", line 383, in expect_list
    return exp.expect_loop(timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Jessica/Library/Caches/pypoetry/virtualenvs/opendevin-Ilj8wfey-py3.11/lib/python3.11/site-packages/pexpect/expect.py", line 179, in expect_loop
    return self.eof(e)
           ^^^^^^^^^^^
  File "/Users/Jessica/Library/Caches/pypoetry/virtualenvs/opendevin-Ilj8wfey-py3.11/lib/python3.11/site-packages/pexpect/expect.py", line 122, in eof
    raise exc
pexpect.exceptions.EOF: End Of File (EOF). Empty string style platform.
<pexpect.pxssh.pxssh object at 0x134866910>
command: /usr/bin/ssh
args: [b'/usr/bin/ssh', b'-q', b'-p', b'61976', b'-l', b'opendevin', b'localhost']
buffer (last 100 chars): ''
before (last 100 chars): "Error: This script is intended to be run by the 'root' user only.\r\n"
after: <class 'pexpect.exceptions.EOF'>
match: None
match_index: None
exitstatus: None
flag_eof: True
pid: 25042
child_fd: 35
closed: False
timeout: 220
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 2000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0.05
delayafterclose: 0.1
delayafterterminate: 0.1
searcher: searcher_re:
    0: re.compile('\[PEXPECT\][\$\#] ')
    1: TIMEOUT
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 340, in _invoke_callbacks
    callback(self)
  File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 416, in update_progress
    output = future.result()
             ^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 452, in <module>
    future.result()
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
pexpect.exceptions.EOF: End Of File (EOF). Empty string style platform.
<pexpect.pxssh.pxssh object at 0x134866910>
command: /usr/bin/ssh
args: [b'/usr/bin/ssh', b'-q', b'-p', b'61976', b'-l', b'opendevin', b'localhost']
buffer (last 100 chars): ''
before (last 100 chars): "Error: This script is intended to be run by the 'root' user only.\r\n"
after: <class 'pexpect.exceptions.EOF'>
match: None
match_index: None
exitstatus: None
flag_eof: True
pid: 25042
child_fd: 35
closed: False
timeout: 220
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 2000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0.05
delayafterclose: 0.1
delayafterterminate: 0.1
searcher: searcher_re:
    0: re.compile('\[PEXPECT\][\$\#] ')
    1: TIMEOUT
ERROR:root:  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 340, in _invoke_callbacks
    callback(self)
  File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 416, in update_progress
    output = future.result()
             ^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 452, in <module>
    future.result()
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception

ERROR:root:<class 'pexpect.exceptions.EOF'>: End Of File (EOF). Empty string style platform.
<pexpect.pxssh.pxssh object at 0x134866910>
command: /usr/bin/ssh
args: [b'/usr/bin/ssh', b'-q', b'-p', b'61976', b'-l', b'opendevin', b'localhost']
buffer (last 100 chars): ''
before (last 100 chars): "Error: This script is intended to be run by the 'root' user only.\r\n"
after: <class 'pexpect.exceptions.EOF'>
match: None
match_index: None
exitstatus: None
flag_eof: True
pid: 25042
child_fd: 35
closed: False
timeout: 220
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 2000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0.05
delayafterclose: 0.1
delayafterterminate: 0.1
searcher: searcher_re:
    0: re.compile('\[PEXPECT\][\$\#] ')
    1: TIMEOUT
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:26<00:00, 26.25s/it]
Exception ignored in: <function _ExecutorManagerThread.__init__.<locals>.weakref_cb at 0x130ef6200>
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/process.py", line 308, in weakref_cb
AttributeError: 'NoneType' object has no attribute 'util'

SmartManoj commented 5 months ago

run_as_devin=True,

Why is it true now?

JessChud commented 5 months ago

Good question, I'm not sure. I've set run_as_devin=False again in the terminal, and the config file also says run_as_devin=False.

run_as_devin=False
JessicaComputer:OpenDevin Jessica$ evaluation/swe_bench/scripts/run_infer.sh eval_gpt35_0125_preview CodeActAgent 1
AGENT: CodeActAgent
AGENT_VERSION: v1.4
MODEL_CONFIG: eval_gpt35_0125_preview
EVAL_LIMIT: 1
10:36:38 - opendevin.core.config:INFO: config.py:431 - Loading llm config from eval_gpt35_0125_preview
10:36:38 - opendevin.core.config:ERROR: config.py:438 - Config file not found: [Errno 2] No such file or directory: 'config.toml'
10:36:38 - opendevin:INFO: run_infer.py:330 - Config for evaluation: AppConfig(llm=LLMConfig(model='gpt-3.5-turbo', api_key='**', base_url=None, api_version=None, embedding_model='local', embedding_base_url=None, embedding_deployment_name=None, aws_access_key_id='**', aws_secret_access_key='**', aws_region_name=None, num_retries=5, retry_min_wait=3, retry_max_wait=60, timeout=None, max_chars=5000000, temperature=0, top_p=0.5, custom_llm_provider=None, max_input_tokens=None, max_output_tokens=None), agent=AgentConfig(name='CodeActAgent', memory_enabled=False, memory_max_threads=2), runtime='server', file_store='memory', file_store_path='/tmp/file_store', workspace_base='/Users/Jessica/Downloads/OpenDevin/workspace', workspace_mount_path='/Users/Jessica/Downloads/OpenDevin/workspace', workspace_mount_path_in_sandbox='/workspace', workspace_mount_rewrite=None, cache_dir='/tmp/cache', sandbox_container_image='ghcr.io/opendevin/sandbox:main', run_as_devin=True, max_iterations=100, e2b_api_key='**', sandbox_type='ssh', use_host_network=False, ssh_hostname='localhost', disable_color=False, sandbox_user_id=502, sandbox_timeout=120, github_token='**', jwt_secret='c2b8a6c3613e464b888e5e8f84cd92e0', debug=False, enable_auto_lint=False

JessChud commented 5 months ago

I think it might not be finding the config file and is falling back to the default parameters.

SmartManoj commented 5 months ago

Set environment variables in caps: RUN_AS_DEVIN=true

SmartManoj commented 5 months ago

10:36:38 - opendevin.core.config:ERROR: config.py:438 - Config file not found: [Errno 2] No such file or directory: 'config.toml'

Yes. Where is your 'config.toml' located?

JessChud commented 5 months ago

In the root of the OpenDevin directory. I previously had it inside evaluation/swe_bench, and the outcome was the same.
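The error names a bare relative path ('config.toml'), and Python resolves a relative path against the current working directory of the process, not against the location of the script that opens it. A minimal sketch, using a throwaway temporary directory as a hypothetical stand-in for the OpenDevin checkout, shows the same file being found from one directory and not from another. Which working directory run_infer.sh actually ends up using is not verified here; running `pwd` and `ls config.toml` from the directory where the script is launched is a quick check.

```python
import os
import tempfile
from pathlib import Path


def lookup(directory: Path, name: str = "config.toml") -> tuple[Path, bool]:
    """Resolve a bare relative filename the way open() would from `directory`."""
    previous = os.getcwd()
    os.chdir(directory)
    try:
        # Relative paths are interpreted against the current working directory.
        return Path(name).resolve(), Path(name).is_file()
    finally:
        os.chdir(previous)  # restore the caller's working directory


# Hypothetical stand-in for the checkout: config.toml at the root only.
root = Path(tempfile.mkdtemp())
(root / "evaluation" / "swe_bench").mkdir(parents=True)
(root / "config.toml").write_text('[eval_gpt35_0125_preview]\nmodel = "gpt-3.5-turbo-0125"\n')

for directory in (root, root / "evaluation" / "swe_bench"):
    resolved, found = lookup(directory)
    print(f"from {directory}: {resolved} found={found}")
```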

SmartManoj commented 5 months ago

export RUN_AS_DEVIN=true
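Two shell details are at play here. A plain assignment such as run_as_devin=False (as typed earlier in this thread) only sets a variable in the current shell; child processes, such as the Python started by run_infer.sh, see it only after `export`. And on macOS and Linux, environment variable names are case-sensitive, so exporting run_as_devin does not set RUN_AS_DEVIN. This sketch checks both with /bin/sh and a child Python process. It assumes, per the comments above, that OpenDevin reads the uppercase RUN_AS_DEVIN name, and it says nothing about which value the evaluation needs.

```python
import os
import subprocess
import sys

# Child process: prints the RUN_AS_DEVIN value it inherited (None if unset).
CHILD = [sys.executable, "-c", "import os; print(os.environ.get('RUN_AS_DEVIN'))"]


def child_sees(shell_setup: str) -> str:
    """Run `shell_setup` in /bin/sh, then launch CHILD from that same shell."""
    # Start clean so an inherited RUN_AS_DEVIN cannot skew the result.
    env = {k: v for k, v in os.environ.items() if k != "RUN_AS_DEVIN"}
    script = shell_setup + '; "$@"'  # "$@" expands to the CHILD command
    result = subprocess.run(
        ["sh", "-c", script, "sh", *CHILD],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


for setup in ("run_as_devin=False", "RUN_AS_DEVIN=true",
              "export run_as_devin=False", "export RUN_AS_DEVIN=true"):
    # Only the exported, uppercase form reaches the child process.
    print(f"{setup:28} -> RUN_AS_DEVIN={child_sees(setup)}")
```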

JessChud commented 5 months ago

Here is the error trace now, and the config file is in the OpenDevin/opendevin directory.

JessicaComputer:OpenDevin Jessica$ export RUN_AS_DEVIN=true
JessicaComputer:OpenDevin Jessica$ evaluation/swe_bench/scripts/run_infer.sh eval_gpt35_0125_preview CodeActAgent 1
AGENT: CodeActAgent
AGENT_VERSION: v1.4
MODEL_CONFIG: eval_gpt35_0125_preview
EVAL_LIMIT: 1
11:00:37 - opendevin.core.config:INFO: config.py:431 - Loading llm config from eval_gpt35_0125_preview
11:00:37 - opendevin.core.config:ERROR: config.py:438 - Config file not found: [Errno 2] No such file or directory: 'config.toml'
11:00:37 - opendevin:INFO: run_infer.py:330 - Config for evaluation: AppConfig(llm=LLMConfig(model='gpt-3.5-turbo', api_key='**', base_url=None, api_version=None, embedding_model='local', embedding_base_url=None, embedding_deployment_name=None, aws_access_key_id='**', aws_secret_access_key='**', aws_region_name=None, num_retries=5, retry_min_wait=3, retry_max_wait=60, timeout=None, max_chars=5000000, temperature=0, top_p=0.5, custom_llm_provider=None, max_input_tokens=None, max_output_tokens=None), agent=AgentConfig(name='CodeActAgent', memory_enabled=False, memory_max_threads=2), runtime='server', file_store='memory', file_store_path='/tmp/file_store', workspace_base='/Users/Jessica/Downloads/OpenDevin/workspace', workspace_mount_path='/Users/Jessica/Downloads/OpenDevin/workspace', workspace_mount_path_in_sandbox='/workspace', workspace_mount_rewrite=None, cache_dir='/tmp/cache', sandbox_container_image='ghcr.io/opendevin/sandbox:main', run_as_devin=True, max_iterations=100, e2b_api_key='**', sandbox_type='ssh', use_host_network=False, ssh_hostname='localhost', disable_color=False, sandbox_user_id=502, sandbox_timeout=120, github_token='**', jwt_secret='1d769369774348a98a71cdbb82c403fd', debug=False, enable_auto_lint=False
11:00:37 - opendevin:INFO: run_infer.py:353 - Using evaluation output directory: evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo_maxiter_50_N_v1.4
11:00:37 - opendevin:INFO: run_infer.py:366 - Metadata: {'agent_class': 'CodeActAgent', 'model_name': 'gpt-3.5-turbo', 'max_iterations': 50, 'eval_output_dir': 'evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo_maxiter_50_N_v1.4', 'start_time': '2024-05-30 11:00:37', 'git_commit': '6ff50ed369163592041fdda5a7e9702ce79a17cc'}
11:00:37 - opendevin:INFO: run_infer.py:374 - Limiting evaluation to first 1 instances.
11:00:37 - opendevin:INFO: run_infer.py:378 - Writing evaluation output to evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo_maxiter_50_N_v1.4/output.jsonl
11:00:37 - opendevin:WARNING: run_infer.py:385 - Output file evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo_maxiter_50_N_v1.4/output.jsonl already exists. Loaded 0 finished instances.
11:00:37 - opendevin:INFO: run_infer.py:390 - Evaluation started with Agent CodeActAgent, model gpt-3.5-turbo, max iterations 50.
11:00:37 - opendevin:INFO: run_infer.py:406 - Finished instances: 0, Remaining instances: 1
  0%| | 0/1 [00:00<?, ?it/s]11:00:37 - opendevin:INFO: run_infer.py:427 - Using 8 workers for evaluation.
11:00:37 - opendevin:INFO: run_infer.py:431 - Skipping workspace mount: True
11:00:52 - opendevin:INFO: run_infer.py:214 - Starting evaluation for instance django__django-15202.
Hint: run "tail -f evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo_maxiter_50_N_v1.4/logs/instance_django__django-15202.log" to see live logs in a seperate shell
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:20<00:00, 20.57s/it]ERROR:concurrent.futures:exception calling callback for <Future at 0x1347343d0 state=finished raised EOF>
concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/process.py", line 261, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 234, in process_instance
    sandbox = SWEBenchSSHBox.get_box_for_instance(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/swe_env_box.py", line 96, in get_box_for_instance
    sandbox = cls(
              ^^^^
  File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/swe_env_box.py", line 61, in __init__
    exit_code, output = self.execute('source /swe_util/swe_entry.sh', timeout=600)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Jessica/Downloads/OpenDevin/opendevin/runtime/docker/ssh_box.py", line 440, in execute
    success = self.ssh.prompt(timeout=timeout)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Jessica/Library/Caches/pypoetry/virtualenvs/opendevin-Ilj8wfey-py3.11/lib/python3.11/site-packages/pexpect/pxssh.py", line 506, in prompt
    i = self.expect([self.PROMPT, TIMEOUT], timeout=timeout)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Jessica/Library/Caches/pypoetry/virtualenvs/opendevin-Ilj8wfey-py3.11/lib/python3.11/site-packages/pexpect/spawnbase.py", line 354, in expect
    return self.expect_list(compiled_pattern_list,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/Jessica/Library/Caches/pypoetry/virtualenvs/opendevin-Ilj8wfey-py3.11/lib/python3.11/site-packages/pexpect/spawnbase.py", line 383, in expect_list return exp.expect_loop(timeout) ^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/Jessica/Library/Caches/pypoetry/virtualenvs/opendevin-Ilj8wfey-py3.11/lib/python3.11/site-packages/pexpect/expect.py", line 179, in expect_loop return self.eof(e) ^^^^^^^^^^^ File "/Users/Jessica/Library/Caches/pypoetry/virtualenvs/opendevin-Ilj8wfey-py3.11/lib/python3.11/site-packages/pexpect/expect.py", line 122, in eof raise exc pexpect.exceptions.EOF: End Of File (EOF). Empty string style platform. <pexpect.pxssh.pxssh object at 0x130d3f710> command: /usr/bin/ssh args: [b'/usr/bin/ssh', b'-q', b'-p', b'62323', b'-l', b'opendevin', b'localhost'] buffer (last 100 chars): '' before (last 100 chars): "Error: This script is intended to be run by the 'root' user only.\r\n" after: <class 'pexpect.exceptions.EOF'> match: None match_index: None exitstatus: None flag_eof: True pid: 29667 child_fd: 35 closed: False timeout: 220 delimiter: <class 'pexpect.exceptions.EOF'> logfile: None logfile_read: None logfile_send: None maxread: 2000 ignorecase: False searchwindowsize: None delaybeforesend: 0.05 delayafterclose: 0.1 delayafterterminate: 0.1 searcher: searcher_re: 0: re.compile('\[PEXPECT\][\$\#] ') 1: TIMEOUT """

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 340, in _invoke_callbacks callback(self) File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 416, in update_progress output = future.result() ^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 449, in result return self.get_result() ^^^^^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in get_result raise self._exception pexpect.exceptions.EOF: End Of File (EOF). Empty string style platform. <pexpect.pxssh.pxssh object at 0x130d3f710> command: /usr/bin/ssh args: [b'/usr/bin/ssh', b'-q', b'-p', b'62323', b'-l', b'opendevin', b'localhost'] buffer (last 100 chars): '' before (last 100 chars): "Error: This script is intended to be run by the 'root' user only.\r\n" after: <class 'pexpect.exceptions.EOF'> match: None match_index: None exitstatus: None flag_eof: True pid: 29667 child_fd: 35 closed: False timeout: 220 delimiter: <class 'pexpect.exceptions.EOF'> logfile: None logfile_read: None logfile_send: None maxread: 2000 ignorecase: False searchwindowsize: None delaybeforesend: 0.05 delayafterclose: 0.1 delayafterterminate: 0.1 searcher: searcher_re: 0: re.compile('\[PEXPECT\][\$\#] ') 1: TIMEOUT ERROR:root: File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 452, in future.result() File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 456, in result return self.get_result() ^^^^^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in get_result raise 
self._exception File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 340, in _invoke_callbacks callback(self) File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 416, in update_progress output = future.result() ^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 449, in result return self.__get_result() ^^^^^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result raise self._exception

ERROR:root:<class 'pexpect.exceptions.EOF'>: End Of File (EOF). Empty string style platform. <pexpect.pxssh.pxssh object at 0x130d3f710> command: /usr/bin/ssh args: [b'/usr/bin/ssh', b'-q', b'-p', b'62323', b'-l', b'opendevin', b'localhost'] buffer (last 100 chars): '' before (last 100 chars): "Error: This script is intended to be run by the 'root' user only.\r\n" after: <class 'pexpect.exceptions.EOF'> match: None match_index: None exitstatus: None flag_eof: True pid: 29667 child_fd: 35 closed: False timeout: 220 delimiter: <class 'pexpect.exceptions.EOF'> logfile: None logfile_read: None logfile_send: None maxread: 2000 ignorecase: False searchwindowsize: None delaybeforesend: 0.05 delayafterclose: 0.1 delayafterterminate: 0.1 searcher: searcher_re: 0: re.compile('\[PEXPECT\][\$\#] ') 1: TIMEOUT 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:23<00:00, 23.60s/it] Exception ignored in: <function _ExecutorManagerThread.init..weakref_cb at 0x1346ea200> Traceback (most recent call last): File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/process.py", line 308, in weakref_cb AttributeError: 'NoneType' object has no attribute 'util' JessicaComputer:OpenDevin Jessica$

SmartManoj commented 5 months ago

Sorry, I meant export RUN_AS_DEVIN=false

JessChud commented 5 months ago

Here is the error trace now. I still suspect that the missing config file is contributing to the issue, but I would love to know your thoughts. Thanks!

JessicaComputer:OpenDevin Jessica$ export RUN_AS_DEVIN=false JessicaComputer:OpenDevin Jessica$ evaluation/swe_bench/scripts/run_infer.sh eval_gpt35_0125_preview CodeActAgent 1 AGENT: CodeActAgent AGENT_VERSION: v1.4 MODEL_CONFIG: eval_gpt35_0125_preview EVAL_LIMIT: 1 11:05:07 - opendevin.core.config:INFO: config.py:431 - Loading llm config from eval_gpt35_0125_preview 11:05:07 - opendevin.core.config:ERROR: config.py:438 - Config file not found: [Errno 2] No such file or directory: 'config.toml' 11:05:07 - opendevin:INFO: run_infer.py:330 - Config for evaluation: AppConfig(llm=LLMConfig(model='gpt-3.5-turbo', api_key='**', base_url=None, api_version=None, embedding_model='local', embedding_base_url=None, embedding_deployment_name=None, aws_access_key_id='**', aws_secret_access_key='**', aws_region_name=None, num_retries=5, retry_min_wait=3, retry_max_wait=60, timeout=None, max_chars=5000000, temperature=0, top_p=0.5, custom_llm_provider=None, max_input_tokens=None, max_output_tokens=None), agent=AgentConfig(name='CodeActAgent', memory_enabled=False, memory_max_threads=2), runtime='server', file_store='memory', file_store_path='/tmp/file_store', workspace_base='/Users/Jessica/Downloads/OpenDevin/workspace', workspace_mount_path='/Users/Jessica/Downloads/OpenDevin/workspace', workspace_mount_path_in_sandbox='/workspace', workspace_mount_rewrite=None, cache_dir='/tmp/cache', sandbox_container_image='ghcr.io/opendevin/sandbox:main', run_as_devin=False, max_iterations=100, e2b_api_key='**', sandbox_type='ssh', use_host_network=False, ssh_hostname='localhost', disable_color=False, sandbox_user_id=502, sandbox_timeout=120, github_token='**', jwt_secret='b066b8e5d1834fd0b448d6459f50efcf', debug=False, enable_auto_lint=False 11:05:07 - opendevin:INFO: run_infer.py:353 - Using evaluation output directory: evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo_maxiter_50_N_v1.4 11:05:07 - opendevin:INFO: run_infer.py:366 - Metadata: {'agent_class': 
'CodeActAgent', 'model_name': 'gpt-3.5-turbo', 'max_iterations': 50, 'eval_output_dir': 'evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo_maxiter_50_N_v1.4', 'start_time': '2024-05-30 11:05:07', 'git_commit': '6ff50ed369163592041fdda5a7e9702ce79a17cc'} 11:05:07 - opendevin:INFO: run_infer.py:374 - Limiting evaluation to first 1 instances. 11:05:07 - opendevin:INFO: run_infer.py:378 - Writing evaluation output to evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo_maxiter_50_N_v1.4/output.jsonl 11:05:07 - opendevin:WARNING: run_infer.py:385 - Output file evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo_maxiter_50_N_v1.4/output.jsonl already exists. Loaded 0 finished instances. 11:05:07 - opendevin:INFO: run_infer.py:390 - Evaluation started with Agent CodeActAgent, model gpt-3.5-turbo, max iterations 50. 11:05:07 - opendevin:INFO: run_infer.py:406 - Finished instances: 0, Remaining instances: 1 0%| | 0/1 [00:00<?, ?it/s]11:05:07 - opendevin:INFO: run_infer.py:427 - Using 8 workers for evaluation. 11:05:07 - opendevin:INFO: run_infer.py:431 - Skipping workspace mount: True 11:05:23 - opendevin:INFO: run_infer.py:214 - Starting evaluation for instance django__django-15202. 
Hint: run "tail -f evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo_maxiter_50_N_v1.4/logs/instance_django__django-15202.log" to see live logs in a seperate shell 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:43<00:00, 43.67s/it]ERROR:concurrent.futures:exception calling callback for <Future at 0x12fef8690 state=finished raised ValueError> concurrent.futures.process._RemoteTraceback: """ Traceback (most recent call last): File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/process.py", line 261, in _process_worker r = call_item.fn(*call_item.args, **call_item.kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 259, in process_instance state: State = asyncio.run( ^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 190, in run return runner.run(main) ^^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 118, in run return self._loop.run_until_complete(task) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete return future.result() ^^^^^^^^^^^^^^^ File "/Users/Jessica/Downloads/OpenDevin/opendevin/core/main.py", line 67, in main raise ValueError(f'Invalid toml file, cannot read {args.llm_config}') ValueError: Invalid toml file, cannot read eval_gpt35_0125_preview """

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 340, in _invoke_callbacks callback(self) File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 416, in update_progress output = future.result() ^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 449, in result return self.get_result() ^^^^^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result raise self._exception ValueError: Invalid toml file, cannot read eval_gpt35_0125_preview ERROR:root: File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 452, in future.result() File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 456, in result return self.get_result() ^^^^^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in get_result raise self._exception File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 340, in _invoke_callbacks callback(self) File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 416, in update_progress output = future.result() ^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 449, in result return self.__get_result() ^^^^^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in get_result raise self._exception

ERROR:root:<class 'ValueError'>: Invalid toml file, cannot read eval_gpt35_0125_preview 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:47<00:00, 47.41s/it] Exception ignored in: <function _ExecutorManagerThread.init..weakref_cb at 0x12feb2200> Traceback (most recent call last): File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/process.py", line 308, in weakref_cb AttributeError: 'NoneType' object has no attribute 'util'

SmartManoj commented 5 months ago

Run ls config.toml from the directory you launch the script in, to check that the file is actually there.

JessChud commented 5 months ago

Thank you, I was able to fix the issue of the config file being in the wrong directory. Now I am getting this error trace:

JessicaComputer:OpenDevin Jessica$ evaluation/swe_bench/scripts/run_infer.sh eval_gpt35_0125_preview CodeActAgent 1 AGENT: CodeActAgent AGENT_VERSION: v1.4 MODEL_CONFIG: eval_gpt35_0125_preview EVAL_LIMIT: 1 11:19:15 - opendevin.core.config:INFO: config.py:431 - Loading llm config from eval_gpt35_0125_preview 11:19:15 - opendevin:INFO: run_infer.py:330 - Config for evaluation: AppConfig(llm=LLMConfig(model='gpt-3.5-turbo-0125', api_key='**', base_url=None, api_version=None, embedding_model='local', embedding_base_url=None, embedding_deployment_name=None, aws_access_key_id='**', aws_secret_access_key='**', aws_region_name=None, num_retries=5, retry_min_wait=3, retry_max_wait=60, timeout=None, max_chars=5000000, temperature=0.0, top_p=0.5, custom_llm_provider=None, max_input_tokens=None, max_output_tokens=None), agent=AgentConfig(name='CodeActAgent', memory_enabled=False, memory_max_threads=2), runtime='server', file_store='memory', file_store_path='/tmp/file_store', workspace_base='/Users/Jessica/Downloads/OpenDevin/workspace', workspace_mount_path='/Users/Jessica/Downloads/OpenDevin/workspace', workspace_mount_path_in_sandbox='/workspace', workspace_mount_rewrite=None, cache_dir='/tmp/cache', sandbox_container_image='ghcr.io/opendevin/sandbox:latest', run_as_devin=False, max_iterations=100, e2b_api_key='**', sandbox_type='ssh', use_host_network=False, ssh_hostname='localhost', disable_color=False, sandbox_user_id=502, sandbox_timeout=120, github_token='**', jwt_secret='cf972a2d8a474e0a8d631f37103423a2', debug=False, enable_auto_lint=True 11:19:15 - opendevin:INFO: run_infer.py:353 - Using evaluation output directory: evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo-0125_maxiter_50_N_v1.4 11:19:15 - opendevin:INFO: run_infer.py:366 - Metadata: {'agent_class': 'CodeActAgent', 'model_name': 'gpt-3.5-turbo-0125', 'max_iterations': 50, 'eval_output_dir': 
'evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo-0125_maxiter_50_N_v1.4', 'start_time': '2024-05-30 11:19:15', 'git_commit': '6ff50ed369163592041fdda5a7e9702ce79a17cc'} 11:19:15 - opendevin:INFO: run_infer.py:374 - Limiting evaluation to first 1 instances. 11:19:15 - opendevin:INFO: run_infer.py:378 - Writing evaluation output to evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo-0125_maxiter_50_N_v1.4/output.jsonl 11:19:15 - opendevin:INFO: run_infer.py:390 - Evaluation started with Agent CodeActAgent, model gpt-3.5-turbo-0125, max iterations 50. 11:19:15 - opendevin:INFO: run_infer.py:406 - Finished instances: 0, Remaining instances: 1 0%| | 0/1 [00:00<?, ?it/s]11:19:15 - opendevin:INFO: run_infer.py:427 - Using 8 workers for evaluation. 11:19:15 - opendevin:INFO: run_infer.py:431 - Skipping workspace mount: True 11:19:34 - opendevin:INFO: run_infer.py:214 - Starting evaluation for instance djangodjango-15202. Hint: run "tail -f evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo-0125_maxiter_50_N_v1.4/logs/instance_django__django-15202.log" to see live logs in a seperate shell 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:57<00:00, 57.56s/it]ERROR:concurrent.futures:exception calling callback for <Future at 0x12e32fd10 state=finished raised BrowserException> concurrent.futures.process._RemoteTraceback: """ Traceback (most recent call last): File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/process.py", line 261, in _process_worker r = call_item.fn(*call_item.args, **call_item.kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 259, in process_instance state: State = asyncio.run( ^^^^^^^^^^^^ File 
"/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 190, in run return runner.run(main) ^^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 118, in run return self._loop.run_until_complete(task) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete return future.result() ^^^^^^^^^^^^^^^ File "/Users/Jessica/Downloads/OpenDevin/opendevin/core/main.py", line 92, in main runtime = ServerRuntime(event_stream=event_stream, sandbox=sandbox) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/Jessica/Downloads/OpenDevin/opendevin/runtime/server/runtime.py", line 35, in init super().init(event_stream, sid, sandbox) File "/Users/Jessica/Downloads/OpenDevin/opendevin/runtime/runtime.py", line 74, in init self.browser = BrowserEnv() ^^^^^^^^^^^^ File "/Users/Jessica/Downloads/OpenDevin/opendevin/runtime/browser/browser_env.py", line 41, in init__ raise BrowserException('Failed to start browser environment.') opendevin.runtime.browser.browser_env.BrowserException: Failed to start browser environment. """

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 340, in _invoke_callbacks callback(self) File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 416, in update_progress output = future.result() ^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 449, in result return self.get_result() ^^^^^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in get_result raise self._exception File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 452, in future.result() File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 456, in result return self.get_result() ^^^^^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in get_result raise self._exception opendevin.runtime.browser.browser_env.BrowserException: Failed to start browser environment. 
Process Process-1:1: Traceback (most recent call last): File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap self.run() File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/process.py", line 108, in run self._target(*self._args, self._kwargs) File "/Users/Jessica/Downloads/OpenDevin/opendevin/runtime/browser/browser_env.py", line 52, in browser_process obs, info = env.reset() ^^^^^^^^^^^ File "/Users/Jessica/Library/Caches/pypoetry/virtualenvs/opendevin-Ilj8wfey-py3.11/lib/python3.11/site-packages/gymnasium/wrappers/order_enforcing.py", line 61, in reset return self.env.reset(kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/Jessica/Library/Caches/pypoetry/virtualenvs/opendevin-Ilj8wfey-py3.11/lib/python3.11/site-packages/browsergym/core/env.py", line 186, in reset self.browser = pw.chromium.launch( ^^^^^^^^^^^^^^^^^^^ File "/Users/Jessica/Library/Caches/pypoetry/virtualenvs/opendevin-Ilj8wfey-py3.11/lib/python3.11/site-packages/playwright/sync_api/_generated.py", line 14778, in launch self._sync( File "/Users/Jessica/Library/Caches/pypoetry/virtualenvs/opendevin-Ilj8wfey-py3.11/lib/python3.11/site-packages/playwright/_impl/_sync_base.py", line 109, in _sync return task.result() ^^^^^^^^^^^^^ File "/Users/Jessica/Library/Caches/pypoetry/virtualenvs/opendevin-Ilj8wfey-py3.11/lib/python3.11/site-packages/playwright/_impl/_browser_type.py", line 96, in launch Browser, from_channel(await self._channel.send("launch", params)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/Jessica/Library/Caches/pypoetry/virtualenvs/opendevin-Ilj8wfey-py3.11/lib/python3.11/site-packages/playwright/_impl/_connection.py", line 61, in send return await self._connection.wrap_api_call( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File 
"/Users/Jessica/Library/Caches/pypoetry/virtualenvs/opendevin-Ilj8wfey-py3.11/lib/python3.11/site-packages/playwright/_impl/_connection.py", line 490, in wrap_api_call return await cb() ^^^^^^^^^^ File "/Users/Jessica/Library/Caches/pypoetry/virtualenvs/opendevin-Ilj8wfey-py3.11/lib/python3.11/site-packages/playwright/_impl/_connection.py", line 99, in inner_send result = next(iter(done)).result() ^^^^^^^^^^^^^^^^^^^^^^^^^ playwright._impl._api_types.Error: Executable doesn't exist at /Users/Jessica/Library/Caches/ms-playwright/chromium-1084/chrome-mac/Chromium.app/Contents/MacOS/Chromium ╔════════════════════════════════════════════════════════════╗ ║ Looks like Playwright was just installed or updated. ║ ║ Please run the following command to download new browsers: ║ ║ ║ ║ playwright install ║ ║ ║ ║ <3 Playwright Team ║ ╚════════════════════════════════════════════════════════════╝ ERROR:root: File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 340, in _invoke_callbacks callback(self) File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 416, in update_progress output = future.result() ^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 449, in result return self.get_result() ^^^^^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in get_result raise self._exception File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 452, in future.result() File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 456, in result return self.get_result() ^^^^^^^^^^^^^^^^^^^ File 
"/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in get_result raise self._exception

ERROR:root:<class 'opendevin.runtime.browser.browser_env.BrowserException'>: Failed to start browser environment. 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [01:13<00:00, 73.44s/it] Exception ignored in: <function _ExecutorManagerThread.init..weakref_cb at 0x12e33a340> Traceback (most recent call last): File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/process.py", line 308, in weakref_cb AttributeError: 'NoneType' object has no attribute 'util'

SmartManoj commented 5 months ago

Please run the following command to download new browsers: playwright install
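The trace expects the browser build `chromium-1084` under `~/Library/Caches/ms-playwright`. If the error persists after an install, two things seem worth checking. First, which builds are actually in that cache. Second, whether the install ran for the Python `playwright` package inside the Poetry virtualenv. The install guides on playwright.dev default to the Node.js package, and its downloaded build may not match the revision the Python package expects. This is my reading and is not confirmed in the thread.

The minimal sketch below uses two hypothetical helpers. `browser_build_status` compares the expected build with the cache contents. `playwright_install_cmd` builds the `python -m playwright install chromium` command for a given interpreter.

```python
import re
import sys
from pathlib import Path


def browser_build_status(missing_executable: str) -> tuple[str, list[str]]:
    """Return (expected build, builds present) for Playwright's browser cache.

    `missing_executable` is the path from the "Executable doesn't exist at ..."
    error, e.g. .../ms-playwright/chromium-1084/chrome-mac/Chromium.app/...
    """
    parts = Path(missing_executable).parts
    for i, part in enumerate(parts):
        if re.fullmatch(r"chromium-\d+", part):
            cache_dir = Path(*parts[:i])  # the ms-playwright cache directory
            present = sorted(
                d.name for d in cache_dir.glob("chromium-*") if d.is_dir()
            ) if cache_dir.is_dir() else []
            return part, present
    raise ValueError("no chromium-<revision> component in path")


def playwright_install_cmd(browser: str = "chromium", python: str = sys.executable) -> list[str]:
    """Command that downloads browsers for the Python Playwright package
    installed alongside `python` (use the project's virtualenv interpreter)."""
    return [python, "-m", "playwright", "install", browser]
```

From the shell, `poetry run playwright install chromium` should have the same effect, because it runs the `playwright` entry point installed in the project's virtualenv.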

JessChud commented 5 months ago

Thank you. I installed it following these instructions, https://playwright.dev/docs/intro#installing-playwright, and it appears to have installed correctly. But when I run this command, evaluation/swe_bench/scripts/run_infer.sh eval_gpt35_0125_preview CodeActAgent 1, I still get exactly the same error trace as above, including the instruction to install Playwright.

SmartManoj commented 5 months ago

Could you reopen the terminal and run again?

JessChud commented 5 months ago

Thank you, I think this helped!

I am now getting this problem:

JessicaComputer:OpenDevin Jessica$ evaluation/swe_bench/scripts/run_infer.sh eval_gpt35_0125_preview CodeActAgent 1
AGENT: CodeActAgent
AGENT_VERSION: v1.4
MODEL_CONFIG: eval_gpt35_0125_preview
EVAL_LIMIT: 1
Traceback (most recent call last):
  File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 17, in <module>
    from evaluation.swe_bench.swe_env_box import SWEBenchSSHBox
ModuleNotFoundError: No module named 'evaluation.swe_bench'

SmartManoj commented 5 months ago

https://github.com/OpenDevin/OpenDevin/blob/main/Development.md#2-build-and-setup-the-environment

JessChud commented 5 months ago

Do you think the environment setup is the issue? I suspect it might just be how the files are structured, i.e. Python not finding this module.

SmartManoj commented 5 months ago

Yes. The package has not been installed, and the script runs through Poetry. Run poetry install, or try setting PYTHONPATH=`pwd`:$PYTHONPATH from the repository root, and check again.

The same structure works for me.

JessChud commented 5 months ago

Got it, will do, thank you. Do you have a suggestion for how to resolve the error below? I believe all the versions listed at the top of the page are correct.

JessicaComputer:OpenDevin Jessica$ make build
Building project...
Checking dependencies...
Checking system...
macOS detected.
Checking Python installation...
Python 3.11.7 is already installed.
Checking npm installation...
npm 10.7.0 is already installed.
Checking Node.js installation...
Node.js 22.2.0 is already installed.
Checking Docker installation...
Docker version 26.1.1, build 4cf5afa is already installed.
Checking Poetry installation...
Poetry (version 1.8.3) is already installed.
Dependencies checked successfully.
Pulling Docker image...
Using default tag: latest
latest: Pulling from opendevin/sandbox
Digest: sha256:4bd05c581692e26a448bbc6771a21bb27002cb0e6bcf5034d0db91bb8704d0f0
Status: Image is up to date for ghcr.io/opendevin/sandbox:latest
ghcr.io/opendevin/sandbox:latest

What's Next?
  View a summary of image vulnerabilities and recommendations → docker scout quickview ghcr.io/opendevin/sandbox
Docker image pulled successfully.
Installing Python dependencies...
/bin/bash: chroma-hnswlib: command not found
Installing ...
Requirement already satisfied: chroma-hnswlib in /Users/Jessica/Library/Caches/pypoetry/virtualenvs/opendevin-Ilj8wfey-py3.11/lib/python3.11/site-packages (0.7.3)
Requirement already satisfied: numpy in /Users/Jessica/Library/Caches/pypoetry/virtualenvs/opendevin-Ilj8wfey-py3.11/lib/python3.11/site-packages (from chroma-hnswlib) (1.26.4)
Installing dependencies from lock file

No dependencies to install or update

Installing the current project: opendevin (0.1.0)
[Errno 13] Permission denied: '/Users/Jessica/Library/Caches/pypoetry/virtualenvs/opendevin-Ilj8wfey-py3.11/lib/python3.11/site-packages/opendevin-0.1.0.dist-info/METADATA'
make[1]: [install-python-dependencies] Error 1
make: [build] Error 2
JessicaComputer:OpenDevin Jessica$

SmartManoj commented 5 months ago

[Errno 13] Permission denied: '/Users/Jessica/Library/Caches/pypoetry/virtualenvs/opendevin-Ilj8wfey-py3.11/lib/python3.11/site-packages/opendevin-0.1.0.dist-info/METADATA'

Verify the permissions of the file:

ls -l /Users/Jessica/Library/Caches/pypoetry/virtualenvs/opendevin-Ilj8wfey-py3.11/lib/python3.11/site-packages/opendevin-0.1.0.dist-info/METADATA

Ensure no processes are locking the file:

lsof /Users/Jessica/Library/Caches/pypoetry/virtualenvs/opendevin-Ilj8wfey-py3.11/lib/python3.11/site-packages/opendevin-0.1.0.dist-info/METADATA
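You can also read the same ownership and writability information from Python. The sketch below is illustrative: `describe` and `owner_name` are hypothetical helpers, not part of OpenDevin. It does not replace `lsof`, which has no portable standard-library equivalent.

```python
import os
import pwd
import stat


def owner_name(uid: int) -> str:
    """Map a UID to a user name, falling back to the number like `ls` does."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def describe(path: str) -> dict:
    """Return an `ls -l`-style mode and owner for `path`, plus a write check."""
    st = os.stat(path)
    return {
        "mode": stat.filemode(st.st_mode),           # e.g. '-rw-r--r--'
        "owner": owner_name(st.st_uid),              # e.g. 'root'
        "writable_by_me": os.access(path, os.W_OK),  # what the installer needs
    }
```

For example, `describe(".../opendevin-0.1.0.dist-info/METADATA")` would be expected to report `root` as the owner and `writable_by_me` as False for the file in this thread.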

JessChud commented 5 months ago

Hi, thanks! The outputs are:

JessicaComputer:OpenDevin Jessica$ ls -l /Users/Jessica/Library/Caches/pypoetry/virtualenvs/opendevin-Ilj8wfey-py3.11/lib/python3.11/site-packages/opendevin-0.1.0.dist-info/METADATA
-rw-r--r-- 1 root staff 8195 May 30 13:49 /Users/Jessica/Library/Caches/pypoetry/virtualenvs/opendevin-Ilj8wfey-py3.11/lib/python3.11/site-packages/opendevin-0.1.0.dist-info/METADATA

and

JessicaComputer:OpenDevin Jessica$ lsof /Users/Jessica/Library/Caches/pypoetry/virtualenvs/opendevin-Ilj8wfey-py3.11/lib/python3.11/site-packages/opendevin-0.1.0.dist-info/METADATA
JessicaComputer:OpenDevin Jessica$

What do you suggest as next steps to getting this working?

SmartManoj commented 5 months ago

The file is owned by root with permissions set to rw-r--r--, meaning only the root user has write permissions. No processes are locking the file, so the issue is purely permission-related.

Change the ownership of the file so that the current user (Jessica) can write to it:

sudo chown jessica /Users/Jessica/Library/Caches/pypoetry/virtualenvs/opendevin-Ilj8wfey-py3.11/lib/python3.11/site-packages/opendevin-0.1.0.dist-info/METADATA
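The reasoning above follows the classic POSIX permission check. This minimal sketch assumes plain mode bits and ignores macOS ACLs and file flags. `can_write` is a hypothetical helper. Only one of the owner, group, or "other" bit triples applies, and root bypasses them. For a `-rw-r--r--` file owned by `root:staff`, a non-root member of `staff` therefore gets only read access.

```python
import stat


def can_write(mode: int, owner_uid: int, owner_gid: int,
              uid: int, gids: set[int]) -> bool:
    """Decide write access from mode bits alone (no ACLs, flags or sticky bit)."""
    if uid == 0:
        return True  # root bypasses the mode bits for ordinary files
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)  # owner triple, e.g. 'rw-'
    if owner_gid in gids:
        return bool(mode & stat.S_IWGRP)  # group triple, e.g. 'r--'
    return bool(mode & stat.S_IWOTH)      # everyone else, e.g. 'r--'
```

The example IDs are illustrative. 502 is the `sandbox_user_id` from the earlier non-`sudo` run, and gid 20 is `staff` on a typical macOS install. With `mode = stat.S_IFREG | 0o644`, `can_write(mode, 0, 20, 502, {20})` is False. After the `chown`, `can_write(mode, 502, 20, 502, {20})` is True.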

JessChud commented 5 months ago

Thank you! I've done this, and am now back at this problem:
JessicaComputer:OpenDevin Jessica$ sudo chown jessica /Users/Jessica/Library/Caches/pypoetry/virtualenvs/opendevin-Ilj8wfey-py3.11/lib/python3.11/site-packages/opendevin-0.1.0.dist-info/METADATA
Password:
JessicaComputer:OpenDevin Jessica$ evaluation/swe_bench/scripts/run_infer.sh eval_gpt35_0125_preview CodeActAgent 1
AGENT: CodeActAgent
AGENT_VERSION: v1.4
MODEL_CONFIG: eval_gpt35_0125_preview
EVAL_LIMIT: 1
Traceback (most recent call last):
File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 17, in <module>
from evaluation.swe_bench.swe_env_box import SWEBenchSSHBox
ModuleNotFoundError: No module named 'evaluation.swe_bench'

During setup I had the following issue:

JessicaComputer:OpenDevin Jessica$ make run
Running the app...
Starting backend server...
Waiting for the backend to start...
INFO: Started server process [87671]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:3000 (Press CTRL+C to quit)
Connection to localhost port 3000 [tcp/hbci] succeeded!
Backend started successfully.
Starting frontend with npm...

opendevin-frontend@0.1.0 start npm run make-i18n && vite --port 3001

opendevin-frontend@0.1.0 make-i18n node scripts/make-i18n-translations.cjs

node:internal/fs/rimraf:202 throw err; ^

Error: EACCES: permission denied, rmdir '/Users/Jessica/Downloads/OpenDevin/frontend/public/locales/ar'
    at rmdirSync (node:fs:1217:11)
    at _rmdirSync (node:internal/fs/rimraf:235:5)
    at rimrafSync (node:internal/fs/rimraf:193:7)
    at node:internal/fs/rimraf:253:9
    at Array.forEach (<anonymous>)
    at _rmdirSync (node:internal/fs/rimraf:250:7)
    at rimrafSync (node:internal/fs/rimraf:193:7)
    at Object.rmSync (node:fs:1266:10)
    at Object.<anonymous> (/Users/Jessica/Downloads/OpenDevin/frontend/scripts/make-i18n-translations.cjs:20:6)
    at Module._compile (node:internal/modules/cjs/loader:1434:14) {
  errno: -13,
  code: 'EACCES',
  syscall: 'rmdir',
  path: '/Users/Jessica/Downloads/OpenDevin/frontend/public/locales/ar'
}

Node.js v22.2.0
make: *** [run] Error 1
JessicaComputer:OpenDevin Jessica$ sudo chown jessica /Users/Jessica/Downloads/OpenDevin/frontend/public/locales/ar
JessicaComputer:OpenDevin Jessica$ sudo chown jessica /Users/Jessica/Downloads/OpenDevin/frontend/public/locales/ar
JessicaComputer:OpenDevin Jessica$ make run
Running the app...
Starting backend server...
Waiting for the backend to start...
Connection to localhost port 3000 [tcp/hbci] succeeded!
Backend started successfully.
Starting frontend with npm...

opendevin-frontend@0.1.0 start npm run make-i18n && vite --port 3001

opendevin-frontend@0.1.0 make-i18n node scripts/make-i18n-translations.cjs

node:internal/fs/rimraf:202 throw err; ^

Error: EACCES: permission denied, rmdir '/Users/Jessica/Downloads/OpenDevin/frontend/public/locales/ar'
    at rmdirSync (node:fs:1217:11)
    at _rmdirSync (node:internal/fs/rimraf:235:5)
    at rimrafSync (node:internal/fs/rimraf:193:7)
    at node:internal/fs/rimraf:253:9
    at Array.forEach (<anonymous>)
    at _rmdirSync (node:internal/fs/rimraf:250:7)
    at rimrafSync (node:internal/fs/rimraf:193:7)
    at Object.rmSync (node:fs:1266:10)
    at Object.<anonymous> (/Users/Jessica/Downloads/OpenDevin/frontend/scripts/make-i18n-translations.cjs:20:6)
    at Module._compile (node:internal/modules/cjs/loader:1434:14) {
  errno: -13,
  code: 'EACCES',
  syscall: 'rmdir',
  path: '/Users/Jessica/Downloads/OpenDevin/frontend/public/locales/ar'
}

Node.js v22.2.0
make: *** [run] Error 1
JessicaComputer:OpenDevin Jessica$ INFO: Started server process [88162]
INFO: Waiting for application startup.
INFO: Application startup complete.
ERROR: [Errno 48] error while attempting to bind on address ('127.0.0.1', 3000): address already in use
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.

SmartManoj commented 5 months ago

ERROR: [Errno 48] error while attempting to bind on address ('127.0.0.1', 3000): address already in use

Port 3000 is still held by another process, most likely a backend left over from an earlier run. To free it, run sudo kill -9 $(sudo lsof -t -i:3000)
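`[Errno 48]` is `EADDRINUSE` on macOS (the same error is 98 on Linux). To check whether the port is free before running `make run`, you can try to bind it yourself. The sketch below does that. `port_in_use` is a hypothetical helper, and it only tells you that the port is taken, not which process holds it.

```python
import errno
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if binding host:port fails because something already holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:  # errno 48 on macOS, 98 on Linux
                return True
            raise  # some other problem (e.g. permissions); don't hide it
    return False  # bind succeeded; the socket is closed again on exit
```

If `port_in_use(3000)` returns True, `lsof -t -i:3000` (as above) prints the PID holding the port.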

SmartManoj commented 5 months ago

rmdir '/Users/Jessica/Downloads/OpenDevin/frontend/public/locales/ar'

Change the permissions for this directory too.

JessChud commented 5 months ago

JessicaComputer:OpenDevin Jessica$ sudo chown jessica /Users/Jessica/Downloads/OpenDevin/frontend/public/locales/ar
JessicaComputer:OpenDevin Jessica$ make run
Running the app...
Starting backend server...
Waiting for the backend to start...
INFO: Started server process [91182]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:3000 (Press CTRL+C to quit)
Connection to localhost port 3000 [tcp/hbci] succeeded!
Backend started successfully.
Starting frontend with npm...

opendevin-frontend@0.1.0 start npm run make-i18n && vite --port 3001

opendevin-frontend@0.1.0 make-i18n node scripts/make-i18n-translations.cjs

node:internal/fs/rimraf:202 throw err; ^

Error: EACCES: permission denied, rmdir '/Users/Jessica/Downloads/OpenDevin/frontend/public/locales/ar'
    at rmdirSync (node:fs:1217:11)
    at _rmdirSync (node:internal/fs/rimraf:235:5)
    at rimrafSync (node:internal/fs/rimraf:193:7)
    at node:internal/fs/rimraf:253:9
    at Array.forEach (<anonymous>)
    at _rmdirSync (node:internal/fs/rimraf:250:7)
    at rimrafSync (node:internal/fs/rimraf:193:7)
    at Object.rmSync (node:fs:1266:10)
    at Object.<anonymous> (/Users/Jessica/Downloads/OpenDevin/frontend/scripts/make-i18n-translations.cjs:20:6)
    at Module._compile (node:internal/modules/cjs/loader:1434:14) {
  errno: -13,
  code: 'EACCES',
  syscall: 'rmdir',
  path: '/Users/Jessica/Downloads/OpenDevin/frontend/public/locales/ar'
}

Node.js v22.2.0
make: *** [run] Error 1

SmartManoj commented 5 months ago

sudo chown -R jessica:staff /Users/Jessica/Downloads/OpenDevin/frontend/public/locales/ar

The -R flag changes the ownership of the directory and all of its contents recursively.
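For reference, here is roughly what `chown -R jessica:staff <dir>` does, written as a minimal Python sketch. It changes the owner of the directory itself, then of every entry beneath it, without following symlinks. `chown_recursive` is a hypothetical helper. Taking ownership away from `root` still requires running it as root, which is why the command above uses `sudo`.

```python
import os


def chown_recursive(root: str, uid: int, gid: int) -> None:
    """Rough equivalent of `chown -R uid:gid root`; pass -1 to leave an ID unchanged."""
    os.chown(root, uid, gid, follow_symlinks=False)  # the directory itself
    for dirpath, dirnames, filenames in os.walk(root):  # does not descend into symlinked dirs
        for name in dirnames + filenames:
            # follow_symlinks=False changes a link itself, never its target
            os.chown(os.path.join(dirpath, name), uid, gid, follow_symlinks=False)
```

Run as root, `chown_recursive(path, pwd.getpwnam("jessica").pw_uid, grp.getgrnam("staff").gr_gid)` would mirror the command above.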

JessChud commented 5 months ago

JessicaComputer:OpenDevin Jessica$ sudo chown -R jessica:staff /Users/Jessica/Downloads/OpenDevin/frontend/public/locales/ar
JessicaComputer:OpenDevin Jessica$ make run
Running the app...
Starting backend server...
Waiting for the backend to start...
Connection to localhost port 3000 [tcp/hbci] succeeded!
Backend started successfully.
Starting frontend with npm...

opendevin-frontend@0.1.0 start npm run make-i18n && vite --port 3001

opendevin-frontend@0.1.0 make-i18n node scripts/make-i18n-translations.cjs

node:internal/fs/rimraf:202 throw err; ^

Error: EACCES: permission denied, rmdir '/Users/Jessica/Downloads/OpenDevin/frontend/public/locales/ar'
    at rmdirSync (node:fs:1217:11)
    at _rmdirSync (node:internal/fs/rimraf:235:5)
    at rimrafSync (node:internal/fs/rimraf:193:7)
    at node:internal/fs/rimraf:253:9
    at Array.forEach (<anonymous>)
    at _rmdirSync (node:internal/fs/rimraf:250:7)
    at rimrafSync (node:internal/fs/rimraf:193:7)
    at Object.rmSync (node:fs:1266:10)
    at Object.<anonymous> (/Users/Jessica/Downloads/OpenDevin/frontend/scripts/make-i18n-translations.cjs:20:6)
    at Module._compile (node:internal/modules/cjs/loader:1434:14) {
  errno: -13,
  code: 'EACCES',
  syscall: 'rmdir',
  path: '/Users/Jessica/Downloads/OpenDevin/frontend/public/locales/ar'
}

Node.js v22.2.0
make: *** [run] Error 1
JessicaComputer:OpenDevin Jessica$ INFO: Started server process [92156]
INFO: Waiting for application startup.
INFO: Application startup complete.
ERROR: [Errno 48] error while attempting to bind on address ('127.0.0.1', 3000): address already in use
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
sudo kill -9 $(sudo lsof -t -i:3000)
JessicaComputer:OpenDevin Jessica$

SmartManoj commented 5 months ago

To give your user read, write, and execute permissions on the directory and everything in it:

sudo chmod -R u+rwx /Users/Jessica/Downloads/OpenDevin/frontend/public/locales/ar

Then run make start-frontend.
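The `chmod -R u+rwx` step can be expressed as a minimal Python sketch. `add_user_rwx` and `chmod_recursive_u_rwx` are hypothetical helpers. The sketch adds the owner's rwx bits and keeps all other bits. It fixes each directory before `os.walk` descends into it, so directories that were previously unreadable still get visited. Symlinks are skipped because `os.chmod` would follow them.

```python
import os
import stat


def add_user_rwx(path: str) -> None:
    """Set u+rwx on `path`, keeping its other permission bits."""
    current = stat.S_IMODE(os.lstat(path).st_mode)
    os.chmod(path, current | stat.S_IRWXU)


def chmod_recursive_u_rwx(root: str) -> None:
    """Rough equivalent of `chmod -R u+rwx root` (caller must own the files or be root)."""
    add_user_rwx(root)  # fix the top directory first so os.walk can list it
    for dirpath, dirnames, filenames in os.walk(root):  # top-down by default
        for name in dirnames + filenames:
            p = os.path.join(dirpath, name)
            if not os.path.islink(p):
                # Subdirectories fixed here become listable before os.walk enters them.
                add_user_rwx(p)
```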

JessChud commented 5 months ago

JessicaComputer:OpenDevin Jessica$ sudo chown -R jessica:staff /Users/Jessica/Downloads/OpenDevin/frontend/public/locales/ar
JessicaComputer:OpenDevin Jessica$ make run
Running the app...
Starting backend server...
Waiting for the backend to start...
Connection to localhost port 3000 [tcp/hbci] succeeded!
Backend started successfully.
Starting frontend with npm...

opendevin-frontend@0.1.0 start npm run make-i18n && vite --port 3001

opendevin-frontend@0.1.0 make-i18n node scripts/make-i18n-translations.cjs

node:internal/fs/rimraf:202 throw err; ^

Error: EACCES: permission denied, rmdir '/Users/Jessica/Downloads/OpenDevin/frontend/public/locales/ar'
    at rmdirSync (node:fs:1217:11)
    at _rmdirSync (node:internal/fs/rimraf:235:5)
    at rimrafSync (node:internal/fs/rimraf:193:7)
    at node:internal/fs/rimraf:253:9
    at Array.forEach (<anonymous>)
    at _rmdirSync (node:internal/fs/rimraf:250:7)
    at rimrafSync (node:internal/fs/rimraf:193:7)
    at Object.rmSync (node:fs:1266:10)
    at Object.<anonymous> (/Users/Jessica/Downloads/OpenDevin/frontend/scripts/make-i18n-translations.cjs:20:6)
    at Module._compile (node:internal/modules/cjs/loader:1434:14) {
  errno: -13,
  code: 'EACCES',
  syscall: 'rmdir',
  path: '/Users/Jessica/Downloads/OpenDevin/frontend/public/locales/ar'
}

Node.js v22.2.0
make: *** [run] Error 1
JessicaComputer:OpenDevin Jessica$ INFO: Started server process [93816]
INFO: Waiting for application startup.
INFO: Application startup complete.
ERROR: [Errno 48] error while attempting to bind on address ('127.0.0.1', 3000): address already in use
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.

JessicaComputer:OpenDevin Jessica$ sudo chmod -R u+rwx /Users/Jessica/Downloads/OpenDevin/frontend/public/locales/ar
Password:
JessicaComputer:OpenDevin Jessica$ make start-frontend
Starting frontend...

opendevin-frontend@0.1.0 start npm run make-i18n && vite

opendevin-frontend@0.1.0 make-i18n node scripts/make-i18n-translations.cjs

node:internal/fs/rimraf:202 throw err; ^

Error: EACCES: permission denied, rmdir '/Users/Jessica/Downloads/OpenDevin/frontend/public/locales/ar'
    at rmdirSync (node:fs:1217:11)
    at _rmdirSync (node:internal/fs/rimraf:235:5)
    at rimrafSync (node:internal/fs/rimraf:193:7)
    at node:internal/fs/rimraf:253:9
    at Array.forEach (<anonymous>)
    at _rmdirSync (node:internal/fs/rimraf:250:7)
    at rimrafSync (node:internal/fs/rimraf:193:7)
    at Object.rmSync (node:fs:1266:10)
    at Object.<anonymous> (/Users/Jessica/Downloads/OpenDevin/frontend/scripts/make-i18n-translations.cjs:20:6)
    at Module._compile (node:internal/modules/cjs/loader:1434:14) {
  errno: -13,
  code: 'EACCES',
  syscall: 'rmdir',
  path: '/Users/Jessica/Downloads/OpenDevin/frontend/public/locales/ar'
}

Node.js v22.2.0
make: *** [start-frontend] Error 1
JessicaComputer:OpenDevin Jessica$

SmartManoj commented 5 months ago

To check the ownership and permissions:

ls -lR /Users/Jessica/Downloads/OpenDevin/frontend/public/locales/ar

JessChud commented 5 months ago

Thanks! It says this:
Jessicas-Computer:opendevin Jessica$ ls -lR /Users/Jessica/Downloads/OpenDevin/frontend/public/locales/ar
total 8
-rwxr--r-- 1 Jessica staff 1644 May 30 13:52 translation.json

SmartManoj commented 5 months ago

Check whether any process is locking the file by running lsof {path}.

JessChud commented 5 months ago

Here's the output, let me know if I'm not running it correctly:

Jessicas-Computer:opendevin Jessica$ lsof /Users/Jessica/Downloads/OpenDevin/frontend/public/locales/ar
Jessicas-Computer:opendevin Jessica$
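Nothing has the directory open, so one more check may be worth doing, assuming standard POSIX semantics. Removing a directory entry, which is what `rmdir` on `ar` does, needs write and execute permission on the parent directory (`locales`), not on `ar` itself. The `ls -lR` output above only shows what is inside `ar`. If `locales` is still owned by root, changing `ar` alone would not help. The minimal sketch below prints the mode and owner of a path and its parent, then reports whether the current user could remove the path. `can_remove` and `report` are hypothetical helpers that ignore the sticky bit and ACLs.

```python
import os
import stat


def can_remove(path: str) -> bool:
    """True if the current user may unlink/rmdir `path` (ignores sticky bit and ACLs)."""
    parent = os.path.dirname(os.path.abspath(path))
    return os.access(parent, os.W_OK | os.X_OK)


def report(path: str) -> None:
    """Print ls-style mode and owner UID for the parent and the path itself."""
    parent = os.path.dirname(os.path.abspath(path))
    for p in (parent, path):
        st = os.lstat(p)
        print(stat.filemode(st.st_mode), st.st_uid, p)
    print("removable by me:", can_remove(path))
```

For example, `report("/Users/Jessica/Downloads/OpenDevin/frontend/public/locales/ar")`. If it prints `removable by me: False`, the ownership fix would need to target the parent directory rather than `ar`.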

SmartManoj commented 5 months ago

Could you try make start-frontend again? Also, the backend is already installed; see the comment on your previous issue: https://github.com/OpenDevin/OpenDevin/issues/2140#issuecomment-2139354503

JessChud commented 5 months ago

I tried running the eval command again and am now experiencing this issue:

Jessicas-Computer:opendevin Jessica$ sudo evaluation/swe_bench/scripts/run_infer.sh eval_gpt35_0125_preview CodeActAgent 1 AGENT: CodeActAgent AGENT_VERSION: v1.4 MODEL_CONFIG: eval_gpt35_0125_preview EVAL_LIMIT: 1 00:09:45 - opendevin.core.config:INFO: config.py:431 - Loading llm config from eval_gpt35_0125_preview 00:09:45 - opendevin:INFO: run_infer.py:330 - Config for evaluation: AppConfig(llm=LLMConfig(model='gpt-3.5-turbo-0125', api_key='**', base_url=None, api_version=None, embedding_model='openai', embedding_base_url=None, embedding_deployment_name=None, aws_access_key_id='**', aws_secret_access_key='**', aws_region_name=None, num_retries=5, retry_min_wait=3, retry_max_wait=60, timeout=None, max_chars=5000000, temperature=0, top_p=0.5, custom_llm_provider=None, max_input_tokens=None, max_output_tokens=None), agent=AgentConfig(name='CodeActAgent', memory_enabled=False, memory_max_threads=2), runtime='server', file_store='memory', file_store_path='/tmp/file_store', workspace_base='/Users/Jessica/Downloads/OpenDevin/OpenDevin', workspace_mount_path='/Users/Jessica/Downloads/OpenDevin/OpenDevin', workspace_mount_path_in_sandbox='/workspace', workspace_mount_rewrite=None, cache_dir='/tmp/cache', sandbox_container_image='ghcr.io/opendevin/sandbox:main', run_as_devin=True, max_iterations=100, e2b_api_key='**', sandbox_type='ssh', use_host_network=False, ssh_hostname='localhost', disable_color=False, sandbox_user_id=0, sandbox_timeout=120, github_token='**', jwt_secret='15f96ff117dd42fbb90014fc18b779fd', debug=False, enable_auto_lint=False 00:09:45 - opendevin:INFO: run_infer.py:353 - Using evaluation output directory: evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo-0125_maxiter_50_N_v1.4 00:09:45 - opendevin:INFO: run_infer.py:366 - Metadata: {'agent_class': 'CodeActAgent', 'model_name': 'gpt-3.5-turbo-0125', 'max_iterations': 50, 'eval_output_dir': 
'evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo-0125_maxiter_50_N_v1.4', 'start_time': '2024-06-02 00:09:45', 'git_commit': '6ff50ed369163592041fdda5a7e9702ce79a17cc'} 00:09:45 - opendevin:INFO: run_infer.py:374 - Limiting evaluation to first 1 instances. 00:09:45 - opendevin:INFO: run_infer.py:378 - Writing evaluation output to evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo-0125_maxiter_50_N_v1.4/output.jsonl 00:09:45 - opendevin:WARNING: run_infer.py:385 - Output file evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo-0125_maxiter_50_N_v1.4/output.jsonl already exists. Loaded 0 finished instances. 00:09:45 - opendevin:INFO: run_infer.py:390 - Evaluation started with Agent CodeActAgent, model gpt-3.5-turbo-0125, max iterations 50. 00:09:45 - opendevin:INFO: run_infer.py:406 - Finished instances: 0, Remaining instances: 1 0%| | 0/1 [00:00<?, ?it/s]00:09:45 - opendevin:INFO: run_infer.py:427 - Using 8 workers for evaluation. 00:09:45 - opendevin:INFO: run_infer.py:431 - Skipping workspace mount: True 00:09:57 - opendevin:INFO: run_infer.py:214 - Starting evaluation for instance django__django-15202. 
Hint: run "tail -f evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo-0125_maxiter_50_N_v1.4/logs/instance_django__django-15202.log" to see live logs in a seperate shell 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:13<00:00, 13.53s/it]ERROR:concurrent.futures:exception calling callback for <Future at 0x138c43250 state=finished raised Exception> concurrent.futures.process._RemoteTraceback: """ Traceback (most recent call last): File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/process.py", line 261, in _process_worker r = call_item.fn(*call_item.args, **call_item.kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 234, in process_instance sandbox = SWEBenchSSHBox.get_box_for_instance( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/swe_env_box.py", line 96, in get_box_for_instance sandbox = cls( ^^^^ File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/swe_env_box.py", line 41, in __init__ super().__init__(container_image, timeout, sid) File "/Users/Jessica/Downloads/OpenDevin/opendevin/runtime/docker/ssh_box.py", line 255, in __init__ self.setup_user() File "/Users/Jessica/Downloads/OpenDevin/opendevin/runtime/docker/ssh_box.py", line 315, in setup_user raise Exception(f'Failed to create opendevin user in sandbox: {logs}') Exception: Failed to create opendevin user in sandbox: b'useradd: UID 0 is not unique\n' """

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 340, in _invoke_callbacks callback(self) File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 416, in update_progress output = future.result() ^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 449, in result return self.__get_result() ^^^^^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result raise self._exception Exception: Failed to create opendevin user in sandbox: b'useradd: UID 0 is not unique\n' ERROR:root: File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 452, in <module> future.result() File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 456, in result return self.__get_result() ^^^^^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result raise self._exception File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 340, in _invoke_callbacks callback(self) File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 416, in update_progress output = future.result() ^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 449, in result return self.__get_result() ^^^^^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result 
raise self._exception

ERROR:root:<class 'Exception'>: Failed to create opendevin user in sandbox: b'useradd: UID 0 is not unique\n' 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:16<00:00, 16.20s/it] Exception ignored in: <function _ExecutorManagerThread.__init__.<locals>.weakref_cb at 0x138c4e2a0> Traceback (most recent call last): File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/process.py", line 308, in weakref_cb AttributeError: 'NoneType' object has no attribute 'util'

When I run the frontend and open http://localhost:3001/, I get this:

What would you suggest to get this working?

enyst commented 5 months ago

@JessChud please note, in the last log:

run_as_devin=True

Can you set this to false? There's a comment above about it: either export it or set it in config.toml. As far as I know, it needs to be false (so that commands run as root) for evals.
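Some context for the `useradd: UID 0 is not unique` error. The config printed in that run shows `sandbox_user_id=0`, because the script was started with `sudo`, so the host UID is 0. The earlier run without `sudo` showed `sandbox_user_id=502`. With `run_as_devin=True`, the sandbox presumably tries to create an `opendevin` user with that UID, and UID 0 always belongs to `root`. The hypothetical pre-flight check below only illustrates that conflict. It is not OpenDevin's code, and it assumes that no extra user is created when `run_as_devin` is false.

```python
from typing import Optional


def useradd_conflict(run_as_devin: bool, sandbox_user_id: int,
                     existing_uids: set[int]) -> Optional[str]:
    """Hypothetical check mirroring the useradd failure seen in the log."""
    if not run_as_devin:
        # Assumption: no separate sandbox user is created; commands run as root.
        return None
    if sandbox_user_id in existing_uids:
        # useradd refuses a UID that already exists; UID 0 is always root.
        return f"useradd: UID {sandbox_user_id} is not unique"
    return None
```

With the values from the failing run, `useradd_conflict(True, 0, {0})` returns the same message as the log. With `run_as_devin=False` it returns None.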

JessChud commented 5 months ago

I tried to, but for some reason it isn't updating...

Jessicas-Computer:opendevin Jessica$ export RUN_AS_DEVIN=false Jessicas-Computer:opendevin Jessica$ sudo evaluation/swe_bench/scripts/run_infer.sh eval_gpt35_0125_preview CodeActAgent 1 AGENT: CodeActAgent AGENT_VERSION: v1.4 MODEL_CONFIG: eval_gpt35_0125_preview EVAL_LIMIT: 1 01:09:41 - opendevin.core.config:INFO: config.py:431 - Loading llm config from eval_gpt35_0125_preview 01:09:41 - opendevin:INFO: run_infer.py:330 - Config for evaluation: AppConfig(llm=LLMConfig(model='gpt-3.5-turbo-0125', api_key='**', base_url=None, api_version=None, embedding_model='openai', embedding_base_url=None, embedding_deployment_name=None, aws_access_key_id='**', aws_secret_access_key='**', aws_region_name=None, num_retries=5, retry_min_wait=3, retry_max_wait=60, timeout=None, max_chars=5000000, temperature=0, top_p=0.5, custom_llm_provider=None, max_input_tokens=None, max_output_tokens=None), agent=AgentConfig(name='CodeActAgent', memory_enabled=False, memory_max_threads=2), runtime='server', file_store='memory', file_store_path='/tmp/file_store', workspace_base='/Users/Jessica/Downloads/OpenDevin/opendevin', workspace_mount_path='/Users/Jessica/Downloads/OpenDevin/opendevin', workspace_mount_path_in_sandbox='/workspace', workspace_mount_rewrite=None, cache_dir='/tmp/cache', sandbox_container_image='ghcr.io/opendevin/sandbox:main', run_as_devin=True, max_iterations=100, e2b_api_key='**', sandbox_type='ssh', use_host_network=False, ssh_hostname='localhost', disable_color=False, sandbox_user_id=0, sandbox_timeout=120, github_token='**', jwt_secret='61c3514eff094811b7d26c3f3c3b25df', debug=False, enable_auto_lint=False 01:09:41 - opendevin:INFO: run_infer.py:353 - Using evaluation output directory: evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo-0125_maxiter_50_N_v1.4 01:09:41 - opendevin:INFO: run_infer.py:366 - Metadata: {'agent_class': 'CodeActAgent', 'model_name': 'gpt-3.5-turbo-0125', 'max_iterations': 50, 'eval_output_dir': 
'evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo-0125_maxiter_50_N_v1.4', 'start_time': '2024-06-02 01:09:41', 'git_commit': '6ff50ed369163592041fdda5a7e9702ce79a17cc'} 01:09:41 - opendevin:INFO: run_infer.py:374 - Limiting evaluation to first 1 instances. 01:09:41 - opendevin:INFO: run_infer.py:378 - Writing evaluation output to evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo-0125_maxiter_50_N_v1.4/output.jsonl 01:09:41 - opendevin:WARNING: run_infer.py:385 - Output file evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo-0125_maxiter_50_N_v1.4/output.jsonl already exists. Loaded 0 finished instances. 01:09:41 - opendevin:INFO: run_infer.py:390 - Evaluation started with Agent CodeActAgent, model gpt-3.5-turbo-0125, max iterations 50. 01:09:41 - opendevin:INFO: run_infer.py:406 - Finished instances: 0, Remaining instances: 1 0%| | 0/1 [00:00<?, ?it/s]01:09:41 - opendevin:INFO: run_infer.py:427 - Using 8 workers for evaluation. 01:09:41 - opendevin:INFO: run_infer.py:431 - Skipping workspace mount: True 01:09:55 - opendevin:INFO: run_infer.py:214 - Starting evaluation for instance django__django-15202. 
Hint: run "tail -f evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo-0125_maxiter_50_N_v1.4/logs/instance_django__django-15202.log" to see live logs in a seperate shell 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:16<00:00, 16.69s/it]ERROR:concurrent.futures:exception calling callback for <Future at 0x1360c54d0 state=finished raised Exception> concurrent.futures.process._RemoteTraceback: """ Traceback (most recent call last): File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/process.py", line 261, in _process_worker r = call_item.fn(*call_item.args, **call_item.kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 234, in process_instance sandbox = SWEBenchSSHBox.get_box_for_instance( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/swe_env_box.py", line 96, in get_box_for_instance sandbox = cls( ^^^^ File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/swe_env_box.py", line 41, in __init__ super().__init__(container_image, timeout, sid) File "/Users/Jessica/Downloads/OpenDevin/opendevin/runtime/docker/ssh_box.py", line 255, in __init__ self.setup_user() File "/Users/Jessica/Downloads/OpenDevin/opendevin/runtime/docker/ssh_box.py", line 315, in setup_user raise Exception(f'Failed to create opendevin user in sandbox: {logs}') Exception: Failed to create opendevin user in sandbox: b'useradd: UID 0 is not unique\n' """

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 340, in _invoke_callbacks callback(self) File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 416, in update_progress output = future.result() ^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 449, in result return self.__get_result() ^^^^^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result raise self._exception File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 452, in <module> future.result() File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 456, in result return self.__get_result() ^^^^^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result raise self._exception Exception: Failed to create opendevin user in sandbox: b'useradd: UID 0 is not unique\n' ERROR:root: File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 340, in _invoke_callbacks callback(self) File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 416, in update_progress output = future.result() ^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 449, in result return self.__get_result() ^^^^^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result 
raise self._exception File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 452, in <module> future.result() File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 456, in result return self.__get_result() ^^^^^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result raise self._exception

ERROR:root:<class 'Exception'>: Failed to create opendevin user in sandbox: b'useradd: UID 0 is not unique\n' 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:20<00:00, 20.52s/it] Exception ignored in: <function _ExecutorManagerThread.__init__.<locals>.weakref_cb at 0x1361062a0> Traceback (most recent call last): File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/process.py", line 308, in weakref_cb AttributeError: 'NoneType' object has no attribute 'util'

enyst commented 5 months ago

Is it set in config.toml? Although... that is a bit odd.
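One likely reason the export had no effect, assuming macOS's default sudoers settings (`env_reset`): `sudo` starts the command with a reset environment. A `RUN_AS_DEVIN=false` exported in your own shell therefore never reaches `run_infer.sh`. You can check this with `sudo env | grep RUN_AS_DEVIN`, which prints nothing if the variable was dropped. The sketch below shows how an env-or-default lookup silently falls back when that happens. `resolve_flag` is a hypothetical helper, not OpenDevin's config loader.

```python
from collections.abc import Mapping


def resolve_flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Hypothetical env-or-default lookup for a boolean setting such as RUN_AS_DEVIN."""
    raw = env.get(name)
    if raw is None:
        # Variable absent, e.g. stripped by sudo's env_reset: fall back silently.
        return default
    return raw.strip().lower() in {"1", "true", "yes"}
```

Here `default` stands for whatever config.toml or the built-in default provides. `resolve_flag(os.environ, "RUN_AS_DEVIN", default=True)` under `sudo` would still return True after the export, which is why setting the value in config.toml is the more robust option.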

enyst commented 5 months ago

Second, @JessChud, please also check this: in config.toml, make sure to set persist_sandbox=false:

[core]
...
persist_sandbox=false
run_as_devin=false
...

JessChud commented 5 months ago

Thank you so much! Now I'm experiencing this problem:

Jessicas-Computer:opendevin Jessica$ sudo evaluation/swe_bench/scripts/run_infer.sh eval_gpt35_0125_preview CodeActAgent 1 AGENT: CodeActAgent AGENT_VERSION: v1.4 MODEL_CONFIG: eval_gpt35_0125_preview EVAL_LIMIT: 1 01:25:00 - opendevin.core.config:INFO: config.py:431 - Loading llm config from eval_gpt35_0125_preview 01:25:00 - opendevin:INFO: run_infer.py:330 - Config for evaluation: AppConfig(llm=LLMConfig(model='gpt-3.5-turbo-0125', api_key='**', base_url=None, api_version=None, embedding_model='local', embedding_base_url=None, embedding_deployment_name=None, aws_access_key_id='**', aws_secret_access_key='**', aws_region_name=None, num_retries=5, retry_min_wait=3, retry_max_wait=60, timeout=None, max_chars=5000000, temperature=0.0, top_p=0.5, custom_llm_provider=None, max_input_tokens=None, max_output_tokens=None), agent=AgentConfig(name='CodeActAgent', memory_enabled=False, memory_max_threads=2), runtime='server', file_store='memory', file_store_path='/tmp/file_store', workspace_base='/Users/Jessica/Downloads/OpenDevin/workspace', workspace_mount_path='/Users/Jessica/Downloads/OpenDevin/workspace', workspace_mount_path_in_sandbox='/workspace', workspace_mount_rewrite=None, cache_dir='/tmp/cache', sandbox_container_image='ghcr.io/opendevin/sandbox:latest', run_as_devin=False, max_iterations=100, e2b_api_key='**', sandbox_type='ssh', use_host_network=False, ssh_hostname='localhost', disable_color=False, sandbox_user_id=0, sandbox_timeout=120, github_token='**', jwt_secret='1d51ec3264a84251b8185a6227a4b43f', debug=False, enable_auto_lint=True 01:25:00 - opendevin:INFO: run_infer.py:353 - Using evaluation output directory: evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo-0125_maxiter_50_N_v1.4 01:25:00 - opendevin:INFO: run_infer.py:366 - Metadata: {'agent_class': 'CodeActAgent', 'model_name': 'gpt-3.5-turbo-0125', 'max_iterations': 50, 'eval_output_dir': 
'evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo-0125_maxiter_50_N_v1.4', 'start_time': '2024-06-02 01:25:00', 'git_commit': '6ff50ed369163592041fdda5a7e9702ce79a17cc'} 01:25:00 - opendevin:INFO: run_infer.py:374 - Limiting evaluation to first 1 instances. 01:25:00 - opendevin:INFO: run_infer.py:378 - Writing evaluation output to evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo-0125_maxiter_50_N_v1.4/output.jsonl 01:25:00 - opendevin:WARNING: run_infer.py:385 - Output file evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo-0125_maxiter_50_N_v1.4/output.jsonl already exists. Loaded 0 finished instances. 01:25:00 - opendevin:INFO: run_infer.py:390 - Evaluation started with Agent CodeActAgent, model gpt-3.5-turbo-0125, max iterations 50. 01:25:00 - opendevin:INFO: run_infer.py:406 - Finished instances: 0, Remaining instances: 1 0%| | 0/1 [00:00<?, ?it/s]01:25:00 - opendevin:INFO: run_infer.py:427 - Using 8 workers for evaluation. 01:25:00 - opendevin:INFO: run_infer.py:431 - Skipping workspace mount: True 01:25:15 - opendevin:INFO: run_infer.py:214 - Starting evaluation for instance django__django-15202. 
Hint: run "tail -f evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo-0125_maxiter_50_N_v1.4/logs/instance_django__django-15202.log" to see live logs in a seperate shell 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:50<00:00, 50.29s/it]ERROR:concurrent.futures:exception calling callback for <Future at 0x13746c9d0 state=finished raised BrowserException> concurrent.futures.process._RemoteTraceback: """ Traceback (most recent call last): File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/process.py", line 261, in _process_worker r = call_item.fn(*call_item.args, **call_item.kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 259, in process_instance state: State = asyncio.run( ^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 190, in run return runner.run(main) ^^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 118, in run return self._loop.run_until_complete(task) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete return future.result() ^^^^^^^^^^^^^^^ File "/Users/Jessica/Downloads/OpenDevin/opendevin/core/main.py", line 92, in main runtime = ServerRuntime(event_stream=event_stream, sandbox=sandbox) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/Jessica/Downloads/OpenDevin/opendevin/runtime/server/runtime.py", line 35, in init super().init(event_stream, sid, sandbox) File "/Users/Jessica/Downloads/OpenDevin/opendevin/runtime/runtime.py", line 74, in init self.browser = 
BrowserEnv() ^^^^^^^^^^^^ File "/Users/Jessica/Downloads/OpenDevin/opendevin/runtime/browser/browser_env.py", line 41, in init__ raise BrowserException('Failed to start browser environment.') opendevin.runtime.browser.browser_env.BrowserException: Failed to start browser environment. """

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 340, in _invoke_callbacks
    callback(self)
  File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 416, in update_progress
    output = future.result()
             ^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 452, in <module>
    future.result()
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
opendevin.runtime.browser.browser_env.BrowserException: Failed to start browser environment.
01:26:05 - opendevin:INFO: browser_env.py:53 - Browser env started.
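The same `BrowserException` is reported twice because `run_infer.py` runs each instance in a `concurrent.futures` process pool: the worker's exception is shipped back to the parent, re-raised once inside the `update_progress` done-callback (which `concurrent.futures` catches and logs as `exception calling callback for <Future ...>`), and raised again when the main script calls `future.result()`. With a process pool, the worker-side traceback is attached as the exception's `__cause__` (the `_RemoteTraceback` block), which is why Python prints "The above exception was the direct cause of the following exception". The sketch below reproduces the callback-then-`result()` pattern with the standard library only. It is an illustration, not OpenDevin's code: `BrowserException`, `process_instance`, and `reproduce` are hypothetical stand-ins, and a `ThreadPoolExecutor` is swapped in for the process pool so the example runs as a standalone script (so it produces no `_RemoteTraceback`).

```python
import concurrent.futures
import logging


class BrowserException(Exception):
    """Hypothetical stand-in for OpenDevin's BrowserException."""


def process_instance(instance_id: str) -> str:
    # Hypothetical worker: fails the way BrowserEnv() did in the report.
    raise BrowserException('Failed to start browser environment.')


class CollectingHandler(logging.Handler):
    """Keeps log records so we can inspect what concurrent.futures logged."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


def reproduce(instance_id: str):
    """Return (messages logged by concurrent.futures, exception seen in main)."""
    handler = CollectingHandler()
    cf_logger = logging.getLogger('concurrent.futures')
    cf_logger.addHandler(handler)
    main_error = None

    def update_progress(future: concurrent.futures.Future) -> None:
        # Like the callback in the traceback: result() re-raises the worker's
        # exception here; concurrent.futures catches it and logs it.
        future.result()

    try:
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
            future = executor.submit(process_instance, instance_id)
            future.add_done_callback(update_progress)
        # The main thread asks for the result again and gets the same exception.
        try:
            future.result()
        except BrowserException as exc:
            main_error = exc
    finally:
        cf_logger.removeHandler(handler)

    return [r.getMessage() for r in handler.records], main_error
```

Calling `reproduce('django__django-15202')` yields one logged "exception calling callback" message plus the re-raised exception in the main thread; in both reports above, the single underlying failure is `BrowserEnv()` failing to start inside the worker.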

SmartManoj commented 5 months ago

Why are you running it with sudo? The sandbox only needs root permission.
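In the `sudo` run above, the printed config shows `sandbox_user_id=0`, i.e. root. That is consistent with the config defaulting this field to the UID of whoever launches the script, so `sudo` quietly makes the sandbox user root. A minimal sketch of that kind of default, assuming it is derived from `os.getuid()` (the helper name `default_sandbox_user_id` and the 1000 fallback for platforms without `getuid` are hypothetical, not taken from OpenDevin):

```python
import os


def default_sandbox_user_id(fallback: int = 1000) -> int:
    """Hypothetical helper: UID a sandbox user would inherit from the caller.

    Assumption: the config uses the UID of the process that launched the
    evaluation, so running under `sudo` yields 0 (root).
    """
    # os.getuid() only exists on POSIX systems; use a fixed fallback elsewhere.
    if hasattr(os, 'getuid'):
        return os.getuid()
    return fallback
```

Run from a normal shell this returns the login user's UID; under `sudo` it returns 0.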

JessChud commented 5 months ago

Here it is without sudo:

Jessicas-Computer:opendevin Jessica$ evaluation/swe_bench/scripts/run_infer.sh eval_gpt35_0125_preview CodeActAgent 1
AGENT: CodeActAgent
AGENT_VERSION: v1.4
MODEL_CONFIG: eval_gpt35_0125_preview
EVAL_LIMIT: 1
08:42:21 - opendevin.core.config:INFO: config.py:431 - Loading llm config from eval_gpt35_0125_preview
08:42:21 - opendevin:INFO: run_infer.py:330 - Config for evaluation: AppConfig(llm=LLMConfig(model='gpt-3.5-turbo-0125', api_key='**', base_url=None, api_version=None, embedding_model='local', embedding_base_url=None, embedding_deployment_name=None, aws_access_key_id='**', aws_secret_access_key='**', aws_region_name=None, num_retries=5, retry_min_wait=3, retry_max_wait=60, timeout=None, max_chars=5000000, temperature=0.0, top_p=0.5, custom_llm_provider=None, max_input_tokens=None, max_output_tokens=None), agent=AgentConfig(name='CodeActAgent', memory_enabled=False, memory_max_threads=2), runtime='server', file_store='memory', file_store_path='/tmp/file_store', workspace_base='/Users/Jessica/Downloads/OpenDevin/workspace', workspace_mount_path='/Users/Jessica/Downloads/OpenDevin/workspace', workspace_mount_path_in_sandbox='/workspace', workspace_mount_rewrite=None, cache_dir='/tmp/cache', sandbox_container_image='ghcr.io/opendevin/sandbox:latest', run_as_devin=False, max_iterations=100, e2b_api_key='**', sandbox_type='ssh', use_host_network=False, ssh_hostname='localhost', disable_color=False, sandbox_user_id=502, sandbox_timeout=120, github_token='**', jwt_secret='aeb9f928d61447e0b00feac3976e45e7', debug=False, enable_auto_lint=True
08:42:21 - opendevin:INFO: run_infer.py:353 - Using evaluation output directory: evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo-0125_maxiter_50_N_v1.4
08:42:21 - opendevin:INFO: run_infer.py:366 - Metadata: {'agent_class': 'CodeActAgent', 'model_name': 'gpt-3.5-turbo-0125', 'max_iterations': 50, 'eval_output_dir': 'evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo-0125_maxiter_50_N_v1.4', 'start_time': '2024-06-02 08:42:21', 'git_commit': '6ff50ed369163592041fdda5a7e9702ce79a17cc'}
08:42:21 - opendevin:INFO: run_infer.py:374 - Limiting evaluation to first 1 instances.
08:42:21 - opendevin:INFO: run_infer.py:378 - Writing evaluation output to evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo-0125_maxiter_50_N_v1.4/output.jsonl
08:42:21 - opendevin:WARNING: run_infer.py:385 - Output file evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo-0125_maxiter_50_N_v1.4/output.jsonl already exists. Loaded 0 finished instances.
08:42:21 - opendevin:INFO: run_infer.py:390 - Evaluation started with Agent CodeActAgent, model gpt-3.5-turbo-0125, max iterations 50.
08:42:21 - opendevin:INFO: run_infer.py:406 - Finished instances: 0, Remaining instances: 1
0%| | 0/1 [00:00<?, ?it/s]08:42:21 - opendevin:INFO: run_infer.py:427 - Using 8 workers for evaluation.
08:42:21 - opendevin:INFO: run_infer.py:431 - Skipping workspace mount: True
08:42:35 - opendevin:INFO: run_infer.py:214 - Starting evaluation for instance django__django-15202.
Hint: run "tail -f evaluation/evaluation_outputs/outputs/swe_bench/CodeActAgent/gpt-3.5-turbo-0125_maxiter_50_N_v1.4/logs/instance_django__django-15202.log" to see live logs in a seperate shell
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [01:00<00:00, 60.67s/it]ERROR:concurrent.futures:exception calling callback for <Future at 0x13acdcb50 state=finished raised BrowserException>
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/process.py", line 261, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 259, in process_instance
    state: State = asyncio.run(
                   ^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/Jessica/Downloads/OpenDevin/opendevin/core/main.py", line 92, in main
    runtime = ServerRuntime(event_stream=event_stream, sandbox=sandbox)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Jessica/Downloads/OpenDevin/opendevin/runtime/server/runtime.py", line 35, in __init__
    super().__init__(event_stream, sid, sandbox)
  File "/Users/Jessica/Downloads/OpenDevin/opendevin/runtime/runtime.py", line 74, in __init__
    self.browser = BrowserEnv()
                   ^^^^^^^^^^^^
  File "/Users/Jessica/Downloads/OpenDevin/opendevin/runtime/browser/browser_env.py", line 41, in __init__
    raise BrowserException('Failed to start browser environment.')
opendevin.runtime.browser.browser_env.BrowserException: Failed to start browser environment.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 340, in _invoke_callbacks
    callback(self)
  File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 416, in update_progress
    output = future.result()
             ^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/Users/Jessica/Downloads/OpenDevin/evaluation/swe_bench/run_infer.py", line 452, in <module>
    future.result()
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
opendevin.runtime.browser.browser_env.BrowserException: Failed to start browser environment.
08:43:34 - opendevin:INFO: browser_env.py:53 - Browser env started.

SmartManoj commented 5 months ago

https://github.com/OpenDevin/OpenDevin/issues/2150#issuecomment-2141178933 Do you?

JessChud commented 5 months ago

Sorry, I don't quite know which part of the thread you're referring to. Do I what?

JessChud commented 5 months ago

I don't need it to browse the internet; I just want to run gpt-3.5-turbo-0125 inference on SWE-bench and then evaluate the results.

li-boxuan commented 5 months ago

@JessChud Could you please pull the latest main? I notice you are running an older version.
