All-Hands-AI / OpenHands

🙌 OpenHands: Code Less, Make More
https://all-hands.dev
MIT License

[Bug]: browser not working #4239

Open x66ccff opened 2 weeks ago

x66ccff commented 2 weeks ago

Is there an existing issue for the same bug?

Describe the bug

I asked the agent: Please goto('https://www.whitehouse.gov/about-the-white-house/presidents/'). However, the browser only shows about:blank, and the log keeps reporting:

AgentFinishAction(outputs={'content': 'Too many errors encountered. Task failed.'}, thought='', action='finish')

In the end, though, the agent did get the web page by using IPython instead.

Current OpenHands version

docker run -it --pull=never \
    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=kk-oh-env \
    -e SANDBOX_USER_ID=$(id -u) \
    -e WORKSPACE_MOUNT_PATH=$WORKSPACE_BASE \
    -e RUN_AS_OPENHANDS=False \
    -e MAX_ITERATIONS=1000 \
    -e LLM_NUM_RETRIES=20 \
    -e LLM_RETRY_MIN_WAIT=30 \
    -e LLM_RETRY_MAX_WAIT=700 \
    -e DEBUG=True \
    -v $WORKSPACE_BASE:/opt/workspace_base \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app-$(date +%Y%m%d%H%M%S) \
    ghcr.io/all-hands-ai/openhands:0.9.8

Installation and Configuration

docker run -it --pull=never \
    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=kk-oh-env \
    -e SANDBOX_USER_ID=$(id -u) \
    -e WORKSPACE_MOUNT_PATH=$WORKSPACE_BASE \
    -e RUN_AS_OPENHANDS=False \
    -e MAX_ITERATIONS=1000 \
    -e LLM_NUM_RETRIES=20 \
    -e LLM_RETRY_MIN_WAIT=30 \
    -e LLM_RETRY_MAX_WAIT=700 \
    -e DEBUG=True \
    -v $WORKSPACE_BASE:/opt/workspace_base \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app-$(date +%Y%m%d%H%M%S) \
    ghcr.io/all-hands-ai/openhands:0.9.8

Model and Agent

qwen2.5:72b

Operating System

linux

Reproduction Steps

No response

Logs, Errors, Screenshots, and Additional Context

07:50:33 - openhands:INFO: agent_controller.py:253
USER_ACTION
**MessageAction** (source=EventSource.USER)
CONTENT: Please goto('https://www.whitehouse.gov/about-the-white-house/presidents/')
07:50:33 - openhands:DEBUG: agent_controller.py:273 - [Agent Controller 774e605f-7a98-4b99-b06b-492317a2b1d1] Setting agent(CodeActAgent) state from AgentState.FINISHED to AgentState.RUNNING
07:50:33 - openhands:DEBUG: stream.py:135 - Adding AgentStateChangedObservation id=22 from AGENT
07:50:33 - openhands:DEBUG: stream.py:135 - Adding NullObservation id=23 from USER
07:50:33 - openhands:INFO: agent_controller.py:229
OBSERVATION
AgentStateChangedObservation(content='', agent_state=<AgentState.RUNNING: 'running'>, observation='agent_state_changed')
07:50:33 - openhands:INFO: agent_controller.py:229
OBSERVATION
NullObservation(content='', observation='null')

==============
CodeActAgent LEVEL 0 LOCAL STEP 4 GLOBAL STEP 4

07:50:34 - openhands:DEBUG: logger.py:238 - Logging to /app/logs/llm/24-10-07_07-12/prompt_038.log
07:50:37 - openhands:DEBUG: logger.py:238 - Logging to /app/logs/llm/24-10-07_07-12/response_038.log
07:50:37 - openhands:INFO: llm.py:308 - Input tokens: 4427 | Output tokens: 39

07:50:37 - openhands:DEBUG: stream.py:135 - Adding AgentDelegateAction id=24 from AGENT
07:50:37 - openhands:INFO: agent_controller.py:448
ACTION
AgentDelegateAction(agent='BrowsingAgent', inputs={'task': 'Sure! Let me browse the provided URL.. I should start with: Tell me what is in "https://www.whitehouse.gov/about-the-white-house/presidents/"'}, thought='', action='delegate')
07:50:37 - openhands:WARNING: llm.py:93 - Could not get model info for ollama/kwen2.5:72b:
This model isn't mapped yet. model=ollama/kwen2.5, custom_llm_provider=ollama. Add it here - https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json.
07:50:37 - openhands:INFO: agent_controller.py:364 - [Agent Controller 774e605f-7a98-4b99-b06b-492317a2b1d1]: start delegate, creating agent BrowsingAgent using LLM LLM(model=ollama/kwen2.5:72b, base_url=http://192.168.1.103:11434/)
07:50:37 - openhands:DEBUG: agent_controller.py:273 - [Agent Controller 774e605f-7a98-4b99-b06b-492317a2b1d1-delegate] Setting agent(BrowsingAgent) state from AgentState.LOADING to AgentState.RUNNING
07:50:37 - openhands:DEBUG: stream.py:135 - Adding AgentStateChangedObservation id=25 from AGENT
07:50:37 - openhands:DEBUG: stream.py:135 - Adding NullObservation id=26 from AGENT
07:50:37 - openhands:INFO: agent_controller.py:229
OBSERVATION
AgentStateChangedObservation(content='', agent_state=<AgentState.RUNNING: 'running'>, observation='agent_state_changed')
07:50:37 - openhands:INFO: agent_controller.py:229
OBSERVATION
NullObservation(content='', observation='null')
07:50:37 - openhands:DEBUG: agent_controller.py:456 - [Agent Controller 774e605f-7a98-4b99-b06b-492317a2b1d1] Delegate not none, awaiting...

==============
BrowsingAgent LEVEL 1 LOCAL STEP 0 GLOBAL STEP 5

07:50:37 - openhands:DEBUG: logger.py:238 - Logging to /app/logs/llm/24-10-07_07-12/prompt_039.log
07:50:42 - openhands:DEBUG: logger.py:238 - Logging to /app/logs/llm/24-10-07_07-12/response_039.log
07:50:42 - openhands:WARNING: llm.py:374 - Cost calculation not supported for this model.
07:50:42 - openhands:INFO: llm.py:308 - Input tokens: 833 | Output tokens: 54

07:50:42 - openhands:DEBUG: response_parser.py:29 - To start browsing the provided URL, I will use the `goto` action to navigate to "https://www.whitehouse.gov/about-the-white-house/presidents/".

```python
goto('https://www.whitehouse.gov/about-the-white-house/presidents)```
07:50:42 - openhands:DEBUG: stream.py:135 - Adding BrowseInteractiveAction id=27 from AGENT
07:50:42 - openhands:INFO: agent_controller.py:448
ACTION
**BrowseInteractiveAction**
THOUGHT: To start browsing the provided URL, I will use the `goto` action to navigate to "https://www.whitehouse.gov/about-the-white-house/presidents/".
BROWSER_ACTIONS: python
goto('https://www.whitehouse.gov/about-the-white-house/presidents)
07:50:42 - openhands:DEBUG: agent_controller.py:458 - [Agent Controller 774e605f-7a98-4b99-b06b-492317a2b1d1] Delegate step done
07:50:42 - openhands:DEBUG: agent_controller.py:461 - [Agent Controller 774e605f-7a98-4b99-b06b-492317a2b1d1] Delegate state: AgentState.RUNNING
07:50:42 - openhands:DEBUG: runtime.py:307 - Getting container logs...
07:50:43 - openhands:DEBUG: runtime.py:307 - Getting container logs...
07:50:43 - openhands:INFO: runtime.py:316 -
-----------------------------------Container logs:-----------------------------------
    |07:50:42 - openhands:DEBUG: client.py:365 - Running action:
    |**BrowseInteractiveAction**
    |THOUGHT: To start browsing the provided URL, I will use the `goto` action to navigate to "https://www.whitehouse.gov/about-the-white-house/presidents/".
    |BROWSER_ACTIONS: python
    |goto('https://www.whitehouse.gov/about-the-white-house/presidents)
    |07:50:43 - openhands:DEBUG: client.py:367 - Action output:
    |**BrowserOutputObservation**
    |URL: about:blank
    |Error: True
    |Open pages: ['about:blank']
    |Active page index: 0
    |Last browser action: python
    |goto('https://www.whitehouse.gov/about-the-white-house/presidents)
    |Last browser action error: ValueError: Received an empty action.
    |Focused element bid: 2
    |axTree: {'nodes': [{'nodeId': '4', 'ignored': False, 'role': {'type': 'internalRole', 'value': 'RootWebArea'}, 'chromeRole': {'type': 'internalRole', 'value': 144}, 'name': {'type': 'computedString', 'value': '', 'sources': [{'type': 'relatedElement', 'attribute': 'aria-labelledby'}, {'type': 'attribute', 'attribute': 'aria-label'}, {'type': 'attribute', 'attribute': 'aria-label', 'superseded': True}, {'type': 'relatedElement', 'nativeSource': 'title'}, {'type': 'attribute', 'attribute': 'title', 'superseded': True}]}, 'properties': [{'name': 'focusable', 'value': {'type': 'booleanOrUndefined', 'value': True}}, {'name': 'focused', 'value': {'type': 'booleanOrUndefined', 'value': True}}], 'childIds': ['5'], 'backendDOMNodeId': 2, 'frameId': '78F66B30411C54487A1793BF48FE955B'}, {'nodeId': '5', 'ignored': True, 'ignoredReasons': [{'name': 'uninteresting', 'value': {'type': 'boolean', 'value': True}}], 'role': {'type': 'role', 'value': 'none'}, 'chromeRole': {'type': 'internalRole', 'value': 0}, 'parentId': '4', 'childIds': ['6'], 'backendDOMNodeId': 3}, {'nodeId': '6', 'ignored': False, 'role': {'type': 'role', 'value': 'generic'}, 'chromeRole': {'type': 'internalRole', 'value': 88}, 'name': {'type': 'computedString', 'value': '', 'sources': [{'type': 'relatedElement', 'attribute': 'aria-labelledby'}, {'type': 'attribute', 'attribute': 'aria-label'}, {'type': 'attribute', 'attribute': 'title'}]}, 'properties': [], 'parentId': '5', 'childIds': [], 'backendDOMNodeId': 5, 'browsergym_id': '2'}]}
    |CONTENT:
    |
    |
    |INFO:     172.17.0.1:36776 - "POST /execute_action HTTP/1.1" 200 OK
--------------------------------------------------------------------------------
07:50:43 - openhands:DEBUG: stream.py:135 - Adding BrowserOutputObservation id=28 from AGENT
07:50:43 - openhands:INFO: agent_controller.py:229
OBSERVATION
**BrowserOutputObservation**
URL: about:blank
Error: True
Open pages: ['about:blank']
Active page index: 0
Last browser action: python
goto('https://www.whitehouse.gov/about-the-white-house/presidents)
Last browser action error: ValueError: Received an empty action.
Focused element bid: 2
axTree: {'nodes': [{'nodeId': '4', 'ignored': False, 'role': {'type': 'internalRole', 'value': 'RootWebArea'}, 'chromeRole': {'type': 'internalRole', 'value': 144}, 'name': {'type': 'computedString', 'value': '', 'sources': [{'type': 'relatedElement', 'attribute': 'aria-labelledby'}, {'type': 'attribute', 'attribute': 'aria-label'}, {'type': 'attribute', 'attribute': 'aria-label', 'superseded': True}, {'type': 'relatedElement', 'nativeSource': 'title'}, {'type': 'attribute', 'attribute': 'title', 'superseded': True}]}, 'properties': [{'name': 'focusable', 'value': {'type': 'booleanOrUndefined', 'value': True}}, {'name': 'focused', 'value': {'type': 'booleanOrUndefined', 'value': True}}], 'childIds': ['5'], 'backendDOMNodeId': 2, 'frameId': '78F66B30411C54487A1793BF48FE955B'}, {'nodeId': '5', 'ignored': True, 'ignoredReasons': [{'name': 'uninteresting', 'value': {'type': 'boolean', 'value': True}}], 'role': {'type': 'role', 'value': 'none'}, 'chromeRole': {'type': 'internalRole', 'value': 0}, 'parentId': '4', 'childIds': ['6'], 'backendDOMNodeId': 3}, {'nodeId': '6', 'ignored': False, 'role': {'type': 'role', 'value': 'generic'}, 'chromeRole': {'type': 'internalRole', 'value': 88}, 'name': {'type': 'computedString', 'value': '', 'sources': [{'type': 'relatedElement', 'attribute': 'aria-labelledby'}, {'type': 'attribute', 'attribute': 'aria-label'}, {'type': 'attribute', 'attribute': 'title'}]}, 'properties': [], 'parentId': '5', 'childIds': [], 'backendDOMNodeId': 5, 'browsergym_id': '2'}]}
CONTENT:

....
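
One detail stands out in the logs above: the action string that reaches the browser is goto('https://www.whitehouse.gov/about-the-white-house/presidents) with no closing quote, while the requested URL ended in /'). That may be related to the "Received an empty action" error, though the logs alone do not confirm it. The sketch below only checks whether each string is valid Python. It does not reproduce the browser's own action parser or the fix in the linked PR, and is_valid_action is a hypothetical helper name used for illustration.

```python
import ast


def is_valid_action(action: str) -> bool:
    """Return True if the action string parses as Python source.

    Hypothetical helper for illustration only; the real action parser
    used by the browsing environment may behave differently.
    """
    try:
        ast.parse(action)
        return True
    except SyntaxError:
        return False


# Action string exactly as it appears in the logs (closing quote missing).
logged = "goto('https://www.whitehouse.gov/about-the-white-house/presidents)"

# Action string matching what the user originally asked for.
requested = "goto('https://www.whitehouse.gov/about-the-white-house/presidents/')"

print(is_valid_action(logged))     # False: unterminated string literal
print(is_valid_action(requested))  # True
```

If the logged string is what the browser environment actually receives, a strict parser would have nothing usable to extract from it, which would be consistent with the page staying on about:blank.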
enyst commented 2 weeks ago

I think we have just solved this bug here: https://github.com/All-Hands-AI/OpenHands/pull/4226

It's too recent and it's not yet part of a release. If you wish, you can:

Please note that the browsing agent is still experimental, and it's possible there are other issues too. As far as I know, we have some plans to revisit it and improve it.