All-Hands-AI / OpenHands

🙌 OpenHands: Code Less, Make More
https://all-hands.dev
MIT License

[Bug]: WebSocket Crashes with ValueError on `latest_event_id` #5151

Status: Open. liatweissman opened this issue 2 days ago.

liatweissman commented 2 days ago

Is there an existing issue for the same bug?

Describe the bug and reproduction steps

Description:

I'm encountering a critical issue while integrating OpenHands with LiteLLM Proxy to access Amazon Bedrock. The WebSocket crashes repeatedly with a ValueError when OpenHands tries to parse the query parameter latest_event_id. This seems to happen because latest_event_id arrives as the string 'undefined', which cannot be converted to an integer.


Steps to Reproduce:

  1. Configure LiteLLM Proxy with the following config.yaml:

    model_list:
      - model_name: bedrock/us.anthropic.claude-3-5-sonnet-20241022-v2:0
        litellm_params:
          model: "us.anthropic.claude-3-5-sonnet-20241022-v2:0"
          aws_region_name: us-east-1
        model_info:
          id: "arn:aws:bedrock:us-east-1:6xxxxxxxxx8:inference-profile/us.anthropic.claude-3-5-sonnet-20241022-v2:0"
    
    general_settings:
      master_key: sk-xxxxxxxxx
  2. Start the LiteLLM Proxy:

    export AWS_ACCESS_KEY_ID=your-access-key-id
    export AWS_SECRET_ACCESS_KEY=your-secret-access-key
    export AWS_REGION=us-east-1
    litellm --debug --config ./config.yaml
  3. Verify that the LiteLLM Proxy works by making the following API call:

    curl --location 'http://0.0.0.0:4000/chat/completions' \
      --header 'Content-Type: application/json' \
      --header 'Authorization: Bearer sk-xxxxxxxx' \
      --data '{
        "model": "us.anthropic.claude-3-5-sonnet-20241022-v2:0",
        "model_id": "arn:aws:bedrock:us-east-1:6xxxxxxxxxx8:inference-profile/us.anthropic.claude-3-5-sonnet-20241022-v2:0",
        "messages": [{"role": "user", "content": "what llm are you"}]
      }'
    • Expected Result: The request completes successfully, and the response is returned by LiteLLM Proxy.
  4. Start OpenHands using the following Docker command:

    docker --debug run -it --rm --pull=always \
      -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.14-nikolaik \
      -e LOG_ALL_EVENTS=true \
      -v /var/run/docker.sock:/var/run/docker.sock \
      -p 3000:3000 \
      --add-host host.docker.internal:host-gateway \
      --name openhands-app \
      docker.all-hands.dev/all-hands-ai/openhands:0.14
  5. Configure OpenHands with the following in the UI:

    • Custom model: litellm_proxy/us.anthropic.claude-3-5-sonnet-20241022-v2:0
    • Base URL: http://0.0.0.0:4000
    • API Key: sk-xxxxxxxx
  6. Start a conversation in OpenHands.
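For anyone repeating step 3 without curl, the verification request can be sketched with Python's standard library. This is a minimal sketch, assuming the proxy exposes the OpenAI-style `/chat/completions` endpoint used in the curl call; the helper name `build_completion_request` is hypothetical, and the extra `model_id` field from the curl body is left out. It only builds the request, so the URL, headers, and body can be inspected before anything is sent.

```python
import json
import urllib.request


def build_completion_request(
    base_url: str, api_key: str, model: str, prompt: str
) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a chat completion POST.

    Mirrors the core of the curl call in step 3: JSON body with the model
    name and a single user message, plus a Bearer token header.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen(req, timeout=60)` sends it; a working proxy should answer with the same kind of completion the curl call returned.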


Observed Behavior:

OpenHands enters an infinite loop, throwing the following error repeatedly:

ValueError: invalid literal for int() with base 10: 'undefined'
  File "/app/openhands/server/listen.py", line 347, in websocket_endpoint
    latest_event_id = int(websocket.query_params.get('latest_event_id'))
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
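The failing line passes the raw query-string value straight to `int()`. A browser that interpolates a JavaScript `undefined` into the URL sends the literal text `undefined`, which `int()` rejects with exactly this ValueError; if the parameter were missing altogether, `.get()` would return `None` and `int(None)` would raise a TypeError instead. A tolerant server-side parse might look like the following sketch. The helper `parse_latest_event_id` and its fallback of `-1` are hypothetical, not OpenHands code; what the server should do when no valid id arrives is an assumption left to the caller.

```python
from typing import Optional

# The failure from the traceback: int() cannot parse the literal text 'undefined'.
try:
    int("undefined")
except ValueError as exc:
    print(exc)  # invalid literal for int() with base 10: 'undefined'


def parse_latest_event_id(raw: Optional[str], default: int = -1) -> int:
    """Hypothetical helper: parse latest_event_id without raising.

    Returns `default` (an assumed sentinel) when the parameter is absent
    (None) or is not a base-10 integer, such as the string 'undefined'.
    """
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        return default


# Hypothetical use inside the websocket handler:
# latest_event_id = parse_latest_event_id(websocket.query_params.get('latest_event_id'))
```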

Environment Details:

OpenHands Installation

Docker command in README

OpenHands Version

0.14

Operating System

MacOS

Logs, Errors, Screenshots, and Additional Context

No response

mamoodi commented 2 days ago

Going to gently ping @enyst as he's quite good with these things...

enyst commented 1 day ago

This looks strange... The error is not about Bedrock or the LLM; it comes from the websocket. For some reason the UI has an event without an id here. Maybe @tofarr has a better idea?

I cannot replicate it, even when I use Bedrock through the LiteLLM proxy. @liatweissman, could you please try adding 'bedrock' to the model name, or otherwise make sure LiteLLM works? I'd expect the model name here to follow the pattern "litellm_proxy/bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0".

Also, could you do a hard refresh of the browser and try again? Normally the session should start even if there's a problem with the LLM.

liatweissman commented 1 day ago

Hi @enyst, thank you for the suggestions. I tried the following based on your feedback:

  1. Updated the model name in OpenHands to include bedrock, as in:
    litellm_proxy/bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0
  2. Confirmed that the LiteLLM Proxy works flawlessly with the curl command I shared earlier.
  3. Performed a hard refresh in the browser and also tested in incognito mode.
  4. Upgraded LiteLLM to the latest version (1.52.12).

Unfortunately, I am still encountering the same ValueError related to latest_event_id. The WebSocket session starts but fails due to the undefined value for latest_event_id.

Could you suggest any additional steps or debugging options I could try? Also, is there any more information you need from my side to help replicate this issue?

Let me know how I can assist further in diagnosing or resolving this.

enyst commented 1 day ago

Thank you. Can you please try to run openhands:main instead of openhands:0.14?

liatweissman commented 1 day ago

I already tried running openhands:main instead of openhands:0.14, but unfortunately, I am still encountering the same error with the latest_event_id being 'undefined'.

tofarr commented 7 hours ago

I saw this when debugging https://github.com/All-Hands-AI/OpenHands/pull/5056. The issue is that some events use an ID that is not an integer:

[Screenshot: events whose IDs are not integers]

If the last event the frontend received before init was one of these events, you get that error. I mitigated the issue with a check in ws-client-provider.tsx:

if (lastEvent && !Number.isNaN(parseInt(lastEvent.id as string, 10))) {
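To show how that guard behaves, here is a rough Python transliteration; it is not the actual ws-client-provider.tsx code. `js_parse_int` approximates JavaScript's `parseInt(value, 10)`, and `build_ws_query` is a hypothetical helper that adds `latest_event_id` to the websocket query only when the last event's id parses as an integer. One detail worth noting: `parseInt` is lenient and reads a string such as `12abc` as 12, while the backend's `int()` would reject it, so this sketch sends the parsed integer rather than the raw id (a design choice, not something the thread confirms).

```python
import re
from typing import Any, Dict, Optional
from urllib.parse import urlencode

# Rough analogue of JavaScript's parseInt(String(value), 10): skip leading
# whitespace, accept an optional sign and ASCII digits, ignore the rest.
_LEADING_INT = re.compile(r"\s*([+-]?[0-9]+)")


def js_parse_int(value: Any) -> Optional[int]:
    """Return what parseInt would yield, or None where JavaScript gives NaN."""
    match = _LEADING_INT.match(str(value))
    return int(match.group(1)) if match else None


def build_ws_query(last_event: Optional[Dict[str, Any]]) -> str:
    """Hypothetical: include latest_event_id only for an integer-like event id."""
    params: Dict[str, int] = {}
    if last_event is not None:
        event_id = js_parse_int(last_event.get("id"))
        if event_id is not None:
            # Send the parsed integer so the backend's int() cannot fail on it.
            params["latest_event_id"] = event_id
    return urlencode(params)
```

With this guard, an event whose id is missing or is a message key such as STATUS$STARTING_CONTAINER simply leaves `latest_event_id` out of the query string instead of sending `undefined`.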

TBH, I think these events should be refactored in the long term: STATUS$STARTING_CONTAINER and similar messages should be sent as statuses rather than used as event IDs.
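The long-term refactor suggested here, keeping status messages such as STATUS$STARTING_CONTAINER out of the numbered event stream, might look roughly like this. The classes `StatusUpdate` and `Event` and the function `resume_event_id` are hypothetical names for illustration, not OpenHands' event model. Because status updates carry no id at all, the client can never pick one as its resume point.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class StatusUpdate:
    """Hypothetical: a transient status such as STATUS$STARTING_CONTAINER; no id."""
    message_key: str


@dataclass
class Event:
    """Hypothetical: a persisted event with an integer id a client can resume from."""
    id: int
    payload: dict


Message = Union[StatusUpdate, Event]


def resume_event_id(received: List[Message]) -> Optional[int]:
    """Return the id of the most recent real event, skipping status updates."""
    for msg in reversed(received):
        if isinstance(msg, Event):
            return msg.id
    return None  # nothing to resume from; the caller decides what that means
```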

enyst commented 6 hours ago

Oh, that could explain it. Except that it shouldn't persist if you cleared everything or used incognito mode... @liatweissman, considering the above, could you please try clearing the browser's local storage? Then maybe also start a new project, or at least change the settings and save them, to force a new session.