All-Hands-AI / OpenHands

🙌 OpenHands: Code Less, Make More
https://all-hands.dev
MIT License

[Bug]: Model does not support image upload when using `litellm_proxy/` #4809

Open xingyaoww opened 3 weeks ago

xingyaoww commented 3 weeks ago

Is there an existing issue for the same bug?

Describe the bug and reproduction steps

I got the following error when using the model `litellm_proxy/claude-3-5-sonnet-20241022` through a LiteLLM proxy. It is supposed to support vision inputs.

image

In L345-L348 of `openhands/llm/llm.py`, maybe we should also check `litellm.supports_vision` for `model_name.split('/')[-1]`.
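A minimal sketch of that idea, not the actual OpenHands code: try the full model string first, then fall back to the part after the last `/`. The helper names `candidate_names`, `vision_supported`, and `fake_supports_vision` are hypothetical, and the lookup is passed in as a parameter so the example runs without LiteLLM installed. In the real code path one would presumably pass `litellm.supports_vision` instead of the stub.

```python
from typing import Callable, List


def candidate_names(model: str) -> List[str]:
    """Return the full model string, then the bare name after the last '/'."""
    bare = model.split("/")[-1]
    return [model] if bare == model else [model, bare]


def vision_supported(model: str, supports_vision: Callable[[str], bool]) -> bool:
    """Return True if any candidate name is reported as vision-capable."""
    for name in candidate_names(model):
        try:
            if supports_vision(name):
                return True
        except Exception:
            # Treat a failed lookup as "unknown" and try the next candidate.
            continue
    return False


# Hypothetical stand-in for the real capability lookup (assumption:
# the lookup only knows the bare model name, not the proxy-prefixed one).
_KNOWN_VISION_MODELS = {"claude-3-5-sonnet-20241022"}


def fake_supports_vision(name: str) -> bool:
    return name in _KNOWN_VISION_MODELS


print(vision_supported("litellm_proxy/claude-3-5-sonnet-20241022", fake_supports_vision))
```

The fallback only strips the prefix for the capability check; the request itself would still be sent with the full `litellm_proxy/...` model string.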

OpenHands Installation

Docker command in README

OpenHands Version

No response

Operating System

None

Logs, Errors, Screenshots, and Additional Context

No response

github-actions[bot] commented 3 weeks ago

OpenHands started fixing the issue! You can monitor the progress here.

github-actions[bot] commented 3 weeks ago

An attempt was made to automatically fix this issue, but it was unsuccessful. A branch named 'openhands-fix-issue-4809' has been created with the attempted changes. You can view the branch here. Manual intervention may be required.

enyst commented 3 weeks ago

I think a problem we might have here, just as we have for prompt caching, is that some providers of well-known models (including Claude) don't support one or both of these features. I seem to recall that Sonnet is on Vertex (...I think? I didn't try it there), and it doesn't support prompt caching there.

In theory, the same could be true for vision. To clarify, I don't know of such a case for vision, but in the future we should probably consider another solution here. Shouldn't litellm take the provider into account when it returns `supports_thing`?
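To illustrate what "taking the provider into account" could look like, here is a rough sketch in which capabilities are keyed by the (provider, model) pair rather than by model name alone. Everything in it is hypothetical: the `Capabilities` type, the `supports` helper, the placeholder provider and model names, and the table values, none of which reflect real provider behavior or LiteLLM's internals.

```python
from typing import Dict, NamedTuple, Tuple


class Capabilities(NamedTuple):
    vision: bool
    prompt_caching: bool


# Hypothetical table: the same model served by two providers with
# different feature support. Names and values are placeholders.
CAPABILITIES: Dict[Tuple[str, str], Capabilities] = {
    ("provider_a", "example-model"): Capabilities(vision=True, prompt_caching=True),
    ("provider_b", "example-model"): Capabilities(vision=True, prompt_caching=False),
}


def supports(provider: str, model: str, feature: str) -> bool:
    """Look up a feature for a specific provider/model pair.

    Unknown pairs and unknown feature names are reported as unsupported.
    """
    caps = CAPABILITIES.get((provider, model))
    if caps is None:
        return False
    return bool(getattr(caps, feature, False))


print(supports("provider_a", "example-model", "prompt_caching"))
print(supports("provider_b", "example-model", "prompt_caching"))
```

With a model-only key, both lookups above would have to return the same answer, which is the mismatch described for prompt caching.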

xingyaoww commented 3 weeks ago

@enyst yeah, litellm is supposed to handle it.... until you have a lot of providers that make things tricky 😢 I don't really blame them lol