Is your feature request related to a problem? Please describe.
Currently, Open-Interpreter does not understand graphical UIs well, so letting it operate the computer's UI on our behalf is still unreliable.
Describe the solution you'd like
Please add support for the Claude Computer Use API:
https://docs.anthropic.com/en/docs/build-with-claude/computer-use
Essentially, you send the API a screenshot of the current desktop in real time, and Claude answers with the next operations to perform (move the mouse to given coordinates, press a button, click a link, type a URL, drag, etc.) so that Open-Interpreter can carry out those actions autonomously.
(source: https://x.com/alexalbert__/status/1848743043429810361 )
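To make the request more concrete, here is a rough sketch of the screenshot-then-act loop described above. It is only an illustration under assumptions: `Action`, `run_agent_loop`, and the three callables (`take_screenshot`, `ask_model`, `execute`) are hypothetical names I made up, not part of Open-Interpreter or of Anthropic's SDK, and the real tool schema in the linked docs may look different.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Action:
    """One step suggested by the model (hypothetical shape, not the real API schema)."""
    kind: str                                   # e.g. "mouse_move", "click", "type", "drag"
    params: dict = field(default_factory=dict)  # e.g. {"x": 10, "y": 20} or {"text": "..."}


def run_agent_loop(
    take_screenshot: Callable[[], bytes],
    ask_model: Callable[[bytes], Optional[Action]],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Repeat screenshot -> model -> action until the model returns None
    (assumed here to mean "task finished") or max_steps is reached."""
    performed: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()   # capture the current desktop
        action = ask_model(screenshot)   # the model proposes the next operation
        if action is None:               # no further action: stop the loop
            break
        execute(action)                  # perform it (mouse, keyboard, ...)
        performed.append(action)
    return performed
```

In a real integration, `ask_model` would presumably wrap the call to the Computer Use API and `execute` would drive the mouse and keyboard; the `max_steps` cap is just a safety limit so the loop cannot run forever.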
Describe alternatives you've considered
No response
Additional context
Some video demos of the Claude Computer Use API are here: https://x.com/rowancheung/status/1848743700702130474