OpenAdaptAI / OpenAdapt

Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models
https://www.OpenAdapt.AI
MIT License
985 stars, 136 forks

Implement with Anthropic Computer Use #893

Open abrichr opened 2 weeks ago

abrichr commented 2 weeks ago

Feature request

(e.g. https://github.com/nicholasoxford/computer-use-mac-demo or https://github.com/ashbuilds/computer-use; i.e. based on https://docs.anthropic.com/en/docs/build-with-claude/computer-use)

We will be using ell, which requires building on https://github.com/OpenAdaptAI/OpenAdapt/pull/888.

In https://github.com/OpenAdaptAI/OpenAdapt/issues/882 we are implementing a new strategy that uses Anthropic's model on the backend to predict coordinates directly. However, I think we also want to extend their reference implementation (https://github.com/anthropics/anthropic-quickstarts/tree/main/computer-use-demo/computer_use_demo) to embed actions recorded by OpenAdapt, e.g. via a tool.
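One possible reading of "embed actions recorded by OpenAdapt, e.g. via a tool" is to expose recordings to the model as a custom tool alongside Anthropic's `computer` tool. The sketch below only builds request and response payloads and makes no API call. The tool name `get_recorded_actions`, the `RecordedAction` class, and the in-memory `recordings` dict are hypothetical and are not OpenAdapt's actual data model. The `computer_20241022` tool type and the `tool_result` block shape follow Anthropic's computer use beta as documented at the time of writing (requests also need the `computer-use-2024-10-22` beta flag); check the current docs before relying on them.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class RecordedAction:
    """Hypothetical, simplified stand-in for one action recorded by OpenAdapt."""
    name: str                   # e.g. "click" or "type"
    x: Optional[int] = None     # screen coordinates for mouse actions
    y: Optional[int] = None
    text: Optional[str] = None  # typed text for keyboard actions


def build_tools(width: int, height: int) -> List[Dict[str, Any]]:
    """Return a `tools` list for a Messages API request (computer use beta)."""
    computer_tool = {
        "type": "computer_20241022",  # Anthropic-defined computer use tool
        "name": "computer",
        "display_width_px": width,
        "display_height_px": height,
    }
    # Hypothetical custom tool: lets the model look up a recorded demonstration.
    recording_tool = {
        "name": "get_recorded_actions",
        "description": "Return the actions a user performed in an OpenAdapt recording.",
        "input_schema": {
            "type": "object",
            "properties": {"recording_id": {"type": "integer"}},
            "required": ["recording_id"],
        },
    }
    return [computer_tool, recording_tool]


def handle_recording_tool(
    tool_use_id: str,
    tool_input: Dict[str, Any],
    recordings: Dict[int, List[RecordedAction]],
) -> Dict[str, Any]:
    """Answer a `get_recorded_actions` tool_use block with a tool_result block."""
    # Unknown recording IDs yield an empty action list rather than an error.
    actions = recordings.get(tool_input["recording_id"], [])
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": json.dumps([asdict(a) for a in actions]),
    }


# Usage with a toy, hand-written recording (not real OpenAdapt data).
recordings = {1: [RecordedAction("click", x=100, y=200), RecordedAction("type", text="hello")]}
tools = build_tools(1280, 800)
result = handle_recording_tool("toolu_01", {"recording_id": 1}, recordings)
```

Serializing the recording as JSON inside `tool_result` keeps the tool independent of how OpenAdapt stores actions; a real integration might instead summarize or filter the recording before returning it, to keep the context small.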

I'm not sure exactly how this should work. I think the first step is to understand their code well enough to suggest an approach.

Motivation

New paradigm

abrichr commented 1 week ago

https://huggingface.co/spaces/orby-osu/UGround