OpenAdaptAI / OpenAdapt

AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models
https://www.OpenAdapt.AI
MIT License

Handle similar segments #819

Open abrichr opened 1 month ago

abrichr commented 1 month ago

Feature request

We would like VisualReplayStrategy to handle segments that are visually identical but appear at different locations on screen.

Related: https://github.com/OpenAdaptAI/OpenAdapt/pull/679
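
To make the request concrete, here is a minimal sketch of one possible way to tell visually identical segments apart: filter candidates by appearance, then break ties by distance to the recorded location. This is not OpenAdapt's API. `Segment`, its `signature` field (a stand-in for an image hash or embedding key), and `match_segment` are all hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from math import hypot
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Segment:
    """Hypothetical segment record: a bounding box plus a visual signature."""

    left: float
    top: float
    width: float
    height: float
    signature: str  # assumption: stand-in for an image hash or embedding key

    @property
    def center(self) -> Tuple[float, float]:
        return (self.left + self.width / 2, self.top + self.height / 2)


def match_segment(
    recorded: Segment, candidates: Sequence[Segment]
) -> Optional[Segment]:
    """Return the candidate that looks like `recorded` and lies closest to it.

    Returns None when no candidate shares the recorded signature.
    """
    lookalikes = [c for c in candidates if c.signature == recorded.signature]
    if not lookalikes:
        return None
    rx, ry = recorded.center
    return min(
        lookalikes,
        key=lambda c: hypot(c.center[0] - rx, c.center[1] - ry),
    )


# Three empty spreadsheet cells that look the same but sit in different places.
cells = [
    Segment(0, 0, 80, 20, "empty"),
    Segment(80, 0, 80, 20, "empty"),
    Segment(160, 40, 80, 20, "empty"),
]
# The segment as seen during recording, shifted by a couple of pixels.
recorded = Segment(158, 41, 80, 20, "empty")
best = match_segment(recorded, cells)
```

Nearest-center matching only works when the layout at replay time is close to the layout at recording time. If the window moves or scrolls, a position relative to a stable anchor (for example, row and column headers in a spreadsheet) would likely be more robust.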

Motivation

Spreadsheet task

abrichr commented 1 month ago

To begin with, instead of the full time-tracking task, let's start with much simpler spreadsheet tasks, e.g. "enter the value 'foo' into cell C3".

abrichr commented 1 month ago

https://huggingface.co/papers/2407.09025

SpreadsheetLLM: Encoding Spreadsheets for Large Language Models