OpenAdaptAI / OpenAdapt

Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation with Large Language (LLMs), Large Action (LAMs), Large Multimodal (LMMs), and Visual Language (VLMs) Models
https://www.OpenAdapt.AI
MIT License

please add install instructions for linux on the `https://openadapt.ai/#start` page #631

Open · hemangjoshi37a opened 6 months ago

hemangjoshi37a commented 6 months ago

Feature request

Please add install instructions for Linux on the https://openadapt.ai/#start page.

Motivation

Please add install instructions for Linux on the https://openadapt.ai/#start page.

abrichr commented 6 months ago

Hi @hemangjoshi37a , thank you for your interest!

Unfortunately, OpenAdapt does not currently support Linux, for two reasons:

  1. This is non-trivial additional effort for minimal gain. According to https://gs.statcounter.com/os-market-share/desktop/worldwide, Linux currently occupies about 4% of global desktop OS market share.

  2. The input control library we use (pynput) does not support differentiating between "injected" (synthetic) and regular (human) input on Linux (see https://github.com/moses-palmer/pynput/issues/105#issuecomment-412435532). While we do not yet make use of this functionality, we plan to in the near future; the sketch below illustrates the limitation.
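To make this concrete, here is a minimal sketch using pynput's public API (an illustration only, not OpenAdapt's actual recording code): on Linux, a synthetic keystroke injected by pynput's own Controller arrives at a Listener with no flag to distinguish it from a human keystroke.

```python
# Minimal illustration of the pynput limitation on Linux: a Listener
# receives injected (synthetic) events with no attribute that tells
# them apart from real keyboard input. A sketch, not OpenAdapt code.
import time
from pynput import keyboard

def on_press(key):
    # On Linux there is no "injected" flag on the event, so this
    # callback cannot tell whether `key` came from a human or a script.
    print(f"got key: {key}")

listener = keyboard.Listener(on_press=on_press)
listener.start()

# Inject a synthetic keystroke; it shows up in on_press like any other.
controller = keyboard.Controller()
time.sleep(0.5)
controller.press("a")
controller.release("a")

time.sleep(0.5)
listener.stop()
```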

That said, we would welcome a Pull Request to add installation instructions for Linux! The relevant repo is at https://github.com/OpenAdaptAI/OpenAdapt.web. Of course, this would require testing the core library on Linux as well.

hemangjoshi37a commented 6 months ago

But you should consider that 96% of developers use Linux, and it is developers who are ultimately going to use OpenAdapt, not your average everyday Karens. LOL. Also, I believe your response is AI-generated and that no person is responsible for it.

metatrot commented 4 months ago

The strength of this project's approach seems to be that it uses SAM and multimodal models to visually parse GUI layouts, instead of relying on OS-specific features like Windows' accessibility API. Every month I check back to see if I can use it on my OS yet. The feature I'm really excited about is having a way to parse a whole-screen screenshot into something that can be described in detail by an LLM. The automation/interactive parts of the project aren't necessary for me; I just want a super powerful OCR-like tool that works on whole-screen screenshots and gives structured output: text, window, button, and other input field locations.
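For concreteness, a rough sketch of the kind of pipeline I mean might look like the following (this assumes the `mss` and `segment-anything` packages and a locally downloaded ViT-B checkpoint; it is not OpenAdapt's actual pipeline):

```python
# A rough sketch of SAM-based whole-screen parsing, assuming
# `pip install mss segment-anything` and the official ViT-B
# checkpoint downloaded locally. Not OpenAdapt's actual pipeline.
import numpy as np
import mss
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Grab the primary monitor; mss works on Linux (X11), macOS, and Windows.
with mss.mss() as sct:
    shot = sct.grab(sct.monitors[1])
    image = np.ascontiguousarray(np.array(shot)[:, :, :3][:, :, ::-1])  # BGRA -> RGB

# Segment every visually distinct region of the screenshot.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
mask_generator = SamAutomaticMaskGenerator(sam)
masks = mask_generator.generate(image)

# Each mask's bounding box is a candidate UI element (window, button,
# input field); the crops could then be OCR'd or captioned by an LMM.
for m in sorted(masks, key=lambda m: m["area"], reverse=True)[:20]:
    x, y, w, h = m["bbox"]  # XYWH, per the segment-anything docs
    print(f"region at ({x}, {y}), {w}x{h}px, area={m['area']}")
```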

hemangjoshi37a commented 3 months ago

@metatrot I have tried building something very similar, but it is at a very early stage: https://github.com/microsoft/graphrag

abrichr commented 3 months ago

@metatrot thank you for the information!

> I just want a super powerful OCR-like tool that works on whole-screen screenshots to give structured output like text, window, buttons and other input field locations.

Can you please clarify what this would be useful for?

abrichr commented 3 months ago

@hemangjoshi37a I believe you pasted the wrong link

metatrot commented 3 months ago

@abrichr I would use it for the same purposes as this: https://github.com/louis030195/screen-pipe (sadly, that project is still Mac-only at the moment)