Open Interface
Full Autopilot for All Computers Using LLMs
Open Interface
- Self-drives computers by sending user requests to an LLM backend (GPT-4V, etc) to figure out the required steps.
- Automatically executes the steps by simulating keyboard and mouse input.
- Course-corrects by sending the LLMs a current screenshot of the computer as needed.
Self-Driving Software for All Your Computers
[![macOS](https://img.shields.io/badge/mac%20os-000000?style=for-the-badge&logo=apple&logoColor=white)](https://github.com/AmberSahdev/Open-Interface?tab=readme-ov-file#install)
[![Linux](https://img.shields.io/badge/Linux-FCC624?style=for-the-badge&logo=linux&logoColor=black)](https://github.com/AmberSahdev/Open-Interface?tab=readme-ov-file#install)
[![Windows](https://img.shields.io/badge/Windows-0078D6?style=for-the-badge&logo=windows&logoColor=white)](https://github.com/AmberSahdev/Open-Interface?tab=readme-ov-file#install)
[![Github All Releases](https://img.shields.io/github/downloads/AmberSahdev/Open-Interface/total.svg)]((https://github.com/AmberSahdev/Open-Interface/releases/latest))
![GitHub code size in bytes](https://img.shields.io/github/languages/code-size/AmberSahdev/Open-Interface)
![GitHub Repo stars](https://img.shields.io/github/stars/AmberSahdev/Open-Interface)
![GitHub](https://img.shields.io/github/license/AmberSahdev/Open-Interface)
[![GitHub Latest Release)](https://img.shields.io/github/v/release/AmberSahdev/Open-Interface)](https://github.com/AmberSahdev/Open-Interface/releases/latest)
Demo 💻
["Make me a meal plan in Google Docs"]
More Demos
Install 💽
MacOS
- Download the MacOS binary from the latest release.
- Unzip the file and move Open Interface to the Applications Folder.
Apple Silicon M-Series Macs
-
Open Interface will ask you for Accessibility access to operate your keyboard and mouse for you, and Screen Recording access to take screenshots to assess its progress.
-
In case it doesn't, manually add these permission via System Settings -> Privacy and Security
Intel Macs
-
Launch the app from the Applications folder.
You might face the standard Mac "Open Interface cannot be opened" error.
In that case, press "Cancel".
Then go to System Preferences -> Security and Privacy -> Open Anyway.
-
Open Interface will also need Accessibility access to operate your keyboard and mouse for you, and Screen Recording access to take screenshots to assess its progress.
- Lastly, checkout the Setup section to connect Open Interface to LLMs (OpenAI GPT-4V)
Linux
- Linux binary has been tested on Ubuntu 20.04 so far.
- Download the Linux zip file from the latest release.
-
Extract the executable and run it from the Terminal via
./Open\ Interface
- Checkout the Setup section to connect Open Interface to LLMs (OpenAI GPT-4V)
Windows
- Windows binary has been tested on Windows 10.
- Download the Windows zip file from the latest release.
- Unzip the folder, move the exe to the desired location, double click to open, and voila.
- Checkout the Setup section to connect Open Interface to LLMs (OpenAI GPT-4V)
Setup 🛠️
Set up the OpenAI API key
- Get your OpenAI API key
- Open Interface needs access to GPT-4V to perform user requests. GPT-4V keys can be downloaded from your [OpenAI account](https://platform.openai.com/).
- [Follow the steps here](https://help.openai.com/en/articles/8264644-what-is-prepaid-billing) to add balance to your OpenAI account. To unlock GPT-4V a minimum payment of $5 is needed.
- [More info](https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4)
- Save the API key in Open Interface settings
- In Open Interface, go to the Settings menu on the top right and enter the key you received from OpenAI into the text field like so:
- After setting the API key for the first time you'll need to restart the app.
Optional: Setup a Custom LLM
- Open Interface supports using other OpenAI API style LLMs (such as Llava) as a backend and can be configured easily in the Advanced Settings window.
- Enter the custom base url and model name in the Advanced Settings window and the API key in the Settings window as needed.
- If your LLM does not support an OpenAI style API, you can use a library like [this](https://github.com/BerriAI/litellm) to convert it to one.
- You will need to restart the app after these changes.
Stuff It’s Bad At (For Now) 😬
- Accurate spatial-reasoning and hence clicking buttons.
- Keeping track of itself in tabular contexts, like Excel and Google Sheets, for similar reasons as stated above.
- Navigating complex GUI-rich applications like Counter-Strike, Spotify, Garage Band, etc due to heavy reliance on cursor actions.
Future 🔮
(with better models trained on video walkthroughs like Youtube tutorials)
- "Create a couple of bass samples for me in Garage Band for my latest project."
- "Read this design document for a new feature, edit the code on Github, and submit it for review."
- "Find my friends' music taste from Spotify and create a party playlist for tonight's event."
- "Take the pictures from my Tahoe trip and make a White Lotus type montage in iMovie."
Notes 📝
- Cost: $0.05 - $0.20 per user request.
(This will be much lower in the near future once GPT-4V enables assistant/stateful mode)
- You can interrupt the app anytime by pressing the Stop button, or by dragging your cursor to any of the screen corners.
- Open Interface can only see your primary display when using multiple monitors. Therefore, if the cursor/focus is on a secondary screen, it might keep retrying the same actions as it is unable to see its progress (especially in MacOS with launching Spotlight).
System Diagram 🖼️
+----------------------------------------------------+
| App |
| |
| +-------+ |
| | GUI | |
| +-------+ |
| ^ |
| | |
| v |
| +-----------+ (Screenshot + Goal) +-----------+ |
| | | --------------------> | | |
| | Core | | LLM | |
| | | <-------------------- | (GPT-4V) | |
| +-----------+ (Instructions) +-----------+ |
| | |
| v |
| +-------------+ |
| | Interpreter | |
| +-------------+ |
| | |
| v |
| +-------------+ |
| | Executer | |
| +-------------+ |
+----------------------------------------------------+
Star History ⭐️
Links 🔗