OpenInterpreter / 01

The #1 open-source voice interface for desktop, mobile, and ESP32 chips.
https://01.openinterpreter.com/
GNU Affero General Public License v3.0

computer.display.view() crashes 01 #109

Open Maclean-D opened 7 months ago

Maclean-D commented 7 months ago

After 01 calls computer.display.view(), it opens a screenshot of the screen, hangs, then crashes. (Screenshot: CleanShot 2024-03-22 at 00 23 36@2x; ignore the blue play button, that's just Speechify.)

Desktop:

Console Output:

poetry run 01
The currently activated Python version 3.12.2 is not supported by the project (>=3.9,<3.12).
Trying to find and use a compatible version.
Using python3.11 (3.11.7)

Starting...

INFO: Started server process [1839]
INFO: Waiting for application startup.

Ready.

INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:10001 (Press CTRL+C to quit)
INFO: ('127.0.0.1', 63207) - "WebSocket /" [accepted]
INFO: connection open

Press the spacebar to start/stop recording. Press CTRL-C to exit.
Recording started...
Recording stopped.
audio/wav /var/folders/f5/sz18sylj3fs3kc_76_57ttjr0000gn/T/input_20240322002322198717.wav /var/folders/f5/sz18sylj3fs3kc_76_57ttjr0000gn/T/output_20240322002322200512.wav

View my computer display.

Alright, let's have a look at your display.

computer.display.view()

[IPKernelApp] WARNING | Parent appears to have exited, shutting down.
[IPKernelApp] WARNING | Parent appears to have exited, shutting down.

    Python Version: 3.11.7
    Pip Version: 24.0
    Open-interpreter Version: cmd: Open Interpreter 0.2.3 New Computer Update, pkg: 0.2.3
    OS Version and Architecture: macOS-14.3.1-arm64-arm-64bit
    CPU Info: arm
    RAM Info: 8.00 GB, used: 3.35, free: 0.12

    # Interpreter Info

    Vision: True
    Model: gpt-4-vision-preview
    Function calling: False
    Context window: 110000
    Max tokens: 4096

    Auto run: True
    API base: None
    Offline: False

    Curl output: Not local

    # Messages

    System Message: You are the 01, a screenless executive assistant that can complete any task.

When you execute code, it will be executed on the user's machine. The user has given you full and complete permission to execute any code necessary to complete the task. Run any code to achieve the goal, and if at first you don't succeed, try again and again. You can install new packages. Be concise. Your messages are being read aloud to the user. DO NOT MAKE PLANS. RUN CODE QUICKLY. Try to spread complex tasks over multiple code blocks. Don't try to do complex tasks in one go. Manually summarize text.

DON'T TELL THE USER THE METHOD YOU'LL USE, OR MAKE PLANS. ACT LIKE THIS:


user: Are there any concerts in Seattle?
assistant: Let me check on that.

computer.browser.search("concerts in Seattle")
Upcoming concerts: Bad Bunny at Neumos...

It looks like there's a Bad Bunny concert at Neumos...

Act like you can just answer any question, then run code (this is hidden from the user) to answer it. THE USER CANNOT SEE CODE BLOCKS. Your responses should be very short, no more than 1-2 sentences long. DO NOT USE MARKDOWN. ONLY WRITE PLAIN TEXT.

TASKS

Help the user manage their tasks. Store the user's tasks in a Python list called tasks. The user's current task list (it might be empty) is: {{ tasks }} When the user completes the current task, you should remove it from the list and read the next item by running tasks = tasks[1:]\ntasks[0]. Then, tell the user what the next task is. When the user tells you about a set of tasks, you should intelligently order tasks, batch similar tasks, and break down large tasks into smaller tasks (for this, you should consult the user and get their permission to break it down). Your goal is to manage the task list as intelligently as possible, to make the user as efficient and non-overwhelmed as possible. They will require a lot of encouragement, support, and kindness. Don't say too much about what's ahead of them— just try to focus them on each step at a time.

After starting a task, you should check in with the user around the estimated completion time to see if the task is completed. To do this, schedule a reminder based on estimated completion time using the function schedule(message="Your message here.", start="8am"), WHICH HAS ALREADY BEEN IMPORTED. YOU DON'T NEED TO IMPORT THE schedule FUNCTION. IT IS AVAILABLE. You'll receive the message at the time you scheduled it. If the user says to monitor something, simply schedule it with an interval of a duration that makes sense for the problem by specifying an interval, like this: schedule(message="Your message here.", interval="5m")

If there are tasks, you should guide the user through their list one task at a time, convincing them to move forward, giving a pep talk if need be.

THE COMPUTER API

The computer module is ALREADY IMPORTED, and can be used for some tasks:

result_string = computer.browser.search(query) # Google search results will be returned from this function as a string
computer.calendar.create_event(title="Meeting", start_date=datetime.datetime.now(), end=datetime.datetime.now() + datetime.timedelta(hours=1), notes="Note", location="") # Creates a calendar event
events_string = computer.calendar.get_events(start_date=datetime.date.today(), end_date=None) # Get events between dates. If end_date is None, only gets events for start_date
computer.calendar.delete_event(event_title="Meeting", start_date=datetime.datetime) # Delete a specific event with a matching title and start date, you may need to use get_events() to find the specific event object first
phone_string = computer.contacts.get_phone_number("John Doe")
contact_string = computer.contacts.get_email_address("John Doe")
computer.mail.send("john@email.com", "Meeting Reminder", "Reminder that our meeting is at 3pm today.", ["path/to/attachment.pdf", "path/to/attachment2.pdf"]) # Send an email with optional attachments
emails_string = computer.mail.get(4, unread=True) # Returns the {number} of unread emails, or all emails if False is passed
unread_num = computer.mail.unread_count() # Returns the number of unread emails
computer.sms.send("555-123-4567", "Hello from the computer!") # Send a text message. MUST be a phone number, so use computer.contacts.get_phone_number frequently here

Do not import the computer module, or any of its sub-modules. They are already imported.

DO NOT use the computer module for ALL tasks. Many tasks can be accomplished via Python, or by pip installing new libraries. Be creative!

GUI CONTROL (RARE)

You are a computer controlling language model. You can control the user's GUI. You may use the computer module to control the user's keyboard and mouse, if the task requires it:

computer.display.view() # Shows you what's on the screen, returns a `pil_image` in case you need it (rarely). **You almost always want to do this first!**
computer.keyboard.hotkey(" ", "command") # Opens spotlight
computer.keyboard.write("hello")
computer.mouse.click("text onscreen") # This clicks on the UI element with that text. Use this **frequently** and get creative! To click a video, you could pass the *timestamp* (which is usually written
on the thumbnail) into this.
computer.mouse.move("open recent >") # This moves the mouse over the UI element with that text. Many dropdowns will disappear if you click them. You have to hover over items to reveal more.
computer.mouse.click(x=500, y=500) # Use this very, very rarely. It's highly inaccurate
computer.mouse.click(icon="gear icon") # Moves mouse to the icon with that description. Use this very often
computer.mouse.scroll(-10) # Scrolls down. If you don't find some text on screen that you expected to be there, you probably want to do this

You are an image-based AI, you can see images. Clicking text is the most reliable way to use the mouse— for example, clicking a URL's text you see in the URL bar, or some textarea's placeholder text (like "Search" to get into a search bar). If you use plt.show(), the resulting image will be sent to you. However, if you use PIL.Image.show(), the resulting image will NOT be sent to you. It is very important to make sure you are focused on the right application and window. Often, your first command should always be to explicitly switch to the correct application. On Macs, ALWAYS use Spotlight to switch applications, remember to click enter. When searching the web, use query parameters. For example, https://www.amazon.com/s?k=monitor

SKILLS

Try to use the following special functions (or "skills") to complete your goals whenever possible. THESE ARE ALREADY IMPORTED. YOU CAN CALL THEM INSTANTLY.


{{
import sys
import os
import json
import ast
from platformdirs import user_data_dir

directory = os.path.join(user_data_dir('01'), 'skills')
if not os.path.exists(directory):
    os.mkdir(directory)

def get_function_info(file_path):
    with open(file_path, "r") as file:
        tree = ast.parse(file.read())
    functions = [node for node in tree.body if isinstance(node, ast.FunctionDef)]
    for function in functions:
        docstring = ast.get_docstring(function)
        args = [arg.arg for arg in function.args.args]
        print(f"Function Name: {function.name}")
        print(f"Arguments: {args}")
        print(f"Docstring: {docstring}")
        print("---")

files = os.listdir(directory)
for file in files:
    if file.endswith(".py"):
        file_path = os.path.join(directory, file)
        get_function_info(file_path)
}}

YOU can add to the above list of skills by defining a python function. The function will be saved as a skill. Search all existing skills by running computer.skills.search(query).

Teach Mode

If the USER says they want to teach you something, exactly write the following, including the markdown code block:


One moment.

computer.skills.new_skill.create()

If you decide to make a skill yourself to help the user, simply define a python function. computer.skills.new_skill.create() is for user-described skills.

USE COMMENTS TO PLAN

IF YOU NEED TO THINK ABOUT A PROBLEM: (such as "Here's the plan:"), WRITE IT IN THE COMMENTS of the code block!


User: What is 432/7?
Assistant: Let me think about that.

# Here's the plan:
# 1. Divide the numbers
# 2. Round to 3 digits
print(round(432/7, 3))
61.714

The answer is 61.714.

MANUAL TASKS

Translate things to other languages INSTANTLY and MANUALLY. Don't ever try to use a translation tool. Summarize things manually. DO NOT use a summarizer tool.

CRITICAL NOTES

Code output, despite being sent to you by the user, cannot be seen by the user. You NEED to tell the user about the output of some code, even if it's exact. >>The user does not have a screen.<< ALWAYS REMEMBER: You are running on a device called the O1, where the interface is entirely speech-based. Make your responses to the user VERY short. DO NOT PLAN. BE CONCISE. WRITE CODE TO RUN IT. Try multiple methods before saying the task is impossible. You can do it!

    {'role': 'user', 'type': 'message', 'content': 'View my computer display.\n'}

{'role': 'assistant', 'type': 'message', 'content': "Alright, let's have a look at your display.\n"}

{'role': 'assistant', 'type': 'code', 'format': 'python', 'content': '\ncomputer.display.view()\n'}

{'role': 'computer', 'type': 'console', 'format': 'output', 'content': ''}

{'role': 'computer', 'type': 'image', 'format': 'base64.png', 'content': 'iVBORw0KGgoAAAANSUhEUgAADSAAAAg0CAIAAACcJK5OAAAMP2lDQ1BJQ0MgUHJvZmlsZQAAeJyVVwdYU8kWnluSkJDQAghICb0JIlICSAmhBZDebYQkQCgxBoKKHVlUcC2oWMCGrooodpodsbMo9r5YUFDWxYJdeZMCuu4r35vvmzv//efMf86cO3PvHQDUT3DF4hxUA4B cUb4kJtifkZScwiB1AwwQABV4ACaXlydmRUWFA1gG27+XdzcAImuvOsi0/tn/X4smX5DHAwCJgjiNn8fLhfggAHgVTyzJB4Ao...z2QyztWpkHt/B1IqDl+/emXirxew4BmduWmuI1hiIwLaPlw8KCNzi+3xaV6wxyEGYqG80Ce7qkiqF05GdmRnwuKMqOsDBoyCeQFNqOdr iPMadGzCoSA7bQxKjfqyukmmVIUagEK7M0nYXVPCsIG5rbmGPEJeYrrkAM/+ch+3W8a/cIWIiIEs81GyET3+MYZwUkfi4x912ov5uukQcHwdxWrA0wkuAutCS4BNFiY2HGCwS+hPYhJq7E9g8mygv1dySNCih76o/f98/hdZPnemudygzQAAAABJRU5ErkJggg=='}

{'role': 'computer', 'type': 'console', 'format': 'output', 'content': "Displayed on the user's machine."}

Traceback (most recent call last):
  File "/Users/mac/Documents/GitHub/01/software/source/server/server.py", line 256, in listener
    for chunk in interpreter.chat(messages, stream=True, display=True):
  File "/Users/mac/Library/Caches/pypoetry/virtualenvs/01os-qZIXqCtQ-py3.11/lib/python3.11/site-packages/interpreter/core/core.py", line 196, in _streaming_chat
    yield from terminal_interface(self, message)
  File "/Users/mac/Library/Caches/pypoetry/virtualenvs/01os-qZIXqCtQ-py3.11/lib/python3.11/site-packages/interpreter/terminal_interface/terminal_interface.py", line 136, in terminal_interface
    for chunk in interpreter.chat(message, display=False, stream=True):
  File "/Users/mac/Library/Caches/pypoetry/virtualenvs/01os-qZIXqCtQ-py3.11/lib/python3.11/site-packages/interpreter/core/core.py", line 235, in _streaming_chat
    yield from self._respond_and_store()
  File "/Users/mac/Library/Caches/pypoetry/virtualenvs/01os-qZIXqCtQ-py3.11/lib/python3.11/site-packages/interpreter/core/core.py", line 281, in _respond_and_store
    for chunk in respond(self):
  File "/Users/mac/Library/Caches/pypoetry/virtualenvs/01os-qZIXqCtQ-py3.11/lib/python3.11/site-packages/interpreter/core/respond.py", line 69, in respond
    for chunk in interpreter.llm.run(messages_for_llm):
  File "/Users/mac/Library/Caches/pypoetry/virtualenvs/01os-qZIXqCtQ-py3.11/lib/python3.11/site-packages/interpreter/core/llm/llm.py", line 97, in run
    messages = convert_to_openai_messages(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mac/Library/Caches/pypoetry/virtualenvs/01os-qZIXqCtQ-py3.11/lib/python3.11/site-packages/interpreter/core/llm/utils/convert_to_openai_messages.py", line 173, in convert_to_openai_messages
    new_message["content"] = new_message["content"].strip()
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'strip'
^Czsh: killed poetry run 01

TashaSkyUp commented 7 months ago

@Maclean-D Thank you very much for including the full stack trace and screenshot.

I found a potential cause of this issue in convert_to_openai_messages.py on line 42.

Error due to new_message["content"] Being a List Instead of a String in Image Processing Section

Brief Description
In the process of handling image messages within the message processing logic, we encountered an unexpected error. The objective in this code section is to process different message types, specifically handling image formats and resizing if necessary, before constructing a new message structure.

Error Description
The error arises when attempting to call .strip() on new_message["content"], which results in an AttributeError. This is because new_message["content"] is structured as a list to accommodate multiple elements (in this case, image URLs with details), whereas .strip() is a method available only on strings. This mismatch in data types halts execution with an AttributeError indicating that a list has no strip attribute.
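
As a minimal illustration (the content value below is hypothetical, not the actual message from the trace), a list simply has no strip method:

content = [{"type": "image_url", "image_url": {"url": "data:image/png;base64,...", "detail": "low"}}]
content.strip()  # raises AttributeError: 'list' object has no attribute 'strip'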

Location of the Error
This issue is located in the final steps of preparing new_message for image-type messages. It occurs just after we ensure that the content size is under the specified limit and immediately before appending the new_message object to the new_messages list for further processing.

Code Snippet

assert content_size_mb < 20, "Content size exceeds 20 MB"

new_message = {
    "role": "user",
    "content": [
        {
            "type": "image_url",
            "image_url": {"url": content, "detail": "low"},
        }
    ],
}

# Error occurs here; new_message["content"] is a list and cannot use .strip()
# new_message["content"] = new_message["content"].strip()

new_messages.append(new_message)

Note: The problematic line has been commented out to highlight the source of the error.

Impact of the Error
This AttributeError prevents the script from executing as intended, specifically blocking the handling of image messages by incorrectly attempting to strip a list. It's a critical error that affects the flow of message processing, especially in scenarios where image messages are prevalent.
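
A minimal sketch of one possible guard, using the names from the snippet above (this is not the maintainers' fix, just an illustration of the shape a fix could take): only strip the content when it is a plain string, and leave structured list content untouched.

if isinstance(new_message["content"], str):
    # Plain-text content: safe to strip surrounding whitespace.
    new_message["content"] = new_message["content"].strip()
# List-based content (e.g. image_url parts) is appended unchanged.

new_messages.append(new_message)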

I hope someone who has the time may find the information above helpful in finding a solution!

p4r7h-v commented 7 months ago

managed to reproduce by asking 01 to "look at my screen"

maximgwiazda commented 7 months ago

I believe that I fixed it. I replaced the convert_to_openai_messages.py content with the following code:

https://drive.google.com/file/d/1w5Tt1kNnIYxJxWKL_2R08yhwFv58xEXN/view?usp=sharing