OpenInterpreter / open-interpreter

A natural language interface for computers
http://openinterpreter.com/
GNU Affero General Public License v3.0
51.96k stars 4.58k forks

Open Interpreter crashes when using computer.display.view #1128

Open Jpkovas opened 5 months ago

Jpkovas commented 5 months ago

Describe the bug

Every time the interpreter calls computer.display.view(), it crashes.

Reproduce

Last login: Mon Mar 25 14:33:59 on ttys000
joniy@Air-de-Joao ~ % interpreter --os

▌ OS Control enabled

Open my music and tell me what do we have to play there

First, I'll need to open the Music application on your computer. Let's use a spotlight search to open the Music app and then check what is available to play.

Let's start by pressing the "command" and spacebar keys to open Spotlight, and then I'll type "Music" to find and open the Music application. After that, I'll wait about 2 seconds and use computer.display.view to see what's on the screen.

    import time

    # Open Spotlight and type Music to open the app
    computer.keyboard.hotkey(" ", "command")
    time.sleep(0.5)
    computer.keyboard.write("Music")
    time.sleep(0.5)
    computer.keyboard.press("enter")
    time.sleep(2) # wait for the Music app to open

    # Capture the screen to see what's happening
    computer.display.view()

[IPKernelApp] WARNING | Parent appears to have exited, shutting down.
[IPKernelApp] WARNING | Parent appears to have exited, shutting down.

    Python Version: 3.11.0
    Pip Version: 24.0
    Open-interpreter Version: cmd: Open Interpreter 0.2.4 New Computer Update, pkg: 0.2.4
    OS Version and Architecture: macOS-14.4-arm64-arm-64bit
    CPU Info: arm
    RAM Info: 8.00 GB, used: 3.47, free: 0.41

    # Interpreter Info

    Vision: True
    Model: gpt-4-vision-preview
    Function calling: False
    Context window: 110000
    Max tokens: 4096

    Auto run: True
    API base: None
    Offline: False

    Curl output: Not local

    # Messages

    System Message: You are Open Interpreter, a world-class programmer that can complete any goal by executing code.

When you write code, it will be executed on the user's machine. The user has given you full and complete permission to execute any code necessary to complete the task.

When a user refers to a filename, they're likely referring to an existing file in the directory you're currently executing code in.

In general, try to make plans with as few steps as possible. As for actually executing code to carry out that plan, don't try to do everything in one code block. You should try something, print information about it, then continue from there in tiny, informed steps. You will never get it on the first try, and attempting it in one go will often lead to errors you cant see.

Manually summarize text.

Do not try to write code that attempts the entire task at once, and verify at each step whether or not you're on track.

Computer

You may use the computer Python module to complete tasks:

computer.browser.search(query) # Silently searches Google for the query, returns result. The user's browser is unaffected. (does not open a browser!)

computer.display.view() # Shows you what's on the screen, returns a `pil_image` in case you need it (rarely). **You almost always want to do this first!**

computer.keyboard.hotkey(" ", "command") # Opens spotlight (very useful)
computer.keyboard.write("hello")

# Use this to click text:
computer.mouse.click("text onscreen") # This clicks on the UI element with that text. Use this **frequently** and get creative! To click a video, you could pass the *timestamp* (which is usually written on the thumbnail) into this.
# Use this to click an icon, button, or other symbol:
computer.mouse.click(icon="gear icon") # Moves mouse to the icon with that description. Use this very often.

computer.mouse.move("open recent >") # This moves the mouse over the UI element with that text. Many dropdowns will disappear if you click them. You have to hover over items to reveal more.
computer.mouse.click(x=500, y=500) # Use this very, very rarely. It's highly inaccurate

computer.mouse.scroll(-10) # Scrolls down. If you don't find some text on screen that you expected to be there, you probably want to do this
x, y = computer.display.center() # Get your bearings

computer.clipboard.view() # Returns contents of clipboard
computer.os.get_selected_text() # Use frequently. If editing text, the user often wants this

{{
import platform
if platform.system() == 'Darwin':
        print('''
computer.browser.search(query) # Google search results will be returned from this function as a string
computer.files.edit(path_to_file, original_text, replacement_text) # Edit a file
computer.calendar.create_event(title="Meeting", start_date=datetime.datetime.now(), end=datetime.datetime.now() + datetime.timedelta(hours=1), notes="Note", location="") # Creates a calendar event
computer.calendar.get_events(start_date=datetime.date.today(), end_date=None) # Get events between dates. If end_date is None, only gets events for start_date
computer.calendar.delete_event(event_title="Meeting", start_date=datetime.datetime) # Delete a specific event with a matching title and start date, you may need to get use get_events() to find the specific event object first
computer.contacts.get_phone_number("John Doe")
computer.contacts.get_email_address("John Doe")
computer.mail.send("john@email.com", "Meeting Reminder", "Reminder that our meeting is at 3pm today.", ["path/to/attachment.pdf", "path/to/attachment2.pdf"]) # Send an email with a optional attachments
computer.mail.get(4, unread=True) # Returns the {number} of unread emails, or all emails if False is passed
computer.mail.unread_count() # Returns the number of unread emails
computer.sms.send("555-123-4567", "Hello from the computer!") # Send a text message. MUST be a phone number, so use computer.contacts.get_phone_number frequently here
''')
}}

For rare and complex mouse actions, consider using computer vision libraries on the computer.display.view() pil_image to produce a list of coordinates for the mouse to move/drag to.

If the user highlighted text in an editor, then asked you to modify it, they probably want you to keyboard.write over their version of the text.

Tasks are 100% computer-based. DO NOT simply write long messages to the user to complete tasks. You MUST put your text back into the program they're using to deliver your text!

Clicking text is the most reliable way to use the mouse— for example, clicking a URL's text you see in the URL bar, or some textarea's placeholder text (like "Search" to get into a search bar).

Applescript might be best for some tasks.

If you use plt.show(), the resulting image will be sent to you. However, if you use PIL.Image.show(), the resulting image will NOT be sent to you.

It is very important to make sure you are focused on the right application and window. Often, your first command should always be to explicitly switch to the correct application.

When searching the web, use query parameters. For example, https://www.amazon.com/s?k=monitor

Try multiple methods before saying the task is impossible. You can do it!

Critical Routine Procedure for Multi-Step Tasks

Include computer.display.view() after a 2 second delay at the end of every code block to verify your progress, then answer these questions in extreme detail:

  1. Generally, what is happening on-screen?
  2. What is the active app?
  3. What hotkeys does this app support that might get be closer to my goal?
  4. What text areas are active, if any?
  5. What text is selected?
  6. What options could you take next to get closer to your goal?

{{

# Add window information
try:
    import pywinctl

    active_window = pywinctl.getActiveWindow()

    if active_window:
        app_info = ""

        if "_appName" in active_window.__dict__:
            app_info += (
                "Active Application: " + active_window.__dict__["_appName"]
            )

        if hasattr(active_window, "title"):
            app_info += "\n" + "Active Window Title: " + active_window.title
        elif "_winTitle" in active_window.__dict__:
            app_info += (
                "\n"
                + "Active Window Title:"
                + active_window.__dict__["_winTitle"]
            )

        if app_info != "":
            print(
                "\n\n# Important Information:\n"
                + app_info
                + "\n(If you need to be in another active application to help the user, you need to switch to it.)"
            )

except:
    # Non blocking
    pass

}}
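The window-information snippet in the system message depends on pywinctl and a live desktop, so it is hard to check on its own. A minimal sketch, assuming only that the window object exposes the same `_appName`, `title`, and `_winTitle` attributes the snippet probes, pulls the lookup logic into a hypothetical pure function `describe_window` that can be exercised with a stand-in object. Unlike the original, it joins lines with "\n" (no leading newline when the app name is missing) and adds the space after "Active Window Title:" in the fallback branch.

```python
from types import SimpleNamespace
from typing import Any, Optional


def describe_window(window: Optional[Any]) -> str:
    """Hypothetical helper: build the 'Active Application / Window Title' text.

    Follows the same lookup order as the system-message snippet:
    app name from the private `_appName` attribute, then the title from
    `title`, falling back to the private `_winTitle` attribute.
    """
    if window is None:  # e.g. no active window was found
        return ""
    attrs = getattr(window, "__dict__", {})  # tolerate objects without __dict__
    lines = []
    if "_appName" in attrs:
        lines.append("Active Application: " + attrs["_appName"])
    if hasattr(window, "title"):
        lines.append("Active Window Title: " + window.title)
    elif "_winTitle" in attrs:
        lines.append("Active Window Title: " + attrs["_winTitle"])
    return "\n".join(lines)


if __name__ == "__main__":
    # SimpleNamespace stands in for a real pywinctl window object.
    print(describe_window(SimpleNamespace(_appName="Music", title="Library")))
```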

{'role': 'user', 'type': 'message', 'content': 'Open my music and tell me what do we have to play there'}

{'role': 'assistant', 'type': 'message', 'content': 'First, I\'ll need to open the Music application on your computer. Let\'s use a spotlight search to open the Music app and then check what is available to play.\n\nLet\'s start by pressing the "command" and spacebar keys to open Spotlight, and then I\'ll type "Music" to find and open the Music application. After that, I\'ll wait about 2 seconds and usecomputer.display.view to see what\'s on the screen.\n\n'}

{'role': 'assistant', 'type': 'code', 'format': 'python', 'content': '\nimport time\n\n# Open Spotlight and type Music to open the app\ncomputer.keyboard.hotkey(" ", "command")\ntime.sleep(0.5)\ncomputer.keyboard.write("Music")\ntime.sleep(0.5)\n computer.keyboard.press("enter")\ntime.sleep(2) # wait for the Music app to open\n\n# Capture the screen to see what\'s happening\ncomputer.display.view()\n'}

{'role': 'computer', 'type': 'console', 'format': 'output', 'content': ''}

{'role': 'computer', 'type': 'image', 'format': 'base64.png', 'content': 'iVBORw0KGgoAAAANSUhEUgAADSAAAAg0CAIAAACcJK5OAAAMQGlDQ1BJQ0MgUHJvZmlsZQAAeJyVVwd YU8kWnluSkEBoAQSkhN4EESkBpITQAkjvNkISIJQYA0HFjiwquBZURMCGrooodpodsbMo9r5YUFDWxYJ deZMCuu4r35vvmzv//efMf86cO/fOHQDUTnBEomxUHYAcYZ44OsiPnpiUTCf1AAyQgTrwBKM43FwRMzI yDMAy1P69vLsBEGl71V6q9c/+/1o0ePxcLgBIJMSpvFxuDsQHAcCruSJxHgBE...qDlhkqFTr9U1/OVt n9bwhXECYP12fWQuIcvCYJjMw4oSLWu03MjsdO9UBDP8nV+4YddWfFm+LdPqMAejWoxAOaFQdLFr3q5S YtNtqHytuKP3Ghubrg6++S1AeyOPP+bONX41tJ7uDet4lZpYUtNGsxzX3nh6/soOp3GMgeUJLJfsEbgV w/bK88z/kies3sZzF55FimCQqrBYnhKcsRndNNbcws+fPLIRr7vX1WcRmq8am6uZ41WCTb9XzkgNqaYo 2RjbW5WnPqq++v7/D6EvUR5q3fVGAAAAAElFTkSuQmCC'}

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.11/bin/interpreter", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/interpreter/terminal_interface/start_terminal_interface.py", line 437, in main
    start_terminal_interface(interpreter)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/interpreter/terminal_interface/start_terminal_interface.py", line 415, in start_terminal_interface
    interpreter.chat()
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/interpreter/core/core.py", line 167, in chat
    for _ in self._streaming_chat(message=message, display=display):
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/interpreter/core/core.py", line 196, in _streaming_chat
    yield from terminal_interface(self, message)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/interpreter/terminal_interface/terminal_interface.py", line 136, in terminal_interface
    for chunk in interpreter.chat(message, display=False, stream=True):
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/interpreter/core/core.py", line 235, in _streaming_chat
    yield from self._respond_and_store()
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/interpreter/core/core.py", line 281, in _respond_and_store
    for chunk in respond(self):
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/interpreter/core/respond.py", line 69, in respond
    for chunk in interpreter.llm.run(messages_for_llm):
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/interpreter/core/llm/llm.py", line 97, in run
    messages = convert_to_openai_messages(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/interpreter/core/llm/utils/convert_to_openai_messages.py", line 173, in convert_to_openai_messages
    new_message["content"] = new_message["content"].strip()
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'strip'
joniy@Air-de-Joao ~ % [IPKernelApp] WARNING | Parent appears to have exited, shutting down.
[IPKernelApp] WARNING | Parent appears to have exited, shutting down.

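The traceback ends in convert_to_openai_messages right after computer.display.view() added a `'type': 'image'` message to the conversation. A plausible reading, which is an assumption and not something confirmed in this thread, is that image messages are turned into OpenAI's vision format, where `content` is a list of parts rather than a string, so the later `.strip()` call fails. The hypothetical `to_openai_message` below is not Open Interpreter's code; it only illustrates that shape. Mapping the `computer` role to `user` is also an assumption.

```python
from typing import Any, Dict


def to_openai_message(message: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical converter from the message dicts in this log to OpenAI chat format.

    Text-like messages keep a string `content`; a base64 PNG image message
    becomes a list of content parts, which is the shape vision models expect.
    """
    role = "assistant" if message["role"] == "assistant" else "user"  # assumption
    if message.get("type") == "image" and message.get("format") == "base64.png":
        url = "data:image/png;base64," + message["content"]
        return {"role": role, "content": [{"type": "image_url", "image_url": {"url": url}}]}
    return {"role": role, "content": message["content"]}


# A screenshot message like the one in the log (base64 payload shortened).
screenshot = {"role": "computer", "type": "image", "format": "base64.png", "content": "iVBORw0KGgo"}
converted = to_openai_message(screenshot)

try:
    converted["content"].strip()  # the same call that fails in the traceback
except AttributeError as err:
    print(err)  # 'list' object has no attribute 'strip'
```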
Expected behavior

At the very least, it should be able to see what is on my screen.

Screenshots

No response

Open Interpreter version

0.2.4

Python version

3.11.8

Operating System name and version

macOS 14.4

Additional context

No response

MikeBirdTech commented 5 months ago

Hey @Jpkovas, sorry to hear that you're having issues. We're working on a fix.

MikeBirdTech commented 5 months ago

@Jpkovas Are you sure you're on 0.2.4? Can you please run %info after you launch interpreter to verify?
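`%info` is the in-session check. Outside a session, a minimal sketch using only the standard library's `importlib.metadata` reads the installed version from the same Python environment. It assumes the distribution is named `open-interpreter`, as in `pip install open-interpreter`; run it with the same interpreter that launches the `interpreter` command, since a different environment may report a different version.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version("open-interpreter") or "open-interpreter is not installed here")
```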

This should have been resolved in https://github.com/OpenInterpreter/open-interpreter/pull/1117/files
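The diff of that PR is not reproduced here. As a minimal sketch of the kind of guard that avoids this particular crash (a hypothetical helper, not the PR's actual change), whitespace can be stripped only when `content` is a string, and from the text parts when it is a list. The list-of-parts shape assumed here is OpenAI's vision format, with `{"type": "text", "text": ...}` and `{"type": "image_url", ...}` entries.

```python
from typing import Any, Dict, List, Union

Content = Union[str, List[Dict[str, Any]]]


def strip_content(content: Content) -> Content:
    """Strip surrounding whitespace without assuming content is a string.

    Plain text messages carry a str; vision messages carry a list of parts,
    which has no .strip(). Image parts are passed through unchanged.
    """
    if isinstance(content, str):
        return content.strip()
    stripped = []
    for part in content:
        if part.get("type") == "text" and isinstance(part.get("text"), str):
            part = {**part, "text": part["text"].strip()}  # copy, don't mutate input
        stripped.append(part)
    return stripped
```

Copying each text part instead of editing it in place keeps the caller's message list untouched, which matters if the same messages are later shown or stored.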

Thanks :)

tstodter commented 5 months ago

I'm also seeing this problem on 0.2.4: [screenshot]

darinkishore commented 5 months ago

Also seeing on 0.2.4.

Trace:

File "/Users/darin/interpreter/.venv/lib/python3.11/site-packages/interpreter/core/llm/utils/convert_to_openai_messages.py", line 173, in convert_to_openai_messages
    new_message["content"] = new_message["content"].strip()
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'strip'

~/interpreter 34s
.venv ❯ [IPKernelApp] WARNING | Parent appears to have exited, shutting down.
[IPKernelApp] WARNING | Parent appears to have exited, shutting down.

~/interpreter 34s
.venv ❯ interpreter --os

▌ OS Control enabled

> %info
[IPKernelApp] WARNING | Parent appears to have exited, shutting down.
[IPKernelApp] WARNING | Parent appears to have exited, shutting down.

        Python Version: 3.11.7
        Pip Version: 23.2.1
        Open-interpreter Version: cmd: Open Interpreter 0.2.4 New Computer Update, pkg: 0.2.4
        OS Version and Architecture: macOS-14.2.1-arm64-arm-64bit
        CPU Info: arm
        RAM Info: 16.00 GB, used: 6.88, free: 0.36

        # Interpreter Info

        Vision: True
        Model: gpt-4-vision-preview
        Function calling: False
        Context window: 110000
        Max tokens: 4096

        Auto run: True
        API base: None
        Offline: False

        Curl output: Not local

nate-dryer commented 5 months ago

I'm also seeing the same issue on 0.2.4.

The issue occurs when computer.display.view() is called to capture a screenshot, causing the interpreter to crash with: AttributeError: 'list' object has no attribute 'strip'

Trace:

File "/Users/nathandryer/.pyenv/versions/3.11.1/envs/open-interpreter/lib/python3.11/site-packages/interpreter/core/llm/utils/convert_to_openai_messages.py", line 173, in convert_to_openai_messages
    new_message["content"] = new_message["content"].strip()
AttributeError: 'list' object has no attribute 'strip'

Let me know if you need any more info.

fernando-freitas-alves commented 4 months ago

I was also facing the same issue on Debian with i3wm, same version (0.2.4).

It seems the changes @MikeBirdTech linked are not in that version yet. However, you can simply apply them yourself, and the crash goes away!

Thanks, @MikeBirdTech! :bow:

grass29 commented 4 months ago

How does one apply these changes?
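One way to do it, sketched here rather than given as an official procedure: find where the installed `interpreter` package lives, open the file named in the tracebacks above (`core/llm/utils/convert_to_openai_messages.py`), and make the same edits shown in the PR's "Files changed" tab. Check the PR itself for whether it touches other files too. A local edit to site-packages is overwritten by the next upgrade. Alternatively, if the fix has been merged to the main branch, pip can install straight from the repository with `pip install git+https://github.com/OpenInterpreter/open-interpreter.git`. The hypothetical helper below uses only `importlib.util.find_spec` to locate the installed file.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_installed_file(package: str, relative_path: str) -> Optional[Path]:
    """Return the path of a file inside an installed package, or None if missing."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None  # package not installed, or a namespace package without __init__.py
    # For a regular package, spec.origin is the path to its __init__.py.
    candidate = Path(spec.origin).parent / relative_path
    return candidate if candidate.exists() else None


if __name__ == "__main__":
    path = find_installed_file("interpreter", "core/llm/utils/convert_to_openai_messages.py")
    print(path or "open-interpreter not found in this environment")
```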