a-real-ai / pywinassistant

The first open source Large Action Model generalist Artificial Narrow Intelligence that controls completely human user interfaces by only using natural language. PyWinAssistant utilizes Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models.
MIT License

AI decision coordinates: 'x=960, y=995' #11

Open Odysseum04 opened 2 months ago

Odysseum04 commented 2 months ago

Hello dear pywinassistant developers,

First of all, thank you for your amazing work pushing the limits of AI forward! I would like to report an issue I have had with this software.

I managed to make this AI assistant work by using "Razorbob"'s tutorial in both https://github.com/a-real-ai/pywinassistant/pull/9 and https://github.com/a-real-ai/pywinassistant/issues/5

It started great with the command "python ./assistant.py".

The Assistant icon showed up and there were no errors in the terminal.

The issue started when I typed my first command: "Play "Never gonna give you up" in the application "Spotify""

The bot started, but went to the wrong coordinates:


Assistant listening thread started...
Listening...
Listening...
Listening...
Listening...
Listening...
Listening...
Clicked on the assistant: Whats the action?
Listening...
Listening...
Listening...
Listening...
Listening...
Listening...
Listening...
Listening...
Listening...
Listening...
Listening...
Listening...
Listening...
Processing input: play the song "Never gonna give you up" in the application "Spotify"
Performing action:  play the song "Never gonna give you up" in the application "Spotify"
Selected application: tk
Listening...
Listening...

Keywords: play, song, "Never gonna give you up", application, Spotify

AI decision coordinates: 'x=960, y=995'
Listening...
Listening...
Listening...
Executing command: <function stop_assistant at 0x00000229843300D0>

Of course, I tried multiple times, but the bot always went to the same coordinates: 'x=960, y=995'.

Odysseum04 commented 2 months ago

The Spotify application was open and in full screen.

henyckma commented 2 months ago

@Odysseum04 @iamgonnagiveyouup It seems that the "App Selector Agent" is selecting the application that hosts the running PyWinAssistant instance, because the agent is not ignoring it. To fix this, make the agent ignore the application it keeps selecting by adding the strings "Tk.exe", "Tk", "Code", and "Code.exe" to the following lists:

image
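As a rough illustration of the ignore-list idea described above, here is a minimal sketch. The names `IGNORED_APPS`, `is_ignored`, and `select_candidates` are hypothetical and are not taken from the PyWinAssistant source; only the four strings come from this comment. The sketch assumes the selector works from a list of application names and compares them case-insensitively.

```python
# Hypothetical ignore list; the entries are the strings named in the comment above.
IGNORED_APPS = {"tk", "tk.exe", "code", "code.exe"}


def is_ignored(app_name: str) -> bool:
    """Return True if the name matches an ignored entry (case-insensitive)."""
    return app_name.strip().lower() in IGNORED_APPS


def select_candidates(app_names: list[str]) -> list[str]:
    """Keep only the applications the selector may pick, preserving order."""
    return [name for name in app_names if not is_ignored(name)]


if __name__ == "__main__":
    print(select_candidates(["tk", "Spotify", "Code.exe", "Notepad"]))
```

With a filter like this in front of the selector, the assistant's own window would never be offered as a candidate, so a prompt such as the Spotify one could only resolve to the remaining applications.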

To avoid hardcoding, I'm implementing automatic detection of the instance in which PyWinAssistant is running, but I'm still evaluating different ways to do this securely.
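One possible way to avoid the hardcoded names is to compare each window's owning process ID with the assistant's own process ID. The sketch below is an assumption about how that could look, not the project's implementation: `WindowInfo` and `exclude_own_windows` are hypothetical names, and the window list is passed in as plain data so the example runs anywhere. On Windows, the owning PID of a window handle could be obtained with pywin32's `win32process.GetWindowThreadProcessId`, assuming the project enumerates windows that way.

```python
import os
from typing import NamedTuple, Optional


class WindowInfo(NamedTuple):
    """Hypothetical record describing one top-level window."""
    title: str
    pid: int  # ID of the process that owns the window


def exclude_own_windows(windows: list[WindowInfo],
                        own_pid: Optional[int] = None) -> list[WindowInfo]:
    """Drop windows owned by the current process (or by own_pid, if given)."""
    if own_pid is None:
        own_pid = os.getpid()
    return [w for w in windows if w.pid != own_pid]


if __name__ == "__main__":
    me = os.getpid()
    windows = [WindowInfo("tk", me), WindowInfo("Spotify", me + 1)]
    for w in exclude_own_windows(windows):
        print(w.title)
```

This only covers windows created by the assistant's own process. A terminal or IDE that launched the assistant runs under a different PID and would still need a separate rule.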

jasonc624 commented 2 months ago

I'm getting this issue too. I have the NVIDIA GeForce overlay.

henyckma commented 2 months ago

Hi @Odysseum04 @jasonc624!

I have updated the following files to ignore the NVIDIA overlay, Tk, and Visual Studio Code (the assistant is not intended to be run from an IDE or a virtual environment).

https://github.com/a-real-ai/pywinassistant/blob/0bf9be45682036c3522d9f4748517bad4e26678b/core/last_app.py#L23

https://github.com/a-real-ai/pywinassistant/blob/0bf9be45682036c3522d9f4748517bad4e26678b/core/window_focus.py#L54

https://github.com/a-real-ai/pywinassistant/blob/0bf9be45682036c3522d9f4748517bad4e26678b/core/topmost_window.py#L18

If you have further issues please let me know.

dot-Justin commented 2 months ago

The AMD DVR overlay might be another one to add.

henyckma commented 2 months ago

@TheJustinCrow

Thank you for the suggestion. I'll add 'amdow.exe', which is the AMD DVR overlay.

Greetings!

Odysseum04 commented 2 months ago

Okay, so first of all, thank you for your patch proposals! I have been trying them out on my problematic installation.

After cloning the updated GitHub repository, I tried once more. Here are the results:

PS C:\Users\cleme\Documents\pywinassistant\core> python ./assistant.py
Assistant listening thread started...
Listening...
Listening...
Listening...
Clicked on the assistant: Whats the action?
Clicked on the assistant: Whats the action?
Listening...
Listening...
Listening...
Listening...
Processing input: open google chrome
Performing action:  open google chrome
Selected application: AI Drone Assistant
Listening...

Keywords: browser, chrome

Listening...
AI decision coordinates: 'x=1872, y=967'
Clicked on the assistant: Whats the action?
Listening...
Listening...
Listening...
Listening...
Listening...
Listening...
Clicked on the assistant: Whats the action?
Listening...
Processing input: Create a long AI essay about an AI Starting to control a Windows computer on Notepad
Performing action:  Create a long AI essay about an AI Starting to control a Windows computer on Notepad
Selected application: tk
Listening...
Listening...

Keywords: AI, essay, control, Windows, computer, Notepad

Listening...
AI decision coordinates: 'x=960, y=995'
Listening...
Listening...
Listening...
Listening...

As you can see, the code "worked" once, even though it didn't open Chrome, but the second time it went back to "Selected application: tk" and "AI decision coordinates: 'x=960, y=995'".
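For anyone comparing runs, a small script can pull the selected application and the decision coordinates out of a pasted log. This is only a diagnostic sketch based on the log lines shown in this thread. `summarize` and `SELF_NAMES` are hypothetical names, and treating "tk" as the assistant's own window is an assumption taken from the maintainer's diagnosis above.

```python
import re

SELECTED_RE = re.compile(r"^Selected application:\s*(.+)$")
COORDS_RE = re.compile(r"^AI decision coordinates: 'x=(\d+), y=(\d+)'$")

# Assumption: the assistant's own window shows up in the log as "tk".
SELF_NAMES = frozenset({"tk"})


def summarize(log_text, self_names=SELF_NAMES):
    """Pair each 'Selected application' line with the next coordinates line.

    Returns a list of (app_name, (x, y), selected_itself) tuples.
    """
    results = []
    current_app = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        match = SELECTED_RE.match(line)
        if match:
            current_app = match.group(1).strip()
            continue
        match = COORDS_RE.match(line)
        if match and current_app is not None:
            coords = (int(match.group(1)), int(match.group(2)))
            results.append((current_app, coords, current_app.lower() in self_names))
            current_app = None
    return results


if __name__ == "__main__":
    sample = (
        "Selected application: tk\n"
        "Listening...\n"
        "AI decision coordinates: 'x=960, y=995'\n"
    )
    for app, coords, selected_itself in summarize(sample):
        print(app, coords, "self-selected" if selected_itself else "ok")
```

A run flagged as self-selected means the app selector picked the assistant's own window, which matches the repeated 'x=960, y=995' result.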

Odysseum04 commented 2 months ago

To be precise, all my tests were done by typing the task, which brings the "AI Assistant" icon (so basically tk) to the front as the topmost application.

henyckma commented 2 months ago

Hi @Odysseum04!

PyWinAssistant is having trouble ignoring its own UI, so it focuses on itself when the chat is used.

As a temporary workaround, instead of using the chat, please try activating the assistant by voice with the command "OK computer", as PyWinAssistant automatically fixes the given prompt.

If you want to try written prompts, run driver.py directly using the examples at the bottom of the assistant function.

I'm working on fixing the issue of the chat focusing on itself. The method for selecting and recognizing apps is not properly ignoring the PyWinAssistant UI.