OpenAdaptAI / OpenAdapt

AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models
https://www.OpenAdapt.AI
MIT License

[Bug]: `WindowEvent` race condition #816

Open abrichr opened 2 months ago

abrichr commented 2 months ago

Describe the bug

During recording, if the active window changes and the user emits an ActionEvent before a new WindowEvent is received, the resulting segmentation (during replay) will fail.

Edit: this only happens if READ_WINDOW_DATA is True.

WindowEvents are read in https://github.com/OpenAdaptAI/OpenAdapt/blob/main/openadapt/record.py#L713
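
To make the failure mode concrete, here is a minimal simulation of the race. It assumes each ActionEvent is associated with the most recent WindowEvent that has been processed so far; the `Event` class and `pair_actions_with_windows` are hypothetical names for illustration and are not taken from record.py.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Event:
    kind: str         # "window" or "action"
    timestamp: float  # time the event was processed, in seconds
    payload: str      # window title, or a description of the action


def pair_actions_with_windows(
    events: List[Event],
) -> List[Tuple[str, Optional[str]]]:
    """Pair each action with the most recent window event seen before it."""
    latest_window: Optional[Event] = None
    pairs = []
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.kind == "window":
            latest_window = event
        else:
            title = latest_window.payload if latest_window else None
            pairs.append((event.payload, title))
    return pairs


# The user switches from "Editor" to "Browser" and clicks at t=1.1,
# but the slow window reader only reports "Browser" at t=1.5.
events = [
    Event("window", 0.0, "Editor"),
    Event("action", 1.1, "click"),
    Event("window", 1.5, "Browser"),
]
print(pair_actions_with_windows(events))
```

The click ends up paired with the stale "Editor" window, which is the kind of mismatch that would break segmentation during replay.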

To Reproduce

TODO

abrichr commented 2 months ago

It's possible that pywinauto is too slow here: https://github.com/OpenAdaptAI/OpenAdapt/blob/main/openadapt/window/_windows.py#L90C1-L98C35.

Via ChatGPT:

You can use the pygetwindow library for a faster solution. Here’s how to get the name, dimensions, and position of the active window:

import pygetwindow as gw

def get_active_window_details() -> dict:
    """
    Get the name, dimensions, and position of the active window.

    Returns:
        dict: A dictionary containing the title, left, top, width, and height of the active window.
    """
    window = gw.getActiveWindow()
    if window is None:
        return {}

    window_details = {
        'title': window.title,
        'left': window.left,
        'top': window.top,
        'width': window.width,
        'height': window.height
    }
    return window_details

# Example usage
if __name__ == "__main__":
    active_window_details = get_active_window_details()
    print(active_window_details)

This method is significantly faster than using pywinauto and should provide the necessary details about the active window. Ensure you have pygetwindow installed:

pip install pygetwindow

abrichr commented 2 months ago

We need window.get_active_window_data (https://github.com/OpenAdaptAI/OpenAdapt/blob/main/openadapt/window/__init__.py#L22) to return more quickly.

Probably the simplest way to approach this is to leave existing functionality as-is, and implement a separate function that uses pygetwindow to return similar data (except for the accessibility data) as quickly as possible.

As part of this, we need a test that calls both functions (the existing one based on pywinauto and the new one based on pygetwindow) and compares the execution time.
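
A rough sketch of what such a timing comparison could look like. The two reader functions below are stand-ins (assumptions) so the example runs anywhere; in the real test they would be replaced by the existing pywinauto-based function and the new pygetwindow-based one. `median_runtime` and `compare_runtimes` are hypothetical helper names.

```python
import time
from statistics import median
from typing import Callable, Dict


def median_runtime(func: Callable[[], object], repeats: int = 20) -> float:
    """Return the median wall-clock duration of func() over several calls."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return median(durations)


def compare_runtimes(
    slow: Callable[[], object],
    fast: Callable[[], object],
    repeats: int = 20,
) -> Dict[str, float]:
    """Measure both callables with the same number of repeats."""
    return {
        "slow": median_runtime(slow, repeats),
        "fast": median_runtime(fast, repeats),
    }


# Stand-in for the existing pywinauto-based reader (assumed to be slower).
def slow_reader() -> dict:
    time.sleep(0.02)
    return {"title": "example"}


# Stand-in for the proposed pygetwindow-based reader.
def fast_reader() -> dict:
    return {"title": "example"}


result = compare_runtimes(slow_reader, fast_reader, repeats=5)
assert result["fast"] < result["slow"]
```

Using the median rather than the mean keeps a single slow call (for example, a first-call import or cache miss) from dominating the comparison.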

abrichr commented 1 month ago

There are two possible solutions:

  1. Split window_events into two event types: one carrying window metadata and one carrying accessibility data. When the accessibility event is received, update the already written action event to point to it (or have the accessibility event point to the action event).

  2. Only set RECORD_WINDOW_DATA = True during evaluation, and not during recording. Then during evaluation, we can wait for the accessibility data to come back. (This seems like a hack.)
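
As a rough illustration of option 1, the sketch below separates fast metadata events from slow accessibility events and links a late accessibility event back to actions that were already written. All class, field, and function names here are hypothetical and do not reflect the existing OpenAdapt models; it also assumes events can be referenced by timestamp.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WindowMetadataEvent:
    """Cheap to read: title and geometry only."""
    timestamp: float
    title: str
    left: int
    top: int
    width: int
    height: int


@dataclass
class AccessibilityEvent:
    """Expensive to read: arrives some time after the metadata event."""
    timestamp: float
    window_metadata_timestamp: float  # the metadata event this describes
    data: dict


@dataclass
class ActionEvent:
    timestamp: float
    name: str
    window_metadata_timestamp: Optional[float] = None
    accessibility_timestamp: Optional[float] = None


def link_accessibility(
    actions: List[ActionEvent], acc: AccessibilityEvent
) -> int:
    """Point already-written actions at a late accessibility event.

    Returns the number of actions that were updated.
    """
    updated = 0
    for action in actions:
        same_window = (
            action.window_metadata_timestamp == acc.window_metadata_timestamp
        )
        if same_window and action.accessibility_timestamp is None:
            action.accessibility_timestamp = acc.timestamp
            updated += 1
    return updated


metadata = WindowMetadataEvent(1.0, "Browser", 0, 0, 800, 600)
actions = [
    ActionEvent(1.1, "click", window_metadata_timestamp=metadata.timestamp)
]
late = AccessibilityEvent(
    1.5, window_metadata_timestamp=metadata.timestamp, data={"role": "window"}
)
updated = link_accessibility(actions, late)
```

With this split, the action can be written as soon as the metadata event is available, and the accessibility link is filled in whenever the slower read completes.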

Let's go with option number 1. This will require: