OpenAdaptAI / OpenAdapt

Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models
https://www.OpenAdapt.AI
MIT License

Add Replay ("Policy") performance tests (TaskCompletionRateTest) #704

Open abrichr opened 4 months ago

abrichr commented 4 months ago

Feature request

We need to extend https://github.com/OpenAdaptAI/OpenAdapt/issues/314 to include a set of useful end-to-end replay tests and to generate an automated report of the results.

This involves:

  1. Open a calculator and perform a short calculation.
  2. Open a spreadsheet (e.g. https://github.com/OpenAdaptAI/OpenAdapt/blob/cb70f35985eeb579fd3e13b20a9839b10729921d/tests/assets/excel.png), open a time tracking app (e.g. https://clockify.me), copy a week's worth of data from the spreadsheet into the app, and save/submit the data in the app (e.g. https://www.youtube.com/watch?v=omP11q-o_0I). Alternatively, if browser events are not yet available (see https://github.com/OpenAdaptAI/OpenAdapt/pull/744), replicate something similar with two different spreadsheets open simultaneously (one for reading, one for writing).
  3. Open PowerPoint and create a short presentation.

Motivation

Scientific rigor and reproducibility.

abrichr commented 4 months ago

@seanmcguire12 your assistance would be greatly appreciated!

abrichr commented 3 months ago

@KrishPatel13 outcome evaluation for web apps will depend on finishing https://github.com/OpenAdaptAI/OpenAdapt/pull/364

abrichr commented 3 months ago

Save a test fixture: a recording with recording.task_description = "test: calculate 2x3" that matches the demo video currently on the website.

Test 1: Run the VanillaReplayStrategy with empty instructions (or with instructions such as "replay the recording verbatim"). Use openadapt.window to assert that the calculator display area contains the expected value 6.

Test 2: Run the VanillaReplayStrategy with instructions such as "calculate 9-8+7". Use the same API to assert that the calculator display area contains the expected value 8.
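Both tests reduce to comparing the calculator's display text against an expected number. Below is a minimal sketch of two helpers for that comparison. It assumes the display text has already been read (e.g. via openadapt.window; that call is not shown because its exact signature isn't given here). The names expected_value and display_matches are hypothetical and not part of OpenAdapt. Deriving the expected value from the expression in the instruction, rather than hard-coding 6 and 8, makes it easier to add more calculator cases later.

```python
import ast
import operator

# Binary operators allowed in task expressions; "x" is treated as multiplication.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def expected_value(expression: str) -> float:
    """Safely evaluate a simple arithmetic expression such as "2x3" or "9-8+7".

    Hypothetical helper: only numeric literals combined with binary
    + - * / (with "x" accepted as *) are supported; anything else raises.
    """
    tree = ast.parse(expression.replace("x", "*"), mode="eval")

    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            # Python's parser already applies left-to-right associativity,
            # so "9-8+7" is evaluated as (9-8)+7.
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        raise ValueError(f"unsupported expression: {expression!r}")

    return _eval(tree)


def display_matches(display_text: str, expected: float) -> bool:
    """Return True if the numeric content of the display equals `expected`.

    Assumes (hypothetically) that the display text contains a single number,
    possibly with thousands separators or a surrounding label.
    """
    digits = "".join(ch for ch in display_text if ch.isdigit() or ch in ".-")
    try:
        return float(digits) == float(expected)
    except ValueError:
        return False
```

For example, expected_value("2x3") returns 6 and expected_value("9-8+7") returns 8, matching the values named in Test 1 and Test 2.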

Parameterize the tests over the replay strategy so that they run against every available strategy, then produce a report with the results.
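Below is a minimal sketch of the parameterization and report. It assumes each replay strategy can be wrapped as a callable that takes the instructions and returns the calculator display text after replay. The ReplayRunner type, the TaskCase/TaskResult classes, the stub strategies, and the report layout are all hypothetical stand-ins. Real runs would invoke the OpenAdapt replay strategies (e.g. VanillaReplayStrategy) and read the display through openadapt.window.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical: a strategy is modeled as a function (instructions) -> display text.
ReplayRunner = Callable[[str], str]


@dataclass
class TaskCase:
    name: str
    instructions: str  # empty string means "replay the recording verbatim"
    expected: str      # expected calculator display value


@dataclass
class TaskResult:
    strategy: str
    case: str
    passed: bool
    observed: str


def run_all(strategies: Dict[str, ReplayRunner], cases: List[TaskCase]) -> List[TaskResult]:
    """Run every case against every strategy and record pass/fail."""
    results = []
    for strategy_name, run in strategies.items():
        for case in cases:
            try:
                observed = run(case.instructions)
            except Exception as exc:  # a crashing strategy counts as a failure
                observed = f"error: {exc}"
            passed = observed.strip() == case.expected
            results.append(TaskResult(strategy_name, case.name, passed, observed))
    return results


def completion_rates(results: List[TaskResult]) -> Dict[str, float]:
    """Fraction of cases each strategy completed successfully."""
    rates = {}
    for strategy in {r.strategy for r in results}:
        mine = [r for r in results if r.strategy == strategy]
        rates[strategy] = sum(r.passed for r in mine) / len(mine)
    return rates


def format_report(results: List[TaskResult]) -> str:
    """Plain-text report: one row per (strategy, case), then per-strategy rates."""
    lines = ["strategy | case | passed | observed"]
    for r in results:
        lines.append(f"{r.strategy} | {r.case} | {'yes' if r.passed else 'no'} | {r.observed}")
    for strategy, rate in sorted(completion_rates(results).items()):
        lines.append(f"{strategy}: task completion rate {rate:.0%}")
    return "\n".join(lines)


CASES = [
    TaskCase("test 1: verbatim replay", "", "6"),
    TaskCase("test 2: calculate 9-8+7", "calculate 9-8+7", "8"),
]


# Stub strategies for illustration only; real ones would drive the calculator.
def verbatim_stub(instructions: str) -> str:
    return "6"  # ignores instructions, so only test 1 passes


def instructed_stub(instructions: str) -> str:
    return "8" if "9-8+7" in instructions else "6"


if __name__ == "__main__":
    results = run_all({"verbatim_stub": verbatim_stub, "instructed_stub": instructed_stub}, CASES)
    print(format_report(results))
```

In a pytest suite, the same iteration would more typically use pytest.mark.parametrize over the strategy names. The plain loop is used here so that a single report can aggregate a task completion rate per strategy across all cases.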

abrichr commented 3 months ago

@seanmcguire12 please submit a PR with your work-in-progress 🙏