Add Replay ("Policy") performance tests (TaskCompletionRateTest)

abrichr commented 4 months ago

Feature request

We need to extend https://github.com/OpenAdaptAI/OpenAdapt/issues/314 to include some useful tests and generate an automated report.

This involves:

Create recordings of three tasks:

Open a calculator and perform a short calculation
Open a spreadsheet (e.g. https://github.com/OpenAdaptAI/OpenAdapt/blob/cb70f35985eeb579fd3e13b20a9839b10729921d/tests/assets/excel.png), open a time tracking app (e.g. https://clockify.me), copy a week's worth of data from the spreadsheet into the app, and save/submit the data in the app. (e.g. https://www.youtube.com/watch?v=omP11q-o_0I) Alternatively if browser events are not yet available (see https://github.com/OpenAdaptAI/OpenAdapt/pull/744), replicate something similar with two different spreadsheets open simultaneously (one for reading, one for writing).
Open powerpoint and create a short presentation.

Save them as fixtures
Add automated tests to run a replay (with configurable strategy, defaulting to VanillaReplayStrategy) and evaluate the outcome. Outcome evaluation can be implemented with WindowEvent data.
Add a script to log the outcome results to stdout and/or to a file.

Motivation

Scientific rigor and reproducibility.

abrichr commented 4 months ago

@seanmcguire12 your assistance would be greatly appreciated!

abrichr commented 3 months ago

@KrishPatel13 outcome evaluation for web apps will depend on finishing https://github.com/OpenAdaptAI/OpenAdapt/pull/364

abrichr commented 3 months ago

Save a fixture with recording.task_description = "test: calculate 2x3" that is just like the video currently on the website.

Test 1: Run the VanillaReplayStrategy with empty instructions (or give it instructions like replay the recording verbatim). Use openadapt.window to assert that the calculator display area contains the expected value 6.

Test 2: Run the VanillaReplayStrategy with instructions like calculate 9-8+7. Use the same API to assert that the calculator display area contains the expected value 8.

Parameterize the replay strategy and iterate over all of them. Produce a report with the results.

abrichr commented 3 months ago

@seanmcguire12 please submit a PR with your work-in-progress 🙏

OpenAdaptAI / OpenAdapt

Add Replay ("Policy") performance tests (TaskCompletionRateTest) #704

Feature request

Motivation