abrichr closed 4 months ago
Send in a series of screenshots to GPT-4 and then ask GPT-4 to describe what happened. Then give it the sequence of actions (in concrete coordinates and keyboard inputs), as well as your proposed modification in natural language.
According to this description, the model would have no knowledge of the current state. In effect it would be predicting what the future state will be, and we would then replay its predictions verbatim.
If you already have state representation, maybe add [the current state to the prompt at every time step] --LunjunZhang
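A minimal sketch of that suggestion, assuming a state representation is available as JSON-serializable data. Every name here (`build_prompt`, `run_closed_loop`, `observe`, `propose`, `replay`) is hypothetical and not part of OpenAdapt; the point is only that the model sees a fresh observation before each action instead of predicting the whole sequence up front.

```python
import json


def build_prompt(instructions, current_state, action_history):
    """Compose the prompt for a single time step.

    All parameters are hypothetical names used for illustration:
    - instructions: the natural language replay instructions
    - current_state: JSON-serializable description of the screen right now
    - action_history: actions already produced and replayed
    """
    return "\n\n".join([
        f"Task: {instructions}",
        "Current state:\n" + json.dumps(current_state, indent=2),
        "Actions replayed so far:\n" + json.dumps(action_history, indent=2),
        "Reply with the single next action as JSON.",
    ])


def run_closed_loop(instructions, observe, propose, replay, max_steps=10):
    """Observe, ask for one action, replay it, then observe again."""
    history = []
    for _ in range(max_steps):
        prompt = build_prompt(instructions, observe(), history)
        action = propose(prompt)  # e.g. a call to the model
        if action is None:  # assumed convention: None means "done"
            break
        replay(action)
        history.append(action)
    return history
```

Here `observe`, `propose`, and `replay` are caller-supplied callables, so the loop can be exercised with stubs before wiring in a real model or input controller.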
Interesting failure mode:
"Consider the actions you've produced (and "
'we have replayed) so far:\n'
'\n'
'```json\n'
"[{'name': 'type', 'text': "
"'<cmd>-<space>', 'canonical_text': ''}, "
"{'name': 'type', 'text': "
"'c-a-l-c-u-l-a-t-o-r', 'canonical_text': "
"''}, {'name': 'type', 'text': '<enter>', "
"'canonical_text': ''}, {'name': 'type', "
"'text': '9-8', 'canonical_text': ''}, "
"{'name': 'type', 'text': '-', "
"'canonical_text': ''}, {'name': 'type', "
"'text': '<enter>', 'canonical_text': ''}, "
"{'name': 'type', 'text': '8', "
"'canonical_text': ''}, {'name': 'type', "
"'text': '=', 'canonical_text': ''}, "
"{'name': 'type', 'text': '8', "
"'canonical_text': ''}, {'name': 'type', "
"'text': '<-m->-a', 'canonical_text': "
"''}]\n"
'```\n'
File "/Users/abrichr/oa/OpenAdapt/openadapt/playback.py", line 69, in play_key_event
keyboard_controller.press(key)
| | -> None
| -> <function Controller.press at 0x106023490>
-> <oa_pynput.keyboard._darwin.Controller object at 0x2d196b880>
File "/Users/abrichr/Library/Caches/pypoetry/virtualenvs/openadapt-VBXg4jpm-py3.10/lib/python3.10/site-packages/oa_pynput/keyboard/_base.py", line 370, in press
raise self.InvalidKeyException(key)
| | -> None
| -> <class 'oa_pynput.keyboard._base.Controller.InvalidKeyException'>
-> <oa_pynput.keyboard._darwin.Controller object at 0x2d196b880>
oa_pynput.keyboard._base.Controller.InvalidKeyException: None
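The last action in the history above, `'<-m->-a'`, does not look like a well-formed key sequence, and it is presumably what reached `press` as `None`. One way to guard against this is to validate model output before replaying it. The sketch below assumes the text format is what the logs suggest: keys separated by `-`, with special keys wrapped in angle brackets. `parse_key_sequence`, `split_valid_actions`, and `SPECIAL_KEYS` are hypothetical names, not OpenAdapt functions.

```python
SEPARATOR = "-"
# Only the special key names that appear in the logs; extend as needed.
SPECIAL_KEYS = {"cmd", "space", "enter"}


def parse_key_sequence(text):
    """Split text such as '<cmd>-<space>' into key names.

    Raises ValueError for anything that does not fit the assumed format.
    """
    keys = []
    pos = 0
    while pos < len(text):
        if text[pos] == "<":
            end = text.find(">", pos)
            name = text[pos + 1:end] if end != -1 else ""
            if name not in SPECIAL_KEYS:
                raise ValueError(f"unrecognized special key in {text!r}")
            keys.append(name)
            pos = end + 1
        else:
            keys.append(text[pos])  # a single literal character
            pos += 1
        if pos < len(text):
            # A key must be followed by a separator and then another key.
            if text[pos] != SEPARATOR:
                raise ValueError(f"expected {SEPARATOR!r} at index {pos} in {text!r}")
            pos += 1
            if pos == len(text):
                raise ValueError(f"trailing separator in {text!r}")
    if not keys:
        raise ValueError("empty key sequence")
    return keys


def split_valid_actions(actions):
    """Return (valid, rejected), where rejected pairs each action with a reason."""
    valid, rejected = [], []
    for action in actions:
        try:
            parse_key_sequence(action["text"])
        except ValueError as exc:
            rejected.append((action, str(exc)))
        else:
            valid.append(action)
    return valid, rejected
```

Rejected actions could be reported back to the model in the next prompt rather than replayed, which would turn the crash into a recoverable error.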
In general, prompting the model with natural language instructions to modify the recording fails.
Recording (with non-keyframes discarded by default to improve write performance; download and open with VLC to watch):
https://github.com/OpenAdaptAI/OpenAdapt/assets/774615/c3101fe6-143d-47cc-bd49-2504db1d4b38
Recording description:
('The user initiates the sequence by activating Spotlight Search using the '
'shortcut command (Command+Space). They then typed "calculator" to search for '
'the Calculator application on a MacOS system, and confirmed the selection by '
'pressing the Enter key, which launched the application.\n'
'\n'
'Subsequent actions involve a series of mouse movements and clicks within the '
'Calculator app:\n'
'\n'
'1. The user moved the mouse cursor to coordinates (927.671875, 188.546875) '
"and clicked there. The first mouse click doesn't specify any operation in "
'the calculator, so its purpose seems like setting focus or a misclick.\n'
'\n'
'2. The next movement was to coordinates (974.6875, 338.75), followed by a '
'mouse click. These actions correspond to clicking the number "6" on the '
'calculator.\n'
'\n'
'3. The cursor was then moved to coordinates (1086.08984375, 236.22265625) '
'and clicked, corresponding to the operator "+" in the calculator.\n'
'\n'
'4. Next, the mouse moved to coordinates (1046.96484375, 320.12109375) and '
'clicked, likely pressing the number "3" on the calculator.\n'
'\n'
'5. Finally, the cursor was relocated to (1078.22265625, 379.83203125) and '
'clicked, which triggers the "=" operation, producing the output "9" on the '
'calculator display as shown in the final screenshots.\n'
'\n'
'Each action is verified by corresponding screenshots that also include '
'temporal stamps, proving a structured and deliberate sequence of tasks, '
'resulting in performing a basic addition operation of "6 + 3" to get a '
'result "9" on the calculator.\n')
VanillaReplayStrategy:
python -m openadapt.replay VanillaReplayStrategy --replay_instructions "calculate 9-8" --record
...
2024-06-04 17:07:28.342 | INFO | openadapt.strategies.vanilla:__del__:104 - action_history=
[{'canonical_text': '', 'name': 'type', 'text': '<cmd>-<space>'},
{'canonical_text': '', 'name': 'type', 'text': 'c-a-l-c-u-l-a-t-o-r'},
{'canonical_text': '', 'name': 'type', 'text': '<enter>'},
{'canonical_text': '', 'name': 'type', 'text': '<enter>'},
{'canonical_text': '', 'name': 'type', 'text': '9-8'}]
https://github.com/OpenAdaptAI/OpenAdapt/assets/774615/cd8ab50c-2a90-4c0f-98fe-080b58bc3271
Addresses https://github.com/OpenAdaptAI/OpenAdapt/issues/700:
VanillaReplayStrategy in vanilla.py
@LunjunZhang thank you for your suggestions! 🙏