Closed (njucckevin closed this 4 months ago)
The demonstrations in that repository record the raw JavaScript events. For example, a mouse click is recorded as separate mousedown and mouseup events.
In the project I was involved in (Workflow-Guided Exploration), we converted the MiniWoB demonstrations into a graph structure. The method `_parse_raw_demo_original` is probably close to what you want (though it likely won't work out of the box; the code is quite old).
There is also the paper *Understanding HTML with Large Language Models*, whose authors trained a model on the demonstrations, though I don't know where their code is.
In any case, I have created a feature request for the conversion code (#87).
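In the meantime, here is a minimal sketch of what such a conversion could look like. It assumes each demo JSON has a top-level `"states"` list whose `"action"` dicts carry a `"type"` field plus a `"ref"` for the target element; the real recordings store raw x/y coordinates and keyCodes that you would need to resolve against the DOM snapshot of each state, so treat all field names below as assumptions:

```python
import json

def simplify_demo(path):
    """Collapse raw mousedown/mouseup and key events into high-level
    {'action': ..., 'ref': ...} steps.

    Field names ("states", "action", "type", "ref", "charCode") are
    assumptions about the raw demo format and will likely need adjusting.
    """
    with open(path) as f:
        states = json.load(f)["states"]  # assumed top-level key

    actions = []
    typed_text, typed_ref = "", None

    def flush_typing():
        # Emit one "type" action for a run of consecutive keypresses.
        nonlocal typed_text, typed_ref
        if typed_text:
            actions.append({"action": "type", "ref": typed_ref,
                            "typed_text": typed_text})
            typed_text, typed_ref = "", None

    for state in states:
        ev = state.get("action") or {}
        etype, ref = ev.get("type"), ev.get("ref")
        if etype == "mousedown":   # keep the down event, drop the matching mouseup
            flush_typing()
            actions.append({"action": "click", "ref": ref})
        elif etype == "keypress" and "charCode" in ev:
            typed_text += chr(ev["charCode"])  # accumulate printable characters
            typed_ref = ref
    flush_typing()
    return actions
```

With something like this you could dump every trajectory into the `{'action': 'click'/'type'}` format from the question, then spot-check a few outputs against the rendered demos before finetuning.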
Closing in favor of #87
Question
Hi, I'm confused by the human demonstrations provided in https://github.com/stanfordnlp/miniwob-plusplus-demos. These demonstrations seem messy: a single trajectory can contain dozens of states (e.g., 20+) with raw mouseup/mousedown and keyup/keydown events. Is there any method to get cleaned or simplified actions, e.g., {'action': 'click', 'ref': '6'}, {'action': 'type', 'ref': '10', 'typed_text': 'John'}? I want to use these 12k demonstrations to supervised-finetune my own model.
Thanks a lot!