Farama-Foundation / miniwob-plusplus

MiniWoB++: a web interaction benchmark for reinforcement learning
https://miniwob.farama.org/
MIT License

Add target variables for 'enter-time' task #85

Closed: tudor-berariu closed this pull request 1 year ago

tudor-berariu commented 1 year ago

Add a few more target vars.

ppasupat commented 1 year ago

Hi! Let me add a reviewer that can approve the pull request.

ppasupat commented 1 year ago

@pseudo-rnd-thoughts This pull request edits the task HTML files by exposing task parameters as global JavaScript variables. This does not change the task behavior so it should be safe to merge. (Our Python API does not do anything with these variables, though.)

mcobzarenco commented 1 year ago

Oops, we made a mistake: we didn't mean to open the pull request here yet, but against our fork instead. Also, by the way, amazing job on maintaining the MiniWoB++ repo!

But since we did create the PR here, here is a bit of context. We have written policies by hand (using Playwright) that solve all the MiniWoB++ tasks by interacting with the environment only at the pixel level (e.g. actions are Move(x, y), LeftClick, EmitText("Hello"), etc.). An example:

[example attachment not preserved]

You can use this to train an agent in a supervised fashion end-to-end or initialize policies with "behaviour cloning" for RL.
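As a rough illustration of the idea above, the sketch below models the three pixel-level actions named in this comment and logs a scripted policy's steps as (observation, action) pairs, which is roughly the shape of data that behaviour cloning would train on. The class names mirror the actions mentioned here; `scripted_enter_text_policy`, `record_demonstration`, the coordinates, and the string observations are hypothetical and are not taken from the PR.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Move:
    """Move the pointer to pixel coordinates (x, y)."""
    x: int
    y: int


@dataclass(frozen=True)
class LeftClick:
    """Click the left button at the current pointer position."""


@dataclass(frozen=True)
class EmitText:
    """Type a string into whatever currently has focus."""
    text: str


Action = Union[Move, LeftClick, EmitText]


def scripted_enter_text_policy(field_xy, submit_xy, text):
    """Hypothetical hand-written policy: click a field, type, click submit."""
    return [
        Move(*field_xy),
        LeftClick(),
        EmitText(text),
        Move(*submit_xy),
        LeftClick(),
    ]


def record_demonstration(observations, actions):
    """Pair each observation with the action taken from it."""
    if len(observations) != len(actions):
        raise ValueError("need exactly one observation per action")
    return list(zip(observations, actions))


# Made-up coordinates; a real policy would locate widgets from the pixels.
actions = scripted_enter_text_policy((80, 60), (80, 120), "Hello")
# Placeholder observations; a real setup would store screenshots here.
observations = [f"frame_{i}" for i in range(len(actions))]
demonstration = record_demonstration(observations, actions)
```

Frozen dataclasses keep the actions hashable and comparable, which makes recorded trajectories easy to deduplicate or check in tests.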

To do this, we found it easier to add the ground truth to JS variables for some tasks which we read from the env -- hence what's going on in this PR. If you are open to merging this upstream, we'd be very happy to clean up this PR a bit and resubmit it :bow:
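A minimal sketch of reading such variables from the Python side, assuming the caller passes in something like Playwright's `page.evaluate`, which takes a JavaScript expression and returns its value. The helper `read_ground_truth` and the variable name `targetTime` are hypothetical; the actual names added by this PR are not shown in the thread. A stub stands in for the browser so the sketch runs on its own.

```python
from typing import Any, Callable, Dict, Iterable


def read_ground_truth(
    evaluate: Callable[[str], Any], names: Iterable[str]
) -> Dict[str, Any]:
    """Read global JS variables from the page; missing ones become None."""
    result = {}
    for name in names:
        # `typeof` avoids a ReferenceError when the global is not declared.
        expression = f"typeof {name} === 'undefined' ? null : {name}"
        result[name] = evaluate(expression)
    return result


# Stand-in for a real browser page, so the sketch runs without Playwright.
# With Playwright's sync API one would pass `page.evaluate` instead.
fake_globals = {"targetTime": "12:25 AM"}  # hypothetical variable name


def fake_evaluate(expression: str) -> Any:
    # The variable name is the second token of the expression built above.
    name = expression.split()[1]
    return fake_globals.get(name)


ground_truth = read_ground_truth(fake_evaluate, ["targetTime", "notDefined"])
```

Passing the evaluator in as a callable keeps the helper independent of any particular browser driver.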

pseudo-rnd-thoughts commented 1 year ago

@ppasupat Thanks for confirming. What we include or leave out is entirely your call; you know the project better than I do.

ppasupat commented 1 year ago

@mcobzarenco Thank you for the context. Since this PR is intended to be for a fork, would it be OK if we close the PR for now?

Adding ground truth to JS variables is a great idea though, and we would gladly accept an implementation. One possible improvement would be to expose this information uniformly across all tasks. The Python API would also benefit from this (currently the fields in the observation are extracted using regular expressions, which is not ideal).
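To make the contrast concrete, here is a hedged sketch of the two extraction styles. The utterance wording, the regular expression, the `target` field name, and the JSON payload are all assumptions made for illustration; they are not the patterns or formats used by the MiniWoB++ Python API.

```python
import json
import re

# Assumed utterance format, for illustration only.
TIME_PATTERN = re.compile(r"Enter (\d{1,2}:\d{2} [AP]M) as the time")


def fields_from_utterance(utterance):
    """Regex route: breaks whenever the task's wording changes."""
    match = TIME_PATTERN.search(utterance)
    if match is None:
        raise ValueError(f"utterance did not match: {utterance!r}")
    return {"target": match.group(1)}


def fields_from_exposed_json(payload):
    """Uniform route: every task exposes its fields as one JSON object."""
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object of task fields")
    return data


utterance = "Enter 12:25 AM as the time and press submit."
payload = '{"target": "12:25 AM"}'  # hypothetical uniform format

regex_fields = fields_from_utterance(utterance)
json_fields = fields_from_exposed_json(payload)
```

With a uniform format, the per-task patterns collapse into a single parser, and a change to a task's wording no longer affects field extraction.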