Description

Rewards from MiniWoB environments include a time penalty and partial rewards, which are not always desirable. A reward processor lets the user specify which type of reward to use. For example, get_binary_reward ignores both the time penalty and partial rewards, so the resulting reward reflects pure task success.
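To illustrate the idea, here is a minimal sketch of what a reward processor might look like. It is not copied from reward.py: the function names prefixed with `sketch_`, the metadata keys (`done`, `env_reward`, `raw_reward`), and the choice of -1.0 for a failed episode are all assumptions made for this example.

```python
from typing import Any, Callable, Mapping, Sequence

# Hypothetical metadata layout, assumed for illustration only:
#   "done":       whether the episode has ended
#   "env_reward": reward with time penalty and partial credit applied
#   "raw_reward": reward before the time penalty (may still be partial)
Metadata = Mapping[str, Any]
RewardProcessor = Callable[[Metadata], float]


def sketch_original_reward(metadata: Metadata) -> float:
    """Pass the environment reward through unchanged."""
    return float(metadata["env_reward"])


def sketch_binary_reward(metadata: Metadata) -> float:
    """Ignore the time penalty and partial credit.

    Returns 0.0 while the episode is running, 1.0 on full success,
    and -1.0 otherwise (the failure value is an assumption).
    """
    if not metadata["done"]:
        return 0.0
    return 1.0 if metadata["raw_reward"] == 1.0 else -1.0


def success_rate(episodes: Sequence[Metadata], processor: RewardProcessor) -> float:
    """Fraction of finished episodes whose processed reward is positive."""
    rewards = [processor(m) for m in episodes]
    return sum(r > 0 for r in rewards) / len(rewards)


# Three finished episodes: slow success, partial credit, outright failure.
episodes = [
    {"done": True, "env_reward": 0.6, "raw_reward": 1.0},
    {"done": True, "env_reward": 0.2, "raw_reward": 0.25},
    {"done": True, "env_reward": -1.0, "raw_reward": -1.0},
]
```

With the pass-through processor, the partially solved episode still counts as a positive reward; with the binary processor, only the fully solved episode does, regardless of how long it took.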

Revised the existing reward processors in reward.py. Also added tests and documentation.
Added information about partial reward and other caveats to the environment docstrings.
Fixed reward bugs in the actual environments.

Type of change

[x] Bug fix (non-breaking change which fixes an issue)
[x] New feature (non-breaking change which adds functionality)
[x] Breaking change (fix or feature that would cause existing functionality to not work as expected)
[x] This change requires a documentation update

Checklist:

[x] I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
[x] I have commented my code, particularly in hard-to-understand areas
[x] I have made corresponding changes to the documentation
[x] My changes generate no new warnings
[x] I have added tests that prove my fix is effective or that my feature works
[x] New and existing unit tests pass locally with my changes

Farama-Foundation / miniwob-plusplus

Reward processors #73
