Closed. jan1854 closed this 7 months ago.
Hello, thanks again for the PR =) I'll try to have a look in the coming days.
Btw, because of your good contributions, would you be interested in becoming an SBX maintainer? (so you won't have to fork the repo to fix a bug or add a feature)
Sounds awesome, I'd be happy to become an SBX maintainer :)
For built-in multi-discrete environments, I think there are the Atari games? Although we would need to use the RAM versions at first, until CNNs are supported by SBX.
Description
closes #19
Addresses #19. Adds support for `MultiDiscrete` and `MultiBinary` action spaces to `PPO`.

Constructs a multivariate categorical distribution through TensorFlow Probability's `Independent` and `Categorical`. Note that the `Categorical` distribution requires every variable to have the same number of categories. Therefore, I pad the logits to the largest number of categories across the dimensions (padding with `-inf` so that these invalid actions have zero probability). `MultiBinary` is handled as a special case of `MultiDiscrete` with two choices per categorical variable.

Only one-dimensional action spaces are supported, so using, e.g., `MultiDiscrete([[2],[3]])` or `MultiBinary([2, 3])` will result in an exception (as in stable-baselines3).

Testing
I added some tests (`tests/test_space`, similar to the tests in stable-baselines3) that check for errors during learning and that the correct exceptions are raised if PPO is used with multi-dimensional `MultiDiscrete` and `MultiBinary` action spaces.

To check whether there are issues with the learning performance, I compared the performance to stable-baselines3's PPO on `MultiDiscrete` and `MultiBinary` action space environments. Since there are no environments with these action spaces in the classic Gym benchmarks, I used a discretized-action version of Reacher and a binary-action version of Acrobot for testing purposes (see the wrappers below).

Test script for `MultiDiscrete` action spaces:

Test script for `MultiBinary` action spaces:

Results: sbx's and stable-baselines3's PPO have the same learning performance.
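The padding scheme from the description can be illustrated without TensorFlow Probability. Below is a minimal NumPy sketch, not the PR's implementation (the PR uses TFP's `Independent` and `Categorical`). The helper names `multidiscrete_nvec`, `multibinary_nvec`, `pad_logits`, `joint_log_prob` and `sample` are hypothetical, and raising `ValueError` for multi-dimensional spaces is an assumption, not necessarily the exception type sbx or stable-baselines3 raises. It shows why padding with `-inf` is safe: those entries get probability zero after the softmax, and the joint log-probability is the sum of the per-dimension log-probabilities, which is what an `Independent` wrapper over the action dimension computes.

```python
# Hypothetical NumPy sketch of a padded multivariate categorical distribution.
import numpy as np


def multidiscrete_nvec(nvec):
    """Hypothetical helper: validate a MultiDiscrete nvec (only 1-D is supported)."""
    nvec = np.asarray(nvec, dtype=np.int64)
    if nvec.ndim != 1:
        raise ValueError(f"multi-dimensional MultiDiscrete nvec {nvec.shape} is not supported")
    return nvec


def multibinary_nvec(n):
    """Hypothetical helper: treat MultiBinary(n) as MultiDiscrete with 2 choices per variable."""
    shape = np.atleast_1d(np.asarray(n, dtype=np.int64))
    if shape.size != 1:
        raise ValueError(f"multi-dimensional MultiBinary {tuple(shape)} is not supported")
    return np.full(int(shape[0]), 2, dtype=np.int64)


def pad_logits(flat_logits, nvec):
    """Split flat logits of length sum(nvec) into rows padded with -inf up to max(nvec)."""
    padded = np.full((len(nvec), int(nvec.max())), -np.inf)
    start = 0
    for i, n in enumerate(nvec):
        padded[i, :n] = flat_logits[start:start + n]
        start += n
    return padded


def log_softmax(logits):
    # Numerically stable log-softmax; exp(-inf) == 0, so padded entries get probability zero
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def joint_log_prob(padded, actions):
    # Dimensions are independent: sum the per-dimension categorical log-probabilities
    return float(log_softmax(padded)[np.arange(len(actions)), actions].sum())


def sample(padded, rng):
    # Gumbel-max trick per row; a -inf entry can never be the argmax
    return np.argmax(padded + rng.gumbel(size=padded.shape), axis=-1)


# Usage: MultiDiscrete([2, 3]) with uniform logits
rng = np.random.default_rng(0)
nvec = multidiscrete_nvec([2, 3])
padded = pad_logits(np.zeros(int(nvec.sum())), nvec)  # rows: [0, 0, -inf] and [0, 0, 0]
print(joint_log_prob(padded, [1, 2]))  # log(1/2) + log(1/3) = log(1/6), about -1.792
print(sample(padded, rng))  # first entry is always 0 or 1, never the padded index 2
print(multibinary_nvec(3))  # MultiBinary(3) becomes nvec [2, 2, 2]
```

Sampling via the Gumbel-max trick is a choice made for this sketch only; any categorical sampler that respects zero-probability entries would work the same way.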
Motivation and Context
Types of changes
Checklist:
- `make format` (required)
- `make check-codestyle` and `make lint` (required)
- `make pytest` and `make type` both pass. (required)
- `make doc` (required)

Note: You can run most of the checks using `make commit-checks`.

Note: we are using a maximum length of 127 characters per line.