Ivan-267 closed this 8 months ago
Thanks for the suggestions. I've implemented the suggested solutions and updated the return types in the file.
Thanks for the review. I'll just add a small change that allows changing the observation dictionary key from "obs" to any other value, which I found useful while testing CNN usage with the camera example, and then I can merge it.
This change adds an additional SB3 wrapper class that exposes a single observation space ("obs"), like our CleanRL implementation, without modifying the original wrapper.
With this class, algorithms such as ARS from SB3 Contrib and PPO from SBX (which currently doesn't have a MultiInputPolicy) can easily be tested on any environment that doesn't require multiple observation spaces, which is the case for many of the examples.
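The core idea of a single-observation wrapper can be illustrated with a small, framework-free sketch. It assumes a Gymnasium-style API (`reset()` returns `(obs, info)`, `step()` returns a 5-tuple) and a dict observation from which one configurable key is passed through. The names `ToyDictObsEnv`, `SingleObsWrapper`, `obs_key` and the `"camera"` key are hypothetical illustrations, not the actual names used in the godot_rl_agents wrapper.

```python
from typing import Any, Dict, Tuple


class ToyDictObsEnv:
    """Hypothetical stand-in for an env that returns dict observations.

    It emits two keys: "obs" (a flat feature vector) and "camera"
    (a tiny 2x2 grayscale "image"); both key names are illustrative only.
    """

    def __init__(self, episode_length: int = 3) -> None:
        self.episode_length = episode_length
        self.t = 0

    def _observe(self) -> Dict[str, Any]:
        return {
            "obs": [float(self.t), 1.0],
            "camera": [[0.0, 0.1 * self.t], [0.2 * self.t, 1.0]],
        }

    def reset(self) -> Tuple[Dict[str, Any], Dict[str, Any]]:
        self.t = 0
        return self._observe(), {}

    def step(self, action: float):
        self.t += 1
        reward = -abs(action)  # toy reward: smaller actions are better
        terminated = self.t >= self.episode_length
        return self._observe(), reward, terminated, False, {}


class SingleObsWrapper:
    """Passes through only obs[obs_key], so the agent sees one observation instead of a dict."""

    def __init__(self, env: ToyDictObsEnv, obs_key: str = "obs") -> None:
        self.env = env
        self.obs_key = obs_key  # configurable key, e.g. a camera observation for a CNN

    def reset(self):
        obs, info = self.env.reset()
        return obs[self.obs_key], info

    def step(self, action: float):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return obs[self.obs_key], reward, terminated, truncated, info


if __name__ == "__main__":
    vec_env = SingleObsWrapper(ToyDictObsEnv())  # default key "obs"
    print(vec_env.reset()[0])  # [0.0, 1.0]

    cam_env = SingleObsWrapper(ToyDictObsEnv(), obs_key="camera")
    cam_env.reset()
    obs, reward, terminated, truncated, info = cam_env.step(0.5)
    print(obs, reward)  # [[0.0, 0.1], [0.2, 1.0]] -0.5
```

Keeping the key configurable means the same wrapper can hand either a flat vector (for an MLP or linear policy) or an image-like observation (for a CNN) to an algorithm that expects a single observation space.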
Usage:
In stable_baselines_3_example.py:
First import the SingleObsSpace variant of the env wrapper:
Then, after installing the required packages with pip, import the algorithms you want to use:
The env then only needs its class name changed to:
Then you can use, e.g., SBX PPO, SB3 Contrib ARS, or any other algorithm that may not support MultiInputPolicy:
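In its basic form, ARS maps one flat observation vector to an action through a linear policy, which is one reason a single observation space is convenient for it. The dependency-free sketch below shows that idea using the basic top-b variant without state normalization (V1-t in the original ARS paper) on a deterministic toy task. It is not SB3 Contrib's implementation; the function names, hyperparameters, `TARGET`, `OBS_BATCH` and `toy_rollout` are all hypothetical.

```python
import random
import statistics
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def linear_policy(theta: Matrix, obs: Vector) -> Vector:
    """Linear policy: action = theta @ obs (one row of theta per action dimension)."""
    return [sum(w * o for w, o in zip(row, obs)) for row in theta]


def perturb(theta: Matrix, delta: Matrix, scale: float) -> Matrix:
    """Return theta + scale * delta, element-wise."""
    return [[w + scale * d for w, d in zip(rw, rd)] for rw, rd in zip(theta, delta)]


def ars(rollout: Callable[[Matrix], float], act_dim: int, obs_dim: int,
        iterations: int = 200, n_directions: int = 8, top_b: int = 4,
        step_size: float = 0.05, noise: float = 0.05, seed: int = 0) -> Matrix:
    """Basic random search with top-b directions and reward-std step scaling."""
    rng = random.Random(seed)
    theta = [[0.0] * obs_dim for _ in range(act_dim)]
    for _ in range(iterations):
        results = []
        for _ in range(n_directions):
            delta = [[rng.gauss(0.0, 1.0) for _ in range(obs_dim)] for _ in range(act_dim)]
            r_plus = rollout(perturb(theta, delta, noise))
            r_minus = rollout(perturb(theta, delta, -noise))
            results.append((r_plus, r_minus, delta))
        # Keep the top_b directions, ranked by the better of their two rollouts.
        results.sort(key=lambda res: max(res[0], res[1]), reverse=True)
        best = results[:top_b]
        # Scale the step by the std of the rewards actually used in the update.
        sigma = statistics.pstdev([r for rp, rm, _ in best for r in (rp, rm)]) or 1.0
        update = [[0.0] * obs_dim for _ in range(act_dim)]
        for r_plus, r_minus, delta in best:
            for i in range(act_dim):
                for j in range(obs_dim):
                    update[i][j] += (r_plus - r_minus) * delta[i][j]
        theta = perturb(theta, update, step_size / (top_b * sigma))
    return theta


TARGET: Matrix = [[1.0, -2.0]]  # hypothetical "ideal" linear policy for the toy task
OBS_BATCH: List[Vector] = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # fixed flat observations


def toy_rollout(theta: Matrix) -> float:
    """Deterministic toy return: negative squared error to TARGET's actions."""
    total = 0.0
    for obs in OBS_BATCH:
        action = linear_policy(theta, obs)
        target = linear_policy(TARGET, obs)
        total -= sum((a - t) ** 2 for a, t in zip(action, target))
    return total


if __name__ == "__main__":
    learned = ars(toy_rollout, act_dim=1, obs_dim=2)
    print("initial return:", toy_rollout([[0.0, 0.0]]))
    print("learned return:", toy_rollout(learned))
```

Because only returns of perturbed policies are needed, the policy can be any function of a single observation vector; a dict of observations would first have to be reduced to one vector, which is what the single-obs wrapper does.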
Here's a brief attempt at starting training with ARS (the env is slightly modified for some experiments and doesn't have the correct obs; this was only meant to check that training starts, not to evaluate learning performance):
https://github.com/edbeeching/godot_rl_agents/assets/61947090/b9b1052f-d503-4abc-9a74-b9fc5ecba2f2