Adds the ability to use more algorithms from sbx and sb3-contrib (with e.g. MlpPolicy)

Ivan-267 commented 8 months ago

This change adds an additional sb3 wrapper class that uses a single observation space ("obs"), like our CleanRL implementation, without modifying the original wrapper.
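To illustrate the idea, here is a minimal, dependency-free sketch of unwrapping a dictionary observation into a single observation. It is not the code from this PR: `DictObsEnv` and `SingleObsAdapter` are hypothetical names used only for illustration, and the real wrapper also has to expose the matching observation space to the algorithm.

```python
from typing import Dict, List, Tuple


class DictObsEnv:
    """Hypothetical stand-in for an env that returns dict observations."""

    def reset(self) -> Dict[str, List[float]]:
        return {"obs": [0.0, 0.0, 0.0]}

    def step(self, action: int) -> Tuple[Dict[str, List[float]], float, bool]:
        value = float(action)
        return {"obs": [value, value, value]}, 1.0, False


class SingleObsAdapter:
    """Hypothetical adapter: exposes only the "obs" entry of each observation."""

    def __init__(self, env: DictObsEnv) -> None:
        self.env = env

    def reset(self) -> List[float]:
        # Drop the dict layer so a non-dict policy can consume the observation.
        return self.env.reset()["obs"]

    def step(self, action: int) -> Tuple[List[float], float, bool]:
        obs, reward, done = self.env.step(action)
        return obs["obs"], reward, done


adapter = SingleObsAdapter(DictObsEnv())
first_obs = adapter.reset()
next_obs, reward, done = adapter.step(2)
```

With the dict layer removed, policies that expect a single array-like observation (such as MlpPolicy) no longer need MultiInputPolicy support.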

By using this class, algorithms like ARS from SB3 Contrib and PPO from SBX (which currently doesn't have MultiInputPolicy) can be tested easily on any environment that doesn't require multiple observation spaces, which is often the case for the examples.

Usage:

In stable_baselines_3_example.py:

First import the SingleObsSpace variant of the env wrapper:

+ from godot_rl.wrappers.sbg_single_obs_wrapper import SBGSingleObsEnv

Then (after the installation of needed packages with pip), import any algorithms to be used:

- from stable_baselines3 import PPO
+ from sb3_contrib import ARS
+ from sbx import TQC, DroQ, SAC, PPO, DQN, TD3, DDPG

The env just needs its class name replaced:

env = SBGSingleObsEnv(env_path=args.env_path, show_window=args.viz, seed=args.seed, n_parallel=args.n_parallel, speedup=args.speedup)

And then you can use e.g. the SBX PPO, SB3 Contrib ARS or any other algorithm that may not support the MultiInputPolicy:

    model: PPO = PPO("MlpPolicy",
                     env,
                     verbose=2)

Here's a brief attempt at starting training with ARS (the env is slightly modified for some experiments and doesn't have the correct obs; this was only an attempt to start the training, not a test of learning performance):

https://github.com/edbeeching/godot_rl_agents/assets/61947090/b9b1052f-d503-4abc-9a74-b9fc5ecba2f2

Ivan-267 commented 8 months ago

Thanks for the suggestions. I implemented the solutions and updated the return types in the file.

Ivan-267 commented 8 months ago

Thanks for the review. I'll just add a small change to allow changing the dictionary name from obs to any value, which I found useful while testing CNN usage with the camera example, and then I can merge it.
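As a rough sketch of what a configurable name could look like (not the merged implementation), the helper below picks one entry out of a dict observation. `select_obs`, the `obs_key` parameter, and the `"camera_2d"` key are assumptions made for illustration.

```python
from typing import Any, Dict


def select_obs(obs_dict: Dict[str, Any], obs_key: str = "obs") -> Any:
    """Return the entry stored under obs_key (hypothetical helper)."""
    if obs_key not in obs_dict:
        # Fail with a readable message listing the keys that do exist.
        available = ", ".join(sorted(obs_dict))
        raise KeyError(f"'{obs_key}' not found; available keys: {available}")
    return obs_dict[obs_key]


# Illustrative observation with a vector entry and an image-like entry.
sample = {"obs": [0.1, 0.2], "camera_2d": [[0, 255], [255, 0]]}
vector_obs = select_obs(sample)
image_obs = select_obs(sample, obs_key="camera_2d")
```

Defaulting to "obs" keeps the existing behavior, while a different name lets an image observation be routed to a CNN policy instead.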

edbeeching / godot_rl_agents

Adds the ability to use more algorithms from sbx and sb3-contrib (with e.g. MlpPolicy) #163
