Farama-Foundation / Minigrid

Simple and easily configurable grid world environments for reinforcement learning
https://minigrid.farama.org/

[Proposal] Support Stable Baselines? #343

Closed achouliaras closed 1 year ago

achouliaras commented 1 year ago

Currently, it seems that Minigrid doesn't work with Stable Baselines3. The observation space is in a different format, and the env checker reports that the environments don't follow the Gym API closely.

It would be nice to support Stable Baselines, as that would make Minigrid accessible to even more people.

pseudo-rnd-thoughts commented 1 year ago

Minigrid currently uses Gymnasium rather than Gym. Are you using Stable Baselines3 2.0.0alpha1? That version adds support for Gymnasium by default. The Gym/Gymnasium mismatch is probably the cause of the env checker error.

@BolunDai0216 I realised that our README includes a note about training an agent (https://github.com/Farama-Foundation/Minigrid#training-an-agent); however, it will no longer work because it relies on the older gym-minigrid package. I don't know how much time you have, but it would be great to add scripts for training basic agents to CleanRL, if Costa is happy with that. We could also release pre-trained agents for all of the environments as part of the paper you are writing.

jbloomAus commented 1 year ago

I feel the real work here is in Stable Baselines using Gymnasium rather than Gym. Do they plan to transition? (Edit: I can't see anything on their roadmap. If enough people want them to use Gymnasium rather than Gym, it might be worth showing it can be done.)

pseudo-rnd-thoughts commented 1 year ago

https://github.com/DLR-RM/stable-baselines3/pull/1327

VineetTambe commented 1 year ago

I am trying to run Stable Baselines3 with Minigrid, but I keep running into errors regarding the structure of the observation space. Was anyone able to get it to run? If so, could you point me to some example code? I am trying to run it with PPO and either CnnPolicy or MultiInputPolicy.

BolunDai0216 commented 1 year ago

@VineetTambe To use SB3 with Minigrid, you need to create a custom CNN feature extractor (see here for an example). Then update the policy_kwargs dictionary with the custom feature extractor (see here; this one is for a MultiInputPolicy. For a CnnPolicy, see these two links for the feature extractor and the PPO instance) and pass it to the PPO instance (see here). For more details, take a look at the SB3 documentation.

VineetTambe commented 1 year ago

Thanks @BolunDai0216! I will try it out!

BolunDai0216 commented 1 year ago

I will close this issue since there are no additional comments. Please feel free to reopen it if anyone has related questions.