Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

Gym Wrapper produces Tuple instead of Dict for multiple observations (does not work with SB3) #5766

Closed: leo2r closed this issue 1 year ago

leo2r commented 2 years ago

Describe the bug

I've used Unity and ML-Agents to create a custom environment. I then built the executable (binary) so I could use UnityToGymWrapper to create a Gym instance of the environment. I'm now trying to train my agent with stable-baselines3's 'MultiInputPolicy', since the observations are (image1, image2, image3, vector) (i.e., 'direct' features). However, I get the following error:

raise NotImplementedError(f"{observation_space} observation space is not supported")

When comparing the returned observation from my UnityToGymWrapper env and an example env, I spot this difference:

UnityToGymWrapper env obs:

Tuple(Box([[[0] [0] [0] ... [255] [255] [255]]], (84, 84, 1), uint8), Box([[[0] [0] [0] ... [255] [255] [255]]], (84, 84, 1), uint8), Box([[[0] [0] [0] ... [255] [255] [255]]], (84, 84, 1), uint8), Box([-inf ... -inf], [inf ... inf], (13,), float32))

SB3 env obs:

{'vec': array([0.72354802, 0.18255346, 0.62559858, 0.69045166, 0.03119092]), 'img': array([[[ 6], [222], [204], ..., [237], [182], [209]]], dtype=uint8)}

So it seems the UnityToGymWrapper env returns multiple observations as a Tuple, whereas SB3 (and therefore Gym) needs a dictionary. Is this the intended way to present multiple observations? It seems like the Gym wrapper should produce observations in the format Gym expects. How can I work around this? Thank you :)
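One possible workaround is a small gym.ObservationWrapper that re-keys the Tuple into the Dict space SB3 expects. This is only a sketch: the TupleToDictWrapper name and the 'obs_i' keys are placeholders of my own, not part of ml-agents:

```python
import gym
from gym import spaces


class TupleToDictWrapper(gym.ObservationWrapper):
    """Illustrative workaround: expose a Tuple observation space as a Dict.

    Keys 'obs_0', 'obs_1', ... follow the order of the wrapped Tuple space.
    """

    def __init__(self, env):
        super().__init__(env)
        assert isinstance(env.observation_space, spaces.Tuple)
        self.observation_space = spaces.Dict(
            {f"obs_{i}": s for i, s in enumerate(env.observation_space.spaces)}
        )

    def observation(self, observation):
        # The wrapped env returns observations as a tuple/list;
        # re-key them to match the Dict space defined above.
        return {f"obs_{i}": o for i, o in enumerate(observation)}
```

With this, PPO('MultiInputPolicy', TupleToDictWrapper(env)) should get past the space check, assuming SB3 recognizes each sub-space (the 84x84x1 uint8 Boxes as images and the 13-dim float Box as a vector).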

To Reproduce

Steps to reproduce the behavior (a consolidated, runnable version is sketched after this list):

  1. env_exe =
  2. unity_env = UnityEnvironment(env_exe)
  3. env = UnityToGymWrapper(unity_env, uint8_visual=True, allow_multiple_obs=True)
  4. from stable_baselines3 import PPO
  5. model = PPO('MultiInputPolicy', env)
  6. Error happens
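For completeness, a consolidated version of the steps above (the env_exe path is a placeholder for your own build):

```python
from mlagents_envs.environment import UnityEnvironment
from gym_unity.envs import UnityToGymWrapper
from stable_baselines3 import PPO

env_exe = "path/to/your/built/env"  # placeholder: path to your built binary
unity_env = UnityEnvironment(env_exe)
env = UnityToGymWrapper(unity_env, uint8_visual=True, allow_multiple_obs=True)

# Raises NotImplementedError here: env.observation_space is a gym.spaces.Tuple,
# and SB3's MultiInputPolicy only supports gym.spaces.Dict.
model = PPO("MultiInputPolicy", env)
```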


hanlanyi commented 2 years ago

Is it true that everyone working on this project at Unity was fired?

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had activity in the last 28 days. It will be closed in the next 14 days if no further activity occurs. Thank you for your contributions.

miguelalonsojr commented 2 years ago

Is it true that everyone working on this project at Unity was fired?

No. There have been changes at Unity that have slowed our development substantially, but we're still alive! :) Thanks for reaching out. I'll have a look at this.

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had activity in the last 90 days. It will be closed in the next 30 days if no further activity occurs. Thank you for your contributions.

aha85b commented 7 months ago

I am having this issue. Has anyone managed to fix it?

zbwby819 commented 4 weeks ago

Same issue, any updates?