Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

Making gym_unity a standard gym package #3955

Closed vwxyzjn closed 2 years ago

vwxyzjn commented 4 years ago

Is your feature request related to a problem? Please describe. The current gym_unity package is not a typical gym extension. It requires users to download Unity, obtain a license, compile binaries, and then use the UnityToGymWrapper to convert a UnityEnvironment into a gym environment. The whole procedure is fairly burdensome for researchers who just want to run some experiments rather than create game environments.

Describe the solution you'd like For an example of a standard gym extension, see https://github.com/maximecb/gym-minigrid. Users should only need to run

pip install gym_unity

and run the following file

import gym
import gym_unity
env = gym.make("GridWorld-v0") # or gym.make("GridWorldPixels-v0") for visual observations
observation = env.reset()
for _ in range(1000):
  env.render()
  action = env.action_space.sample() # your agent here (this takes random actions)
  observation, reward, done, info = env.step(action)
  if done:
    observation = env.reset()
env.close()

This is a widely accepted API, and it has many benefits:

  1. The environment will be versioned (e.g. GridWorld-v0, GridWorld-v1), which helps with reproducibility when an environment undergoes a major change.
  2. The configuration details will not be exposed to the end user directly. Instead, configuration is set through the gym register API, which is more flexible:
    import gym
    from gym.envs.registration import register

    register(
        id="GridWorld-v0",
        entry_point="gym_unity.envs:GridWorld",
        kwargs={
            "flatten_branched": True,
            "windows_binary_download_url": "https://unity.com/mlagents/binaries/windows/gridworldv0",
            "mac_binary_download_url": "https://unity.com/mlagents/binaries/mac/gridworldv0",
            "linux_binary_download_url": "https://unity.com/mlagents/binaries/linux/gridworldv0",
        },
    )
    register(
        id="GridWorldPixels-v0",
        entry_point="gym_unity.envs:GridWorld",
        kwargs={
            "use_visual": True,
            "uint8_visual": True,
            "flatten_branched": True,
            "windows_binary_download_url": "https://unity.com/mlagents/binaries/windows/gridworldv0",
            "mac_binary_download_url": "https://unity.com/mlagents/binaries/mac/gridworldv0",
            "linux_binary_download_url": "https://unity.com/mlagents/binaries/linux/gridworldv0",
        },
    )
  3. The API will be extremely easy for researchers to use, encouraging more people to install it and play around with it. When needed, they can still install Unity to customize their own environments.

Implementation detail

To achieve this simplicity and ease of use, gym_unity should handle the download of the binaries itself. A crude way of doing so is shown above: each registered environment carries per-platform download URLs, and when the user calls gym.make("GridWorld-v0") for the first time, the gym_unity package automatically downloads the pre-compiled binaries. This procedure can be fully automated with CI/CD; as an example, gym-microrts builds the binaries after every commit (http://microrts.s3-website-us-east-1.amazonaws.com/microrts/artifacts/).
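A minimal sketch of such a download-on-first-use helper (the cache location, function name, and URL table here are hypothetical, not part of gym_unity):

import io
import os
import platform
import urllib.request
import zipfile

# hypothetical cache location and per-platform URL table
CACHE_DIR = os.path.expanduser("~/.gym_unity/binaries")
BINARY_URLS = {
    "Windows": "https://unity.com/mlagents/binaries/windows/gridworldv0",
    "Darwin": "https://unity.com/mlagents/binaries/mac/gridworldv0",
    "Linux": "https://unity.com/mlagents/binaries/linux/gridworldv0",
}

def fetch_binary(env_name):
    """Download the pre-compiled binary on first use and return its folder."""
    target = os.path.join(CACHE_DIR, env_name)
    if not os.path.isdir(target):  # only download on the first gym.make call
        os.makedirs(target, exist_ok=True)
        url = BINARY_URLS[platform.system()]
        with urllib.request.urlopen(url) as resp:  # assumes the URL serves a zip archive
            zipfile.ZipFile(io.BytesIO(resp.read())).extractall(target)
    return target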

I see great potential in gym_unity as a convenient replacement for many commonly used games in gym, such as CartPole-v0 and the standard MuJoCo tasks. It would be fantastic if this feature request were fulfilled.

Thanks.

awjuliani commented 4 years ago

Hi @vwxyzjn

Thanks for making this request. This is actually a feature we are in the early stages of putting together right now. We agree that it would be very useful for users like yourself, and we hope to have more to share in the coming months.

yijiezh commented 4 years ago

+1 for the request.

Another benefit of using gym.make("...") would be video replay.

Without that, is there any way to do video replay at the moment? @awjuliani

awjuliani commented 4 years ago

Hi @vwxyzjn and @yijiezh

Our next release of ML-Agents (coming this week) will actually include an environment registry. You can read about it on the documentation for our master branch: https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Unity-Environment-Registry.md.
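For reference, the linked page describes usage along these lines (a sketch based on those docs; the exact entry names depend on what the registry ships with):

from mlagents_envs.registry import default_registry

# the registry downloads the pre-built binary on first use
env = default_registry["GridWorld"].make()
env.reset()
env.close()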

yijiezh commented 4 years ago

Thanks @awjuliani ! This looks wonderful.

Will this support the gym Monitor interface for video replay?

vwxyzjn commented 4 years ago

@yijiezh btw the Monitor issue is also mentioned in #3954 :)

yijiezh commented 4 years ago

Thanks @vwxyzjn. I am pretty new to Unity, so correct me if I am wrong. I think the gym Monitor can only be applied when the env was created by gym.make; it cannot be applied to envs created by UnityToGymWrapper?

vwxyzjn commented 4 years ago

As long as the environment implements env.render(mode='rgb_array'), Monitor can be applied.
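So a UnityToGymWrapper env should work as well. A minimal sketch, assuming an older gym where gym.wrappers.Monitor still exists and a GridWorld binary on disk:

import gym
from mlagents_envs.environment import UnityEnvironment
from gym_unity.envs import UnityToGymWrapper

unity_env = UnityEnvironment("./GridWorld")  # path to a local pre-compiled binary
env = UnityToGymWrapper(unity_env, uint8_visual=True)
# Monitor only needs env.render(mode='rgb_array'), not an env created by gym.make
env = gym.wrappers.Monitor(env, "./videos", force=True)
env.reset()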

yijiezh commented 4 years ago

How can I tell whether the env implements rendering, given that the env is a Unity binary?

vwxyzjn commented 4 years ago

print(env.render(mode='rgb_array'))
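That is, probe the wrapper at runtime. If rendering is supported, something like this prints an image array instead of raising (a sketch; env is the wrapped environment from above):

import numpy as np

frame = env.render(mode='rgb_array')
assert isinstance(frame, np.ndarray)  # an HxWx3 image if rendering works
print(frame.shape)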

bionicles commented 4 years ago

this would make the code simpler, with fewer special cases like

if env_name in unity_env_names: unity_env = UnityEnvironment(PATH_LOOKUP[env_name])

vincentpierre commented 3 years ago

Issue logged as MLA-1943

Ademord commented 3 years ago

I insist on pushing for the gym wrapper to be fully compatible with gym, so we can all use the algorithms that are already implemented elsewhere. Adapting every single one of them to the UnityEnvironment API takes too much overhead.

vwxyzjn commented 3 years ago

Glad to see a lot of feedback in this thread. I'd add a couple more things.

If the gym API is too slow, one thing to consider is the vectorized environment API. This is the approach taken by procgen, gym-microrts, and others.

Using the gym API with SB3 looks like the following:

import gym
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import VecMonitor, VecVideoRecorder, DummyVecEnv

env = DummyVecEnv([lambda: gym.make("procgen:procgen-starpilot-v0")])
# Record the video starting at the first step
env = VecVideoRecorder(env, 'logs/videos/',
                       record_video_trigger=lambda x: x == 0, video_length=100)
# Wrap with a VecMonitor to collect stats and avoid errors
env = VecMonitor(env=env)
model = PPO("CnnPolicy", env, verbose=1)  # image observations, so CnnPolicy
model.learn(10000)

Whereas using the vectorized environment API, it looks like this

- import gym
+ from procgen import ProcgenEnv
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import VecMonitor, VecVideoRecorder, DummyVecEnv
# ProcgenEnv is already vectorized
- env = DummyVecEnv([lambda: gym.make("procgen:procgen-starpilot-v0")])
+ env = ProcgenEnv(num_envs=2, env_name='starpilot')
# Record the video starting at the first step
env = VecVideoRecorder(env, 'logs/videos/',
                       record_video_trigger=lambda x: x == 0, video_length=100)
# Wrap with a VecMonitor to collect stats and avoid errors
env = VecMonitor(env=env)
- model = PPO("CnnPolicy", env, verbose=1)  # image observations, so CnnPolicy
+ model = PPO("MultiInputPolicy", env, verbose=1)  # ProcgenEnv returns dict observations
model.learn(10000)

DQN compatibility

When setting num_envs=1, this vectorized environment would also work with DQN from SB3.
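A sketch of what that would look like with procgen, reusing the setup above:

from procgen import ProcgenEnv
from stable_baselines3 import DQN
from stable_baselines3.common.vec_env import VecMonitor

# SB3's DQN expects a single environment, hence num_envs=1
env = ProcgenEnv(num_envs=1, env_name='starpilot')
env = VecMonitor(env=env)
model = DQN("MultiInputPolicy", env, verbose=1)
model.learn(10000)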

Potential API design

import gym_unity
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import VecMonitor, VecVideoRecorder
# gym_unity.VecEnv would already be vectorized
env = gym_unity.VecEnv(num_envs=2, env_name='GridWorldPixels')
# Record the video starting at the first step
env = VecVideoRecorder(env, 'logs/videos/',
                       record_video_trigger=lambda x: x == 0, video_length=100)
# Wrap with a VecMonitor to collect stats and avoid errors
env = VecMonitor(env=env)
model = PPO("MultiInputPolicy", env, verbose=1)
model.learn(10000)

github-actions[bot] commented 1 year ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.