HumanCompatibleAI / imitation

Clean PyTorch implementations of imitation and reward learning algorithms
https://imitation.readthedocs.io/
MIT License

Generate rollouts using RetroEnv SonicTheHedgehog-Genesis #596

Closed feliperafael closed 1 year ago

feliperafael commented 1 year ago

I'm trying to apply GAIL to the Retro environment "SonicTheHedgehog-Genesis", but I'm getting errors. Apparently the environment is not recognized. Does anyone have any idea what could be causing this?

Below is the code I'm using to generate the rollouts, followed by the error I'm getting:

from stable_baselines3 import PPO
from stable_baselines3.ppo import CnnPolicy
import gym
import imitation
from imitation.data import rollout
from imitation.data.wrappers import RolloutInfoWrapper
from imitation.util.util import make_vec_env
from imitation.algorithms.adversarial.gail import GAIL
from imitation.rewards.reward_nets import BasicRewardNet
from imitation.util.networks import RunningNorm
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.vec_env import DummyVecEnv, SubprocVecEnv
import retrowrapper
from retro_contest.local import make
retrowrapper.set_retro_make(make)

env = make(game="SonicTheHedgehog-Genesis", state="GreenHillZone.Act1", bk2dir="./records")

expert = PPO(
    policy=CnnPolicy,
    env=env,
)
expert.learn(1000)  

rollouts = rollout.rollout(
    expert,
    make_vec_env(
        "SonicTheHedgehog-Genesis.GreenHillZone.Act1",
        n_envs=1 ,
        post_wrappers=[lambda env, _: RolloutInfoWrapper(env)],
    ),
    rollout.make_sample_until(min_timesteps=None, min_episodes=6),
)
---------------------------------------------------------------------------
Error                                     Traceback (most recent call last)
Input In [4], in <cell line: 1>()
      1 rollouts = rollout.rollout(
      2     expert,
----> 3     make_vec_env(
      4         "SonicTheHedgehog-Genesis.GreenHillZone.Act1",
      5         n_envs=1 ,
      6         post_wrappers=[lambda env, _: RolloutInfoWrapper(env)],
      7     ),
      8     rollout.make_sample_until(min_timesteps=None, min_episodes=6),
      9 )

File ~/anaconda3/envs/newRL/lib/python3.8/site-packages/imitation/util/util.py:99, in make_vec_env(env_name, n_envs, seed, parallel, log_dir, max_episode_steps, post_wrappers, env_make_kwargs)
     74 """Makes a vectorized environment.
     75 
     76 Args:
   (...)
     95     A VecEnv initialized with `n_envs` environments.
     96 """
     97 # Resolve the spec outside of the subprocess first, so that it is available to
     98 # subprocesses running `make_env` via automatic pickling.
---> 99 spec = gym.spec(env_name)
    100 env_make_kwargs = env_make_kwargs or {}
    102 def make_env(i, this_seed):
    103     # Previously, we directly called `gym.make(env_name)`, but running
    104     # `imitation.scripts.train_adversarial` within `imitation.scripts.parallel`
   (...)
    109     # work. For more discussion and hypotheses on this issue see PR #160:
    110     # https://github.com/HumanCompatibleAI/imitation/pull/160.

File ~/anaconda3/envs/newRL/lib/python3.8/site-packages/gym/envs/registration.py:239, in spec(id)
    238 def spec(id):
--> 239     return registry.spec(id)

File ~/anaconda3/envs/newRL/lib/python3.8/site-packages/gym/envs/registration.py:151, in EnvRegistry.spec(self, path)
    149 match = env_id_re.search(id)
    150 if not match:
--> 151     raise error.Error(
    152         "Attempted to look up malformed environment ID: {}. (Currently all IDs must be of the form {}.)".format(
    153             id.encode("utf-8"), env_id_re.pattern
    154         )
    155     )
    157 try:
    158     return self.env_specs[id]

Error: Attempted to look up malformed environment ID: b'SonicTheHedgehog-Genesis.GreenHillZone.Act1'. (Currently all IDs must be of the form ^(?:[\w:-]+\/)?([\w:.-]+)-v(\d+)$.)

AdamGleave commented 1 year ago

"SonicTheHedgehog-Genesis.GreenHillZone.Act1"

This isn't a legal Gym ID that gym.make recognizes. I'm not familiar with Retro; does it add the games to the Gym registry? If so, look up what those IDs are. Otherwise, you can either register the entry point yourself using gym.register, or avoid env names altogether and modify our util.make_vec_env to call retro_contest.local.make.
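
As a rough sketch of the first option: the ID check from the error message can be reproduced with the standard library, and registration could look something like the helper below. The ID `SonicGreenHillZoneAct1-v0` and the function names are hypothetical, chosen only for illustration. The registration part assumes `retro_contest.local.make` accepts the `game` and `state` keywords used in the original snippet, and it is untested against Retro.

```python
import re

# Pattern copied from the error message above (the gym version in the traceback).
ENV_ID_RE = re.compile(r"^(?:[\w:-]+\/)?([\w:.-]+)-v(\d+)$")

# Hypothetical ID for illustration; the name must end in "-v<number>".
SONIC_ENV_ID = "SonicGreenHillZoneAct1-v0"


def is_valid_gym_id(env_id: str) -> bool:
    """Return True if env_id has the '<name>-v<version>' form gym expects."""
    return ENV_ID_RE.search(env_id) is not None


def register_sonic_env() -> None:
    """Register the Retro constructor under SONIC_ENV_ID (untested sketch)."""
    # Imported lazily so the ID check above runs without gym or Retro installed.
    import gym
    from retro_contest.local import make

    gym.register(
        id=SONIC_ENV_ID,
        entry_point=make,
        kwargs={"game": "SonicTheHedgehog-Genesis", "state": "GreenHillZone.Act1"},
    )


print(is_valid_gym_id("SonicTheHedgehog-Genesis.GreenHillZone.Act1"))  # False
print(is_valid_gym_id(SONIC_ENV_ID))  # True
```

After calling `register_sonic_env()` once, passing `SONIC_ENV_ID` to make_vec_env should in principle get past the `gym.spec` lookup that raised the error.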

dfilan commented 1 year ago

IMO the main problem here is that rollout.rollout is getting a (h,w,c)-formatted environment, but SB3 training transposes environments internally to (c,h,w) format before feeding them to the policy. One thing that would work:

rollouts = rollout.rollout(
    expert,
    expert.get_env(),
    rollout.make_sample_until(min_timesteps=None, min_episodes=6),
)

The obvious problem is that this doesn't wrap the environment the way you want. One solution would be to explicitly write something like:

new_env = RolloutInfoWrapper(expert.get_env())
rollouts = rollout.rollout(
    expert,
    new_env,
    rollout.make_sample_until(min_timesteps=None, min_episodes=6),
)

The next problem is that this might not give you a vectorized environment, but there's probably a similar way of fixing that.

dfilan commented 1 year ago

I guess that's not the "main" problem, in that it isn't causing your current error, but I think it would cause an error once you applied Adam's fix.

dfilan commented 1 year ago

You'll also need to pass an rng to rollout.rollout.
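
Putting the suggestions in this thread together, one possible shape for the whole thing is sketched below. It is an untested sketch, not a confirmed fix: `collect_rollouts` and its `make_env` argument are hypothetical names, and it assumes the Retro constructor returns an ordinary gym.Env with image observations. The idea is to apply RolloutInfoWrapper before vectorizing, train on that VecEnv, then sample from `expert.get_env()` with an explicit `rng`.

```python
def collect_rollouts(make_env, train_steps=1000, min_episodes=6, seed=0):
    """Train a PPO expert on make_env() and sample rollouts from the same VecEnv.

    make_env is a hypothetical zero-argument factory returning a gym.Env, e.g.
    lambda: make(game="SonicTheHedgehog-Genesis", state="GreenHillZone.Act1").
    """
    # Imported here so that defining the helper does not require the libraries.
    import numpy as np
    from imitation.data import rollout
    from imitation.data.wrappers import RolloutInfoWrapper
    from stable_baselines3 import PPO
    from stable_baselines3.common.vec_env import DummyVecEnv
    from stable_baselines3.ppo import CnnPolicy

    # Wrap the raw env before vectorizing, so RolloutInfoWrapper sees a gym.Env.
    venv = DummyVecEnv([lambda: RolloutInfoWrapper(make_env())])

    expert = PPO(policy=CnnPolicy, env=venv, seed=seed)
    expert.learn(train_steps)

    # expert.get_env() is the VecEnv as SB3 wrapped it, including any image
    # transposition, so the policy sees the same layout as during training.
    return rollout.rollout(
        expert,
        expert.get_env(),
        rollout.make_sample_until(min_timesteps=None, min_episodes=min_episodes),
        rng=np.random.default_rng(seed),
    )
```

Because the environment is built from a factory rather than an ID string, this route sidesteps the Gym registry lookup entirely.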

AdamGleave commented 1 year ago

I agree we should give an example of how to do the rollouts somewhere, maybe in the CNN tutorial.

We should probably also prominently warn people in the docs that SB3 and imitation are using different conventions here.
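
To make the two layouts concrete, here is a small dependency-free illustration. It does not use SB3 or imitation, and the helper names are invented for the example. It only shows what (h,w,c) and (c,h,w) mean for the same frame, following the description earlier in the thread that the raw environment yields (h,w,c) observations while the SB3 policy is fed (c,h,w).

```python
def hwc_to_chw(image):
    """Reorder a nested-list image from (height, width, channel) to (channel, height, width)."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [
        [[image[y][x][c] for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]


def shape(nested):
    """Return the dimensions of a rectangular nested list as a tuple."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# A tiny 2x3 "RGB" image in (h, w, c) layout; each pixel is [r, g, b].
frame_hwc = [[[10 * y + x, 100, 200] for x in range(3)] for y in range(2)]
frame_chw = hwc_to_chw(frame_hwc)

print(shape(frame_hwc))  # (2, 3, 3)
print(shape(frame_chw))  # (3, 2, 3)
print(frame_chw[0])      # red channel: [[0, 1, 2], [10, 11, 12]]
```

A policy trained on one layout and handed observations in the other sees a differently shaped input, which is why the environment used for rollouts needs to match the one used for training.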

AdamGleave commented 1 year ago

This doesn't seem to be an imitation issue, and the usability side of things is already covered in https://github.com/HumanCompatibleAI/imitation/issues/599, so I'm closing this in favor of that issue.

Please do feel free to open a new issue if you discover a bug in any of our implementations.