HumanCompatibleAI / imitation

Clean PyTorch implementations of imitation and reward learning algorithms
https://imitation.readthedocs.io/
MIT License

GAIL always raises variable horizon error #669

Closed mertalbaba closed 1 year ago

mertalbaba commented 1 year ago

Bug description

When trying to train GAIL on Humanoid, I always get a variable horizon error. I am using the code provided in your documentation, which is shown below.

Steps to reproduce

import numpy as np
import gym
from stable_baselines3 import PPO
from stable_baselines3 import SAC
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.ppo import MlpPolicy

from imitation.algorithms.adversarial.gail import GAIL
from imitation.data import rollout
from imitation.data.wrappers import RolloutInfoWrapper
from imitation.rewards.reward_nets import BasicRewardNet
from imitation.util.networks import RunningNorm
from imitation.util.util import make_vec_env

env_name = "Humanoid-v3"
env = gym.make(env_name)
expertAgent = SAC("MlpPolicy", env, verbose=1)
expertAgent.learn(10000)

rng = np.random.default_rng()  # seed source required by imitation's rollout/env helpers

print("Rollouts...")
rollouts = rollout.rollout(
    expertAgent,
    make_vec_env(
        env_name,
        n_envs=4,
        post_wrappers=[lambda env, _: RolloutInfoWrapper(env)],
        rng=rng,
    ),
    rollout.make_sample_until(min_timesteps=1000000, min_episodes=60),
    rng=rng,
)

print("Training...")
venv = make_vec_env(env_name, n_envs=8, rng=rng)
learner = PPO("MlpPolicy", venv, verbose=1)
reward_net = BasicRewardNet(
    venv.observation_space,
    venv.action_space,
    normalize_input_layer=RunningNorm,
)

gail_trainer = GAIL(
    demonstrations=rollouts,
    demo_batch_size=1024,
    gen_replay_buffer_capacity=2048,
    n_disc_updates_per_round=4,
    venv=venv,
    gen_algo=learner,
    reward_net=reward_net,
    allow_variable_horizon=True
)

gail_trainer.train(10000000)
rewards, _ = evaluate_policy(learner, venv, 100, return_episode_rewards=True)
print("Rewards:", rewards)

Environment

ernestum commented 1 year ago

That is probably because the "Humanoid-v3" environment has a variable horizon. Read more here for why this is an issue. You probably want to use the "seals/Humanoid-v0" environment from the seals package instead.
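For context, the error comes from a safety check: with a learned reward, the policy can game episode length itself (e.g. accumulate reward just by surviving longer), so imitation refuses to train on environments whose episodes end at different times unless you explicitly opt out. A minimal sketch of that kind of check (illustrative only, not imitation's actual code; the function name is hypothetical):

```python
def check_fixed_horizon(episode_lengths):
    """Raise ValueError unless all finished episodes share one horizon.

    Hypothetical stand-in for the check imitation performs on collected
    trajectories before adversarial training.
    """
    horizons = set(episode_lengths)
    if len(horizons) > 1:
        raise ValueError(
            f"Episodes of different lengths detected: {sorted(horizons)}. "
            "Variable-horizon environments bias adversarial reward learning; "
            "use a fixed-horizon variant such as seals/Humanoid-v0."
        )

# Humanoid-v3 terminates early whenever the robot falls, so episode
# lengths vary and the check fails; seals/Humanoid-v0 always runs for
# the full horizon, so every length is identical and the check passes.
check_fixed_horizon([1000, 1000, 1000])  # single horizon: no error
```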

AdamGleave commented 1 year ago

@mertalbaba can you link us to where we provide that code in our docs? If an example is not working we should certainly fix that.

mertalbaba commented 1 year ago

@ernestum Thanks for the solution, it works now. @AdamGleave the example in the docs works as written, since it uses seals/CartPole-v0. When I changed the environment to Humanoid-v3, I didn't realize I also needed to switch to seals/Humanoid-v0, which is why it failed.
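For anyone landing here with the same error: the only change needed in the reproduction script above is the environment name (this assumes the seals package is installed; its environments are fixed-horizon wrappers of the standard Gym tasks):

```python
import seals  # noqa: F401 -- importing seals registers the seals/* environment IDs with gym

env_name = "seals/Humanoid-v0"  # fixed-horizon counterpart of Humanoid-v3
```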