HumanCompatibleAI / imitation

Clean PyTorch implementations of imitation and reward learning algorithms
https://imitation.readthedocs.io/
MIT License

HCAI compatibility with parking-v0 #378

Closed · pierrekhouryy closed this issue 2 years ago

pierrekhouryy commented 2 years ago

Describe the bug

Cannot load pre-trained PPO model in script "train_rl.py"

System Specifications

Expected Behavior

The model was supposed to load, and an expert dataset would later be generated from it.

Actual Behavior

The code is crashing and I'm getting the following error:

AttributeError: 'Box' object has no attribute 'spaces'

Relevant Screenshots / Outputs

train_rl.py is crashing on the following line: rl_algo = PPO("MultiInputPolicy", venv, verbose=1), with this error: AttributeError: 'Box' object has no attribute 'spaces'
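
For context, the attribute in question exists only on Dict observation spaces; here is a minimal illustration of how this AttributeError can arise (illustrative only, not necessarily the root cause inside train_rl.py):

import gym.spaces as spaces

# MultiInputPolicy expects a Dict observation space, whose .spaces attribute
# maps keys to sub-spaces; a plain Box space defines no such attribute.
dict_space = spaces.Dict({"observation": spaces.Box(low=-1, high=1, shape=(6,))})
print(dict_space.spaces)  # OK: an (ordered) dict of sub-spaces

box_space = spaces.Box(low=-1, high=1, shape=(6,))
print(box_space.spaces)  # AttributeError: 'Box' object has no attribute 'spaces'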

Steps to Reproduce the problem

  1. Create a virtual environment
  2. Install the required versions of each library (specified above in System Specifications)
  3. Modify the following files. First, add the following code to imitation/src/imitation/scripts/config/train_rl.py:
    import highway_env
    @train_rl_ex.named_config
    def parking():
        common = dict(env_name="parking-v0")

    Then, in setup.py, change the stable-baselines3 requirement from "stable-baselines3>=1.1.0" to "stable-baselines3==1.2.0", and add the highway_env library. Finally, add the following code to imitation/src/imitation/scripts/train_rl.py (an invocation sketch follows these steps):

    import highway_env
    # Original training code, commented out:
    #rl_algo = rl.make_rl_algo(venv)
    #rl_algo.set_logger(custom_logger)
    #rl_algo.learn(total_timesteps, callback=callback)
    # Replacement that loads the pre-trained model instead:
    from stable_baselines3 import PPO
    rl_algo = PPO("MultiInputPolicy", venv, verbose=1)
    rl_algo = PPO.load("ppo_parking.zip")
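
For reference, a sketch of how the new named config could then be invoked through Sacred (the train_rl_ex experiment object comes from the config file modified in step 3; the exact invocation may differ between imitation versions):

import highway_env  # registers parking-v0 with Gym
import imitation.scripts.train_rl as train_rl_script

# Run the train_rl experiment with the "parking" named config added above.
train_rl_script.train_rl_ex.run(named_configs=["parking"])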

AdamGleave commented 2 years ago

If I understand this correctly, the error is happening on a line of code you've introduced, namely:

rl_algo = PPO("MultiInputPolicy", venv, verbose=1)

Do you have an example of PPO training working outside of imitation? Right now I do not see how this issue is related to our library.

My guess from the error message is a Gym version incompatibility; I think Gym changed the spaces attribute a few versions back. It may also be that parking-v0 does not fully implement the standard Gym API.
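
One quick way to test the latter hypothesis is SB3's built-in environment checker (a diagnostic sketch, assuming highway_env is installed):

import gym
import highway_env  # registers parking-v0
from stable_baselines3.common.env_checker import check_env

# Raises an error or emits warnings if the env deviates from the Gym API.
env = gym.make("parking-v0")
check_env(env)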

pierrekhouryy commented 2 years ago

Yes, I have two working examples with SB3. The first uses the "parking-v0" env:

import gym
import highway_env
from stable_baselines3 import PPO
env = gym.make("parking-v0")
model = PPO("MultiInputPolicy", env, verbose=1)
model.learn(total_timesteps=20000)
model.save("ppo_parking")
del model
model = PPO.load("ppo_parking")

and the second uses the "FetchReach-v1" env:

import gym
from stable_baselines3 import PPO
env = gym.make("FetchReach-v1")
model = PPO("MultiInputPolicy", env, verbose=1)
model.learn(total_timesteps=20000)
model.save("ppo_fetch")
del model
model = PPO.load("ppo_fetch")

Both use dict observations, and both seem to train, save, and load fine with SB3. However, running them with imitation causes some kind of error regarding the observations. In the FetchReach-v1 example, we are able to load the model; however, it later crashes when the method generate_trajectories is called:

  File "imitation/src/imitation/data/rollout.py", line 390, in generate_trajectories
    exp_obs = (n_steps + 1,) + venv.observation_space.shape
TypeError: can only concatenate tuple (not "NoneType") to tuple
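
For context, Dict observation spaces report their shape as None, which would explain the failed tuple concatenation above; a minimal illustration (an educated guess, not a confirmed diagnosis):

import gym.spaces as spaces

# Dict spaces have no flat shape, so .shape is None.
dict_space = spaces.Dict({"observation": spaces.Box(low=-1, high=1, shape=(6,))})
print(dict_space.shape)  # None
# Hence (n_steps + 1,) + dict_space.shape raises:
# TypeError: can only concatenate tuple (not "NoneType") to tuple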

AdamGleave commented 2 years ago

Your examples are passing in a single environment to PPO, not a vectorized environment as imitation expects. Your imitation example is incomplete (I do not see the code creating venv), so it's hard to know exactly what's going on, but I expect a vectorized vs non-vectorized mixup.
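
A sketch of the vectorized setup, using SB3's DummyVecEnv (illustrative; imitation's scripts construct venv with their own helpers):

import gym
import highway_env  # registers parking-v0
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv

# Wrap the single env in a length-1 vectorized env, as imitation expects.
venv = DummyVecEnv([lambda: gym.make("parking-v0")])
model = PPO("MultiInputPolicy", venv, verbose=1)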

AdamGleave commented 2 years ago

Closing due to inactivity.