DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Bug]: Testing trained Model on Existing Gymnasium Environment #1516

Closed kameel2311 closed 1 year ago

kameel2311 commented 1 year ago

šŸ› Bug

Using common Stable-Baselines3 2.0.0a9 code on the LunarLander environment, a DQN was trained with gymnasium imported rather than the legacy gym. When the model is loaded again to visualize its performance, the error depicted below is shown.

The workaround was to reload the environment via env = model.get_env(), after which visualization worked. Why is that the case? Moreover, the VecEnv returned by that call yields a single done flag instead of the separate terminated and truncated booleans, which introduces further inconsistency into the code.
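The failure can be reproduced without Stable-Baselines3 at all: under the Gymnasium API, reset() returns an (obs, info) tuple, and calling np.array() on that tuple (which is roughly what SB3's obs_to_tensor does, per the traceback below) produces the "inhomogeneous shape" error. A minimal sketch assuming only NumPy; fake_gymnasium_reset is a made-up stand-in for env.reset(), not part of any library:

```python
import numpy as np

# Gymnasium's reset() returns (obs, info); the old Gym API returned obs alone.

def fake_gymnasium_reset():
    """Made-up stand-in for env.reset() under the Gymnasium API."""
    return np.zeros(8, dtype=np.float32), {}  # LunarLander obs has 8 dims

# The posted loop does `obs = env.reset()`, keeping the whole tuple.
obs_tuple = fake_gymnasium_reset()

# predict() eventually calls np.array(observation); on the tuple this raises
# the "inhomogeneous shape" ValueError from the traceback (with NumPy >= 1.24;
# older NumPy versions only emit a warning and build an object array instead).
try:
    np.array(obs_tuple)
except ValueError as e:
    print("ValueError:", e)

# The fix under the Gymnasium API: unpack the tuple before calling predict().
obs, info = fake_gymnasium_reset()
print(obs.shape)  # (8,)
```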

To Reproduce

import gymnasium as gym
from stable_baselines3 import DQN
import os

models_dir = "models/DQN"
logdir = "logs"

if not os.path.exists(models_dir):
    os.makedirs(models_dir)
if not os.path.exists(logdir):
    os.makedirs(logdir)

env = gym.make("LunarLander-v2", render_mode="human")
# env = gym.make("LunarLander-v2")
env.reset()

# model = DQN("MlpPolicy",env,verbose=1, tensorboard_log=logdir)
# TIMESTEPS = 10000
# for i in range(30):
#     model.learn(total_timesteps=int(TIMESTEPS), reset_num_timesteps=False, tb_log_name="DQN")
#     model.save(f"{models_dir}/{TIMESTEPS*i}")

models_path = os.path.join(models_dir,"290000.zip")
model = DQN.load(models_path,env=env)
# env = model.get_env()

episodes = 10
for ep in range(episodes):
    obs = env.reset()
    done = False    
    while not done:
        action, _states = model.predict(obs)
        obs, reward, done, info = env.step(action)
        env.render("human")
env.close()

Relevant log output / Error message

Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Traceback (most recent call last):
  File "/home/kameel/Repos/MRown/experimental.py", line 32, in <module>
    action, _states = model.predict(obs)
  File "/home/kameel/anaconda3/envs/mcs/lib/python3.9/site-packages/stable_baselines3/dqn/dqn.py", line 255, in predict
    action, state = self.policy.predict(observation, state, episode_start, deterministic)
  File "/home/kameel/anaconda3/envs/mcs/lib/python3.9/site-packages/stable_baselines3/common/policies.py", line 346, in predict
    observation, vectorized_env = self.obs_to_tensor(observation)
  File "/home/kameel/anaconda3/envs/mcs/lib/python3.9/site-packages/stable_baselines3/common/policies.py", line 260, in obs_to_tensor
    observation = np.array(observation)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

System Info

Checklist

araffin commented 1 year ago

I have read the documentation

https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#vecenv-api-vs-gym-api
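For context, the linked page describes how the VecEnv API that model.get_env() returns differs from the Gymnasium API: reset() returns only the observation, and step() folds terminated and truncated into a single done flag. A minimal sketch of that folding, using a made-up MiniEnv stand-in (not part of SB3 or Gymnasium):

```python
class MiniEnv:
    """Toy Gymnasium-style env: never terminates, truncates after 3 steps."""
    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0, {}  # Gymnasium API: (obs, info)

    def step(self, action):
        self.t += 1
        terminated = False       # no terminal state in this toy env
        truncated = self.t >= 3  # time limit reached
        # Gymnasium API: five return values
        return float(self.t), 1.0, terminated, truncated, {}


def vecenv_style_step(env, action):
    """Fold (terminated, truncated) into one `done` flag, VecEnv-style."""
    obs, reward, terminated, truncated, info = env.step(action)
    return obs, reward, terminated or truncated, info


env = MiniEnv()
obs, info = env.reset()
done = False
steps = 0
while not done:
    obs, reward, done, info = vecenv_style_step(env, 0)
    steps += 1
print(steps)  # 3
```

This is why the loop in the original report works once env = model.get_env() is used: the VecEnv hands back a single done flag, matching the four-value unpacking in the posted code.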

Rajmehta123 commented 1 year ago

Any resolution?