DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Bug]: Testing trained Model on Existing Gymnasium Environment #1516

Closed kameel2311 closed 1 year ago

kameel2311 commented 1 year ago

šŸ› Bug

Using common Stable-Baselines3 2.0.0a9 code on the LunarLander environment, a DQN was trained with gymnasium imported rather than the legacy gym. When the model is loaded again to visualize its performance, the error depicted below is shown.

The workaround was to reload the environment via env = model.get_env(), after which visualization worked. Why is that the case? Moreover, the VecEnv returned by that call yields a single done flag instead of the separate terminated and truncated booleans, which introduces further inconsistency into the code.
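The failure can be reproduced without Stable-Baselines3 at all: under the Gymnasium API, reset() returns an (obs, info) tuple, and calling np.array() on that tuple (which is roughly what SB3's obs_to_tensor does, per the traceback below) produces the "inhomogeneous shape" error. A minimal sketch assuming only NumPy; fake_gymnasium_reset is a made-up stand-in for env.reset(), not part of any library:

```python
import numpy as np

# Gymnasium's reset() returns (obs, info); the old Gym API returned obs alone.

def fake_gymnasium_reset():
    """Made-up stand-in for env.reset() under the Gymnasium API."""
    return np.zeros(8, dtype=np.float32), {}  # LunarLander obs has 8 dims

# The posted loop does `obs = env.reset()`, keeping the whole tuple.
obs_tuple = fake_gymnasium_reset()

# predict() eventually calls np.array(observation); on the tuple this raises
# the "inhomogeneous shape" ValueError from the traceback (with NumPy >= 1.24;
# older NumPy versions only emit a warning and build an object array instead).
try:
    np.array(obs_tuple)
except ValueError as e:
    print("ValueError:", e)

# The fix under the Gymnasium API: unpack the tuple before calling predict().
obs, info = fake_gymnasium_reset()
print(obs.shape)  # (8,)
```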

To Reproduce

import gymnasium as gym
from stable_baselines3 import DQN
import os

models_dir = "models/DQN"
logdir = "logs"

if not os.path.exists(models_dir):
    os.makedirs(models_dir)
if not os.path.exists(logdir):
    os.makedirs(logdir)

env = gym.make("LunarLander-v2", render_mode="human")
# env = gym.make("LunarLander-v2")
env.reset()

# model = DQN("MlpPolicy",env,verbose=1, tensorboard_log=logdir)
# TIMESTEPS = 10000
# for i in range(30):
#     model.learn(total_timesteps=int(TIMESTEPS), reset_num_timesteps=False, tb_log_name="DQN")
#     model.save(f"{models_dir}/{TIMESTEPS*i}")

models_path = os.path.join(models_dir,"290000.zip")
model = DQN.load(models_path,env=env)
# env = model.get_env()

episodes = 10
for ep in range(episodes):
    obs = env.reset()
    done = False    
    while not done:
        action, _states = model.predict(obs)
        obs, reward, done, info = env.step(action)
        env.render("human")
env.close()

Relevant log output / Error message

Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Traceback (most recent call last):
  File "/home/kameel/Repos/MRown/experimental.py", line 32, in <module>
    action, _states = model.predict(obs)
  File "/home/kameel/anaconda3/envs/mcs/lib/python3.9/site-packages/stable_baselines3/dqn/dqn.py", line 255, in predict
    action, state = self.policy.predict(observation, state, episode_start, deterministic)
  File "/home/kameel/anaconda3/envs/mcs/lib/python3.9/site-packages/stable_baselines3/common/policies.py", line 346, in predict
    observation, vectorized_env = self.obs_to_tensor(observation)
  File "/home/kameel/anaconda3/envs/mcs/lib/python3.9/site-packages/stable_baselines3/common/policies.py", line 260, in obs_to_tensor
    observation = np.array(observation)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

System Info

Checklist

araffin commented 1 year ago

I have read the documentation

https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#vecenv-api-vs-gym-api
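For context, the linked page describes how the VecEnv API that model.get_env() returns differs from the Gymnasium API: reset() returns only the observation, and step() folds terminated and truncated into a single done flag. A minimal sketch of that folding, using a made-up MiniEnv stand-in (not part of SB3 or Gymnasium):

```python
class MiniEnv:
    """Toy Gymnasium-style env: never terminates, truncates after 3 steps."""
    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0, {}  # Gymnasium API: (obs, info)

    def step(self, action):
        self.t += 1
        terminated = False       # no terminal state in this toy env
        truncated = self.t >= 3  # time limit reached
        # Gymnasium API: five return values
        return float(self.t), 1.0, terminated, truncated, {}


def vecenv_style_step(env, action):
    """Fold (terminated, truncated) into one `done` flag, VecEnv-style."""
    obs, reward, terminated, truncated, info = env.step(action)
    return obs, reward, terminated or truncated, info


env = MiniEnv()
obs, info = env.reset()
done = False
steps = 0
while not done:
    obs, reward, done, info = vecenv_style_step(env, 0)
    steps += 1
print(steps)  # 3
```

This is why the loop in the original report works once env = model.get_env() is used: the VecEnv hands back a single done flag, matching the four-value unpacking in the posted code.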

Rajmehta123 commented 1 year ago

Any resolution?