Using standard Stable-Baselines3 2.0.0a9 code on the LunarLander environment, a DQN was trained with Gymnasium imported instead of gym. When the model is loaded again to visualize its performance, the error shown below is raised.
The workaround was to re-fetch the environment via env = model.get_env(), after which visualization worked. Why is that the case? Moreover, the step method of that environment returns a single done flag rather than the terminated and truncated booleans, which makes the code inconsistent with the Gymnasium API.
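A likely explanation, sketched here rather than confirmed by the maintainers: the log output shows that DQN.load wraps the passed env in a Monitor and a DummyVecEnv, so model.get_env() returns a VecEnv, not the raw Gymnasium env. The two follow different conventions. Gymnasium's reset() returns (obs, info) and step() returns five values, while an SB3 VecEnv's reset() returns only the batched observations and step() returns four values with a dones array. The helper names run_episode_gymnasium and run_episode_vecenv below are hypothetical, introduced only to contrast the two loops.

```python
def run_episode_gymnasium(env, model):
    """Roll out one episode on a raw Gymnasium env (5-value step API)."""
    obs, info = env.reset()  # Gymnasium returns a (obs, info) tuple
    total_reward, done = 0.0, False
    while not done:
        action, _states = model.predict(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated  # combine both end-of-episode flags
        total_reward += float(reward)
    return total_reward


def run_episode_vecenv(vec_env, model):
    """Roll out one episode on the VecEnv returned by model.get_env()."""
    obs = vec_env.reset()  # VecEnv returns only the batched observations
    total_reward, done = 0.0, False
    while not done:
        action, _states = model.predict(obs)
        # 4-value step API; each returned value has one entry per sub-env
        obs, rewards, dones, infos = vec_env.step(action)
        done = bool(dones[0])  # a single sub-env is assumed, hence index 0
        total_reward += float(rewards[0])
    return total_reward
```

With the reproduction script, run_episode_vecenv(model.get_env(), model) corresponds to the workaround, and run_episode_gymnasium(env, model) to keeping the original env while unpacking its return values. Note that a VecEnv resets a sub-env automatically once its episode ends, so the obs returned alongside dones[0] == True already belongs to the next episode.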
To Reproduce
import gymnasium as gym
from stable_baselines3 import DQN
import os

models_dir = "models/DQN"
logdir = "logs"

if not os.path.exists(models_dir):
    os.makedirs(models_dir)
if not os.path.exists(logdir):
    os.makedirs(logdir)

env = gym.make("LunarLander-v2", render_mode="human")
# env = gym.make("LunarLander-v2")
env.reset()

# model = DQN("MlpPolicy", env, verbose=1, tensorboard_log=logdir)
# TIMESTEPS = 10000
# for i in range(30):
#     model.learn(total_timesteps=int(TIMESTEPS), reset_num_timesteps=False, tb_log_name="DQN")
#     model.save(f"{models_dir}/{TIMESTEPS*i}")

models_path = os.path.join(models_dir, "290000.zip")
model = DQN.load(models_path, env=env)
# env = model.get_env()

episodes = 10
for ep in range(episodes):
    obs = env.reset()
    done = False
    while not done:
        action, _states = model.predict(obs)
        obs, reward, done, info = env.step(action)
        env.render("human")
env.close()
Relevant log output / Error message
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Traceback (most recent call last):
  File "/home/kameel/Repos/MRown/experimental.py", line 32, in <module>
    action, _states = model.predict(obs)
  File "/home/kameel/anaconda3/envs/mcs/lib/python3.9/site-packages/stable_baselines3/dqn/dqn.py", line 255, in predict
    action, state = self.policy.predict(observation, state, episode_start, deterministic)
  File "/home/kameel/anaconda3/envs/mcs/lib/python3.9/site-packages/stable_baselines3/common/policies.py", line 346, in predict
    observation, vectorized_env = self.obs_to_tensor(observation)
  File "/home/kameel/anaconda3/envs/mcs/lib/python3.9/site-packages/stable_baselines3/common/policies.py", line 260, in obs_to_tensor
    observation = np.array(observation)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
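The detected shape (2,) in the message fits a two-element tuple being passed to np.array, which is what happens if the (obs, info) pair returned by Gymnasium's reset() is handed to model.predict unchanged. The snippet below is a minimal sketch of that reading using NumPy alone. The zero-filled array is a placeholder with LunarLander's 8-dimensional observation shape, not real environment output, and the name reset_output is made up for illustration.

```python
import numpy as np

# Stand-in for what Gymnasium's env.reset() returns: (observation, info).
# The zeros array is only a placeholder for a LunarLander observation.
reset_output = (np.zeros(8, dtype=np.float32), {})

try:
    # Mirrors np.array(observation) from the traceback, fed the whole tuple.
    np.array(reset_output)
    outcome = "no error raised by this NumPy version"
except ValueError as exc:
    outcome = f"ValueError: {exc}"
print(outcome)

# Unpacking first leaves a plain observation array that converts cleanly.
obs, info = reset_output
obs_array = np.array(obs)
print(obs_array.shape)  # (8,)
```

Recent NumPy releases refuse to build such a ragged array and raise the ValueError, while older releases only emit a deprecation warning, which is why the try/except does not assume either outcome.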