Closed adam-crowther closed 9 months ago
I have updated my dummy project from https://github.com/Farama-Foundation/SuperSuit/issues/222 to demonstrate the issue: https://github.com/adam-crowther/test-supersuit-baseline3-pettingzoo-parallel-env
eval_callback = EvalCallback(env_parallel,
best_model_save_path="./logs/",
log_path="./logs/",
eval_freq=50,
deterministic=True,
render=True,
verbose=1)
model.learn(total_timesteps=10_000, callback=eval_callback)
The callback invokes the render() method on evaluation, which causes the following error:
C:\Users\adamcc\AppData\Roaming\Python\Python311\site-packages\stable_baselines3\common\evaluation.py:67: UserWarning: Evaluation environment is not wrapped with a ``Monitor`` wrapper. This may result in reporting modified episode lengths and rewards, if other wrappers happen to modify these. Consider wrapping environment first with ``Monitor`` wrapper.
warnings.warn(
Traceback (most recent call last):
File "C:\dev\repo\test-supersuit-baseline3-pettingzoo-parallel-env\main_dummy.py", line 32, in <module>
model.learn(total_timesteps=5_000, callback=eval_callback)
File "C:\Users\adamcc\AppData\Roaming\Python\Python311\site-packages\stable_baselines3\ppo\ppo.py", line 308, in learn
return super().learn(
^^^^^^^^^^^^^^
File "C:\Users\adamcc\AppData\Roaming\Python\Python311\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 259, in learn
continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\adamcc\AppData\Roaming\Python\Python311\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 184, in collect_rollouts
if callback.on_step() is False:
^^^^^^^^^^^^^^^^^^
File "C:\Users\adamcc\AppData\Roaming\Python\Python311\site-packages\stable_baselines3\common\callbacks.py", line 104, in on_step
return self._on_step()
^^^^^^^^^^^^^^^
File "C:\Users\adamcc\AppData\Roaming\Python\Python311\site-packages\stable_baselines3\common\callbacks.py", line 449, in _on_step
episode_rewards, episode_lengths = evaluate_policy(
^^^^^^^^^^^^^^^^
File "C:\Users\adamcc\AppData\Roaming\Python\Python311\site-packages\stable_baselines3\common\evaluation.py", line 131, in evaluate_policy
env.render()
File "C:\Users\adamcc\AppData\Roaming\Python\Python311\site-packages\stable_baselines3\common\vec_env\base_vec_env.py", line 361, in render
return self.venv.render(mode=mode)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\adamcc\AppData\Roaming\Python\Python311\site-packages\stable_baselines3\common\vec_env\base_vec_env.py", line 361, in render
return self.venv.render(mode=mode)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: ConcatVecEnv.render() got an unexpected keyword argument 'mode'
Process finished with exit code 1
I managed to hack it locally by cloning the SuperSuit repo and changing the signature of the render()
methods, adding back the mode
parameter where neccessary:
class ConcatVecEnv:
...
def render(self, mode: Optional[str] = None) -> Optional[np.ndarray]:
return self.vec_envs[0].render(mode)
class ProcConcatVec:
...
def render(self, mode: Optional[str] = None) -> Optional[np.ndarray]:
self.pipes[0].send("render")
render_result = self.pipes[0].recv()
if isinstance(render_result, tuple):
e, tb = render_result
print(tb)
raise e
return render_result
class SingleVecEnv:
...
def render(self, mode: Optional[str] = None) -> Optional[np.ndarray]:
return self.gym_env.render(mode)
class MarkovVectorEnv:
...
def render(self, mode: Optional[str] = None) -> Optional[np.ndarray]:
return self.par_env.render()
Note that that last one does NOT pass the mode parameter to the wrapped env, but just drops it. I know this is a fugly hack, and I'm not saying this is how the issue should be fixed, My intention here is just to give you as much information as possible.
Hey Adam, thanks a lot for bringing this up, I will go through and try to make things as consistent as possible. Our SB3 wrapper should most definitely have the same signature as its superclass.
I believe this was completed with the above PR but if there are still any inconsistencies/issues please let me know
Hi,
First some observations:
SB3SecEnvWrapper
inherits from Baselines3 parent classVecEnvWrapper
.VecEnvWrapper
provides the following render() method:SB3SecEnvWrapper
wraps an instance ofConcatVecEnv
, which provides the following render() method:env = ss.concat_vec_envs_v1(env, 1, base_class='stable_baselines3')
wraps my env in aSB3SecEnvWrapper
, which in turn wraps aConcatVecEnv
. When I callenv.render()
on my environment, I get the following error:Looking at commit history it seems that the
mode
parameter did actually exist in theConcatVecEnv
wrapper up till release 3.6.0: https://github.com/Farama-Foundation/SuperSuit/blob/9b661380036e53ed338b257406dd2c32ed67d5f5/supersuit/vector/concat_vec_env.py#L112C1-L114C1 But it was removed in release 3.7.0: https://github.com/Farama-Foundation/SuperSuit/blob/ac2a2511bfa2bd268965eb6d7b54e3b9d5a3b48b/supersuit/vector/concat_vec_env.py#L112-L114C1 This change broke theenv.render()
method when it is wrapped withss.concat_vec_envs_v1(env, 1, base_class='stable_baselines3')
.I need the render method because I'm using the Baselines3 EvalCallback, and I want to generate PNGs of my env for each evaluation episode. (This works in pure Baselines3).
I can't shim this, because the
SB3SecEnvWrapper
sets up theConcatVecEnv
wrapper internally. For now I will have to clone your repo and hack themode
parameter back in to theConcatVecEnv
wrapper.Thanks for your help,
Adam