Farama-Foundation / SuperSuit

A collection of wrappers for Gymnasium and PettingZoo environments (being merged into gymnasium.wrappers and pettingzoo.wrappers
Other
441 stars 56 forks source link

render method: signature mismatch #223

Closed adam-crowther closed 9 months ago

adam-crowther commented 1 year ago

Hi,

First some observations:

TypeError: ConcatVecEnv.render() got an unexpected keyword argument 'mode'

Looking at commit history it seems that the mode parameter did actually exist in the ConcatVecEnv wrapper up till release 3.6.0: https://github.com/Farama-Foundation/SuperSuit/blob/9b661380036e53ed338b257406dd2c32ed67d5f5/supersuit/vector/concat_vec_env.py#L112C1-L114C1 But it was removed in release 3.7.0: https://github.com/Farama-Foundation/SuperSuit/blob/ac2a2511bfa2bd268965eb6d7b54e3b9d5a3b48b/supersuit/vector/concat_vec_env.py#L112-L114C1 This change broke the env.render() method when it is wrapped with ss.concat_vec_envs_v1(env, 1, base_class='stable_baselines3').

I need the render method because I'm using the Baselines3 EvalCallback, and I want to generate PNGs of my env for each evaluation episode. (This works in pure Baselines3).

I can't shim this, because the SB3SecEnvWrapper sets up the ConcatVecEnv wrapper internally. For now I will have to clone your repo and hack the mode parameter back in to the ConcatVecEnv wrapper.

Thanks for your help,

Adam

adam-crowther commented 1 year ago

I have updated my dummy project from https://github.com/Farama-Foundation/SuperSuit/issues/222 to demonstrate the issue: https://github.com/adam-crowther/test-supersuit-baseline3-pettingzoo-parallel-env

    eval_callback = EvalCallback(env_parallel,
                                 best_model_save_path="./logs/",
                                 log_path="./logs/",
                                 eval_freq=50,
                                 deterministic=True,
                                 render=True,
                                 verbose=1)
    model.learn(total_timesteps=10_000, callback=eval_callback)

The callback invokes the render() method on evaluation, which causes the following error:

C:\Users\adamcc\AppData\Roaming\Python\Python311\site-packages\stable_baselines3\common\evaluation.py:67: UserWarning: Evaluation environment is not wrapped with a ``Monitor`` wrapper. This may result in reporting modified episode lengths and rewards, if other wrappers happen to modify these. Consider wrapping environment first with ``Monitor`` wrapper.
  warnings.warn(
Traceback (most recent call last):
  File "C:\dev\repo\test-supersuit-baseline3-pettingzoo-parallel-env\main_dummy.py", line 32, in <module>
    model.learn(total_timesteps=5_000, callback=eval_callback)
  File "C:\Users\adamcc\AppData\Roaming\Python\Python311\site-packages\stable_baselines3\ppo\ppo.py", line 308, in learn
    return super().learn(
           ^^^^^^^^^^^^^^
  File "C:\Users\adamcc\AppData\Roaming\Python\Python311\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 259, in learn
    continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\adamcc\AppData\Roaming\Python\Python311\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 184, in collect_rollouts
    if callback.on_step() is False:
       ^^^^^^^^^^^^^^^^^^
  File "C:\Users\adamcc\AppData\Roaming\Python\Python311\site-packages\stable_baselines3\common\callbacks.py", line 104, in on_step
    return self._on_step()
           ^^^^^^^^^^^^^^^
  File "C:\Users\adamcc\AppData\Roaming\Python\Python311\site-packages\stable_baselines3\common\callbacks.py", line 449, in _on_step
    episode_rewards, episode_lengths = evaluate_policy(
                                       ^^^^^^^^^^^^^^^^
  File "C:\Users\adamcc\AppData\Roaming\Python\Python311\site-packages\stable_baselines3\common\evaluation.py", line 131, in evaluate_policy
    env.render()
  File "C:\Users\adamcc\AppData\Roaming\Python\Python311\site-packages\stable_baselines3\common\vec_env\base_vec_env.py", line 361, in render
    return self.venv.render(mode=mode)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\adamcc\AppData\Roaming\Python\Python311\site-packages\stable_baselines3\common\vec_env\base_vec_env.py", line 361, in render
    return self.venv.render(mode=mode)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: ConcatVecEnv.render() got an unexpected keyword argument 'mode'

Process finished with exit code 1
adam-crowther commented 1 year ago

I managed to hack it locally by cloning the SuperSuit repo and changing the signature of the render() methods, adding back the mode parameter where neccessary:

class ConcatVecEnv:
...
    def render(self, mode: Optional[str] = None) -> Optional[np.ndarray]:
        return self.vec_envs[0].render(mode)

class ProcConcatVec:
...
    def render(self, mode: Optional[str] = None) -> Optional[np.ndarray]:
        self.pipes[0].send("render")
        render_result = self.pipes[0].recv()

        if isinstance(render_result, tuple):
            e, tb = render_result
            print(tb)
            raise e

        return render_result

class SingleVecEnv:
...
    def render(self, mode: Optional[str] = None) -> Optional[np.ndarray]:
        return self.gym_env.render(mode)

class MarkovVectorEnv:
...
    def render(self, mode: Optional[str] = None) -> Optional[np.ndarray]:
        return self.par_env.render()

Note that that last one does NOT pass the mode parameter to the wrapped env, but just drops it. I know this is a fugly hack, and I'm not saying this is how the issue should be fixed, My intention here is just to give you as much information as possible.

elliottower commented 1 year ago

Hey Adam, thanks a lot for bringing this up, I will go through and try to make things as consistent as possible. Our SB3 wrapper should most definitely have the same signature as its superclass.

elliottower commented 9 months ago

I believe this was completed with the above PR but if there are still any inconsistencies/issues please let me know