StopTrainingOnMaxEpisodes Assertion Error for 'dones' in locals

❓ Question

Hi,

I'm trying to train PPO, but whenever I use the StopTrainingOnMaxEpisodes callback, it raises the following assertion error: AssertionError: dones variable is not defined, please check your code next to callback.on_step()

The relevant code (the `_on_step` method of `StopTrainingOnMaxEpisodes` in `stable_baselines3/common/callbacks.py`) is:

    def _on_step(self) -> bool:
        # Check that the `dones` local variable is defined
        assert "dones" in self.locals, "`dones` variable is not defined, please check your code next to `callback.on_step()`"
        self.n_episodes += np.sum(self.locals["dones"]).item()

        continue_training = self.n_episodes < self._total_max_episodes

        if self.verbose >= 1 and not continue_training:
            mean_episodes_per_env = self.n_episodes / self.training_env.num_envs
            mean_ep_str = (
                f"with an average of {mean_episodes_per_env:.2f} episodes per env" if self.training_env.num_envs > 1 else ""
            )

            print(
                f"Stopping training with a total of {self.num_timesteps} steps because the "
                f"{self.locals.get('tb_log_name')} model reached max_episodes={self.max_episodes}, "
                f"by playing for {self.n_episodes} episodes "
                f"{mean_ep_str}"
            )
        return continue_training
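
The counting logic above can be sketched without SB3 as follows. This is a hypothetical, stdlib-only stand-in: `EpisodeLimitSketch` and `run_steps` are made-up names, and it assumes (as SB3's `_init_callback` does) that the total limit is `max_episodes * num_envs`. It shows that every `True` in `dones` counts as one finished episode, and that the assertion fires when `dones` was never placed in `self.locals`.

```python
from typing import Any, Dict, List


class EpisodeLimitSketch:
    """Hypothetical stand-in for the episode counting in StopTrainingOnMaxEpisodes."""

    def __init__(self, max_episodes: int, num_envs: int = 1) -> None:
        # Assumption: the limit applies per env, so the total is max_episodes * num_envs
        self.total_max_episodes = max_episodes * num_envs
        self.n_episodes = 0
        self.locals: Dict[str, Any] = {}

    def on_step(self) -> bool:
        # Same check as the real callback: something must have put `dones` into locals
        assert "dones" in self.locals, "`dones` variable is not defined"
        # Each True entry means one env finished an episode on this step
        self.n_episodes += sum(bool(d) for d in self.locals["dones"])
        return self.n_episodes < self.total_max_episodes


def run_steps(callback: EpisodeLimitSketch, dones_per_step: List[List[bool]]) -> List[bool]:
    """Hypothetical helper: feed one `dones` list per step and collect on_step() results."""
    results = []
    for dones in dones_per_step:
        callback.locals["dones"] = dones
        results.append(callback.on_step())
    return results


counter = EpisodeLimitSketch(max_episodes=2, num_envs=2)
print(run_steps(counter, [[True, False], [True, True], [False, True]]))  # [True, True, False]
```

With two envs the total budget is 4 episodes, so training is told to stop on the step where the fourth episode ends.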

I'm not sure what this means or how I'm supposed to define 'dones'. Is it something I need to define in the on_step() function of BaseCallback?
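For reference: in recent SB3 versions, `OnPolicyAlgorithm.collect_rollouts` (which PPO uses) calls `callback.update_locals(locals())` right before `callback.on_step()`, and that is how the `dones` returned by `env.step()` normally reaches `self.locals`. The sketch below imitates that hand-off with hypothetical names (`RecordingCallback`, `rollout_loop_sketch`) that are not part of SB3's API; it only illustrates why the assertion fires when `on_step()` is reached without the caller's locals being passed in.

```python
from typing import Any, Dict, Iterable, List


class RecordingCallback:
    """Hypothetical minimal callback: it only sees the locals the loop hands it."""

    def __init__(self) -> None:
        self.locals: Dict[str, Any] = {}
        self.seen_dones: List[List[bool]] = []

    def update_locals(self, locals_: Dict[str, Any]) -> None:
        # Assumed to behave like BaseCallback.update_locals: merge the caller's locals
        self.locals.update(locals_)

    def on_step(self) -> bool:
        assert "dones" in self.locals, "`dones` variable is not defined"
        self.seen_dones.append(list(self.locals["dones"]))
        return True


def rollout_loop_sketch(callback: RecordingCallback, step_dones: Iterable[List[bool]]) -> bool:
    """Hypothetical rollout loop; each item stands in for the `dones` from env.step()."""
    for dones in step_dones:
        # `dones` is a local variable of this loop, so locals() includes it
        callback.update_locals(locals())
        if not callback.on_step():
            return False
    return True


cb = RecordingCallback()
print(rollout_loop_sketch(cb, [[False], [True]]))  # True
print(cb.seen_dones)  # [[False], [True]]

# Calling on_step() without update_locals() reproduces the same kind of error
try:
    RecordingCallback().on_step()
except AssertionError as err:
    print(err)  # `dones` variable is not defined
```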

Checklist

[X] I have checked that there is no similar issue in the repo
[X] I have read the documentation
[X] If code there is, it is minimal and working
[X] If code there is, it is formatted using the markdown code blocks for both code and stack traces.

DLR-RM / stable-baselines3

StopTrainingOnMaxEpisodes Assertion Error for 'dones' in locals #1952
