DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License
8.84k stars 1.68k forks source link

StopTrainingOnMaxEpisodes Assertion Error for 'dones' in locals #1952

Closed KevinHan1209 closed 2 months ago

KevinHan1209 commented 3 months ago

❓ Question

Hi,

I'm trying to run PPO but whenever I try to use StopTrainingOnMaxEpisodes, it gives me the assertion error: AssertionError: dones variable is not defined, please check your code next to callback.on_step()

The relevant code is:

    def _on_step(self) -> bool:
        # Check that the `dones` local variable is defined
        assert "dones" in self.locals, "`dones` variable is not defined, please check your code next to `callback.on_step()`"
        self.n_episodes += np.sum(self.locals["dones"]).item()

        continue_training = self.n_episodes < self._total_max_episodes

        if self.verbose >= 1 and not continue_training:
            mean_episodes_per_env = self.n_episodes / self.training_env.num_envs
            mean_ep_str = (
                f"with an average of {mean_episodes_per_env:.2f} episodes per env" if self.training_env.num_envs > 1 else ""
            )

            print(
                f"Stopping training with a total of {self.num_timesteps} steps because the "
                f"{self.locals.get('tb_log_name')} model reached max_episodes={self.max_episodes}, "
                f"by playing for {self.n_episodes} episodes "
                f"{mean_ep_str}"
            )
        return continue_training

I'm not too what this means or how I am supposed to define 'dones' though? Is it in the on_step() function for the BaseCallback() step?

Checklist

araffin commented 2 months ago

If code there is, it is minimal and working

Closing because the minimum requirements for seeking help are not met.