DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Question] How to access to rollout (logger) data in callback #1919

Closed · JaimeParker closed this issue 6 months ago

JaimeParker commented 7 months ago

❓ Question

I'm using a custom gym env with multiple parallel environments, and I want to write a custom callback similar to StopTrainingOnRewardThreshold. One difference is that I want to use "rollout/ep_rew_mean" as the criterion, but I don't know how to get this value in callback.py.
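(For reference, a minimal sketch of how the built-in StopTrainingOnRewardThreshold is usually wired up through an EvalCallback; here eval_env and model are assumed to already exist, and the threshold value is only a placeholder:)

from stable_baselines3.common.callbacks import EvalCallback, StopTrainingOnRewardThreshold

# eval_env and model are assumed to be defined elsewhere
# (a separate evaluation env and an SB3 model).
callback_on_best = StopTrainingOnRewardThreshold(reward_threshold=-200, verbose=1)
eval_callback = EvalCallback(eval_env, callback_on_new_best=callback_on_best, verbose=1)

# Training stops early once the mean evaluation reward crosses the threshold.
model.learn(total_timesteps=100_000, callback=eval_callback)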

Here is my customized class in callback.py

import numpy as np

from stable_baselines3.common.callbacks import BaseCallback


class StopTrainingOnBestRewardNoImprovement(BaseCallback):

    def __init__(self, reward_threshold: float, verbose: int = 0):
        super().__init__(verbose)

        self.reward_threshold = reward_threshold
        self.best_mean_reward = -np.inf

    def _on_step(self) -> bool:
        """
        This method will be called by the model after each call to `env.step()`.

        For child callback (of an `EventCallback`), this will be called
        when the event is triggered.

        :return: If the callback returns False, training is aborted early.
        """
        print(self.model.logger.name_to_value["rollout/ep_rew_mean"])
        print(self.logger.name_to_value["rollout/ep_rew_mean"])
        return True

I used print to check whether the data was accessible; here is the output:

0.0
0.0
0.0
0.0
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 16.4     |
|    ep_rew_mean     | -348     |
| time/              |          |
|    episodes        | 40       |
|    fps             | 3442     |
|    time_elapsed    | 0        |
|    total_timesteps | 1350     |
| train/             |          |
|    actor_loss      | 1.95     |
|    critic_loss     | 2.04e+03 |
|    ent_coef        | 0.993    |
|    ent_coef_loss   | -0.0464  |
|    learning_rate   | 0.0003   |
|    n_updates       | 24       |
---------------------------------
0.0
0.0
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 17.1     |
|    ep_rew_mean     | -352     |
| time/              |          |
|    episodes        | 44       |
|    fps             | 3489     |
|    time_elapsed    | 0        |
|    total_timesteps | 1400     |
| train/             |          |
|    actor_loss      | 2.26     |
|    critic_loss     | 1.19e+03 |
|    ent_coef        | 0.993    |
|    ent_coef_loss   | -0.0485  |
|    learning_rate   | 0.0003   |
|    n_updates       | 25       |
---------------------------------
0.0
0.0
0.0
0.0
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 17.7     |
|    ep_rew_mean     | -355     |
| time/              |          |
|    episodes        | 48       |
|    fps             | 3570     |
|    time_elapsed    | 0        |
|    total_timesteps | 1500     |
| train/             |          |
|    actor_loss      | 2.85     |
|    critic_loss     | 1.43e+03 |
|    ent_coef        | 0.992    |
|    ent_coef_loss   | -0.0524  |
|    learning_rate   | 0.0003   |
|    n_updates       | 27       |
---------------------------------

All zeros, so it seems this approach is wrong. How can I get this value?


araffin commented 7 months ago

Duplicate of https://github.com/DLR-RM/stable-baselines3/issues/1888#issuecomment-2047612941

You will need to access those values via self.model.
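For illustration, a minimal sketch of what that could look like, assuming the callback reads self.model.ep_info_buffer (the buffer SB3 fills with the most recent episodes' "r"/"l" stats, which is also what "rollout/ep_rew_mean" is computed from); the stopping rule itself is only a placeholder:

import numpy as np

from stable_baselines3.common.callbacks import BaseCallback


class StopTrainingOnBestRewardNoImprovement(BaseCallback):

    def __init__(self, reward_threshold: float, verbose: int = 0):
        super().__init__(verbose)
        self.reward_threshold = reward_threshold
        self.best_mean_reward = -np.inf

    def _on_step(self) -> bool:
        # ep_info_buffer holds the stats of the most recent episodes
        # (it requires the env to be wrapped in a Monitor, which make_vec_env does by default);
        # "rollout/ep_rew_mean" is the mean of the "r" entries in this buffer.
        if len(self.model.ep_info_buffer) > 0:
            mean_reward = float(np.mean([ep_info["r"] for ep_info in self.model.ep_info_buffer]))
            self.best_mean_reward = max(self.best_mean_reward, mean_reward)
            # Stop training once the best mean reward reaches the threshold.
            if self.best_mean_reward >= self.reward_threshold:
                return False
        return True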

JaimeParker commented 6 months ago

@araffin since the rollout info is only dumped periodically (in off_policy_algorithm.py), is there no way to access the rollout data through the callback's logger without a customized logger class?