DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

Why does the Logger only return the train/ metrics, and not eval/, time/, and rollout/? #1888

Closed liamquantrill closed 5 months ago

liamquantrill commented 5 months ago

❓ Question

I am trying to use the Logger along with a custom callback to access the metrics listed at the top of the linked Logger docs.

I am taking inspiration from here and am able to access the train/ metrics, but cannot access the eval/, time/, or rollout/ ones.

Here is the code I am using:

import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.callbacks import BaseCallback

class MetricLogger(BaseCallback):
    def __init__(self, verbose=0):
        super().__init__(verbose)
        self.metrics = {}

    def _on_rollout_end(self) -> None:
        print(self.logger.name_to_value.items())
        self.metrics = {
            "total_timesteps": self.model.logger.name_to_value['time/total_timesteps'],
            "episodes": self.model.logger.name_to_value['time/episodes'],
            "train_ep_len_mean": self.model.logger.name_to_value['rollout/ep_len_mean'],
            "train_ep_rew_mean": self.logger.name_to_value['rollout/ep_rew_mean'],
            "train_loss": self.model.logger.name_to_value['train/loss'],
            "eval_mean_ep_length": self.model.logger.name_to_value['eval/mean_ep_length'],
            "eval_mean_reward": self.model.logger.name_to_value['eval/mean_reward'],
        }

        for metric_name, value in self.metrics.items():
            print(metric_name, value)

    def _on_step(self) -> bool:
        return True

env = gym.make("CartPole-v1", render_mode="rgb_array")
env = Monitor(env)

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000, callback=[MetricLogger()])

When run, print(self.logger.name_to_value.items()) prints:

dict_items([('train/learning_rate', 0.0003), ('train/entropy_loss', -0.6860671075060963), ('train/policy_gradient_loss', -0.017777339619351552), ('train/value_loss', 49.98345133364201), ('train/approx_kl', 0.008931432), ('train/clip_fraction', 0.10546875), ('train/loss', 6.665966510772705), ('train/explained_variance', -0.01795327663421631), ('train/n_updates', 10), ('train/clip_range', 0.2)])

And for metric_name, value in self.metrics.items(): print(metric_name, value) prints:

total_timesteps 0.0
episodes 0.0
train_ep_len_mean 0.0
train_ep_rew_mean 0.0
train_loss 6.665966510772705
eval_mean_ep_length 0.0
eval_mean_reward 0.0

As you can see, only the train/ metrics have real values; the eval/, time/, and rollout/ entries all default to 0.0. Could somebody please explain why?


araffin commented 5 months ago

Hello, what is your use case? If you want to access all metrics, you might need to define a custom logger class. You don't see the rest of the metrics because they are flushed from the buffer just after being logged: https://github.com/DLR-RM/stable-baselines3/blob/5623d98f9d6bcfd2ab450e850c3f7b090aef5642/stable_baselines3/common/on_policy_algorithm.py#L274
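A minimal sketch of such a custom logger component (assumptions: `Logger.dump()` calls `write(key_values, key_excluded, step)` on each entry in its `output_formats` before clearing the buffer, so any object providing that interface sees the full `rollout/`, `time/`, `train/`, and `eval/` values; `HistoryWriter` is a hypothetical name, and in practice you would subclass `stable_baselines3.common.logger.KVWriter`):

```python
class HistoryWriter:
    """Records every key/value dict the Logger dumps, keyed by step.

    Sketch of an SB3 output format: the Logger only requires the
    write()/close() interface, so this captures each snapshot before
    the Logger clears its internal name_to_value buffer.
    """

    def __init__(self):
        self.history = []  # list of (step, metrics-dict) snapshots

    def write(self, key_values, key_excluded, step=0):
        # Copy the dict: the Logger reuses/clears its buffer after dumping.
        self.history.append((step, dict(key_values)))

    def close(self):
        pass
```

You could then attach it before training, e.g. with `model.set_logger(Logger(folder=None, output_formats=[HistoryWriter(), HumanOutputFormat(sys.stdout)]))` (both `Logger` and `HumanOutputFormat` live in `stable_baselines3.common.logger`), and read back `writer.history` after `learn()` returns.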