Toni-SM / skrl

Modular reinforcement learning library (on PyTorch and JAX) with support for NVIDIA Isaac Gym, Omniverse Isaac Gym and Isaac Lab
https://skrl.readthedocs.io/
MIT License

Add support for Weights & Biases #31

Closed. juhannc closed this pull request 1 year ago.

juhannc commented 2 years ago

Add support for Weights & Biases

Introduction and description

This PR adds the capability to use Weights & Biases for easier experiment tracking. The wandb settings are controlled from the agent's configuration dictionary, so they can be adjusted on a per-agent basis rather than for the whole experiment. The advantages are:

Improvements in this PR

This PR includes the following improvements:

Proof of Work

[Screenshot: Bildschirmfoto 2022-09-29 um 16 54 32]

Future improvements

Future improvements might include global settings for Weights & Biases. I wasn't quite sure how, or if, they would fit into skrl's design philosophy, which is why I haven't implemented this feature yet.

Final notes

Again, this PR got quite big, but mostly because all the trailing whitespace at the ends of lines was deleted (see the discussion in #24). Maybe it is time to introduce a pre-commit pipeline. Pre-commit hooks can heavily improve the workflow because they get rid of basic problems: simple hooks would remove trailing whitespace at the end of each line, while more advanced ones would auto-format the code (black is the most common, from what I have seen in other projects). From my personal experience, they really do help, because you no longer need to think about all this stuff. If you, @Toni-SM, are interested, let me know and I will gladly open a PR with some very basic hooks, and we can expand it from there.

Cheers,

Johann

Toni-SM commented 2 years ago

Hi @juhannc

Great pull request!

Regarding WandB, there are the following points:

  1. The WandB initialization must occur within the agent base class (inside the init method, like Tensorboard's SummaryWriter), thus relieving the trainers of that responsibility. It is therefore necessary to revert the changes to the agent/base.py file (init method) and move the WandB initialization there.

  2. The documentation about WandB should appear in the Saving, loading and logging docs (docs/source/intro/data.rst) instead of docs/source/modules/skrl.utils.utilities.rst.

About pre-commit, it could be an option. However, the Black code style does not quite fit, especially for functions and long lines, which are split manually to make the implementation (not the code) easier to read. I am going to read more about it...

juhannc commented 2 years ago

Hi @Toni-SM,

thank you :)

Regarding your points:

  1. I thought about that, but then the agent has to access the trainer's config, and I think it is important to include that one as well for reproducibility. The way I solved it, W&B is initialized before the TensorBoard SummaryWriter is created. That's why I had to move the code that builds the folder name and creates the folder for TensorBoard. If you disagree or know a better way to include the trainer's config, let me know and I will change it.
  2. Yeah, that makes sense. To be honest, I wasn't sure where to put the docs and hoped you would correct me anyway 😅 I will change it once I'm back home after the weekend!

Regarding pre-commits, I could implement a very basic version with just the line endings and we could go from there :)

Let me know what you think!

Toni-SM commented 2 years ago

Hi @juhannc

  1. What about passing the trainer configuration to the agent's init method? This way it is possible to initialize WandB from there and to do so before Tensorboard (a rough sketch of this idea follows below).
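
A minimal sketch of that idea, purely for illustration (the signature and key names, e.g. trainer_cfg and the "experiment"/"wandb" entries, are assumptions and not skrl's final API):

import wandb
from torch.utils.tensorboard import SummaryWriter

class Agent:
    def __init__(self, cfg: dict = None, trainer_cfg: dict = None) -> None:
        # agent configuration; the "experiment" sub-dictionary holds the wandb settings
        self.cfg = cfg if cfg is not None else {}
        wandb_cfg = self.cfg.get("experiment", {}).get("wandb", {})
        if wandb_cfg.get("enabled", False):
            # start wandb first and log both the agent and trainer configs for reproducibility
            wandb.init(project=wandb_cfg.get("project"),
                       entity=wandb_cfg.get("entity"),
                       config={"agent": self.cfg, "trainer": trainer_cfg},
                       sync_tensorboard=True)
        # the Tensorboard SummaryWriter is created afterwards, as it is now
        self.writer = SummaryWriter(log_dir=self.cfg.get("experiment", {}).get("directory", "runs"))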

About pre-commits, yes, we can start with removing trailing whitespace at the end of lines and ensuring line-ending best practices for the files...

By the way, since I see that you have taken the lead on improving the type hints, just as a comment: I had started doing that gradually, together with adding method call examples (starting with skrl/models) 😅

juhannc commented 2 years ago

Hi @Toni-SM

We could do that, yes.

But while we are reorganizing all the configs, and given your recent improvements to the model generation, what do you think of the following? Instead of having two different config files and no config for the models, let's combine them into one config/dict. That way, all the model parameters that are currently function arguments, such as the layer sizes, activation functions, or options like clipping actions, could all become part of a single configuration file.

That would perhaps make the underlying code a little more complicated, e.g. model creation from dict keys (deterministic, gaussian, etc.), but it could make usage and configuration easier in the long run. The config file would then basically be the only thing you need to get everything running, and it would make experiments easy to reproduce. A rough sketch of what such a combined config could look like is below.
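
Purely as an illustration of that idea (hypothetical, not skrl's API; the keys and values are made up from the examples in this thread):

# hypothetical single configuration combining model, agent and trainer settings
config = {
    "models": {
        "q_network": {"type": "deterministic",
                      "hiddens": [64, 64],
                      "hidden_activation": ["relu", "relu"],
                      "clip_actions": False},
    },
    "agent": {"class": "DQN",
              "learning_starts": 100,
              "exploration": {"final_epsilon": 0.04, "timesteps": 1500}},
    "trainer": {"timesteps": 10000, "headless": True},
}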

I feel like this idea is a little out of scope for this PR, but I wanted to throw it into the ring while we are at it.

What do you think?

Cheers!

Toni-SM commented 2 years ago

Hi @juhannc

The idea of combining everything into the same configuration, as in other libraries, goes against the library's design. Although in principle it would facilitate its use, such a practice negates the modularity, the independence of the components, and above all the user's total control over creating the models.

The design of skrl is focused on readability, transparency, and the ability to modify and interchange its components... I prefer to sacrifice ease of use and configuration before sacrificing those.

However, to facilitate its configuration from the command line/yaml, a function could be implemented to parse the information from there (with hydra or argparse) and return it. This way it could be integrated with the code and save all the configuration at once.

For example, something like:

from skrl.utils import get_configuration_from_command_line

config = get_configuration_from_command_line(parser="hydra", config_path="./config/some_file.yaml")

# model config
config["model"]

# agent config
config["agent"]

# trainer config
config["trainer"]

juhannc commented 2 years ago

The idea of combining everything into the same configuration, as in other libraries, goes against the library's design. Although in principle it would facilitate its use, such a practice negates the modularity, the independence of the components, and above all the user's total control over creating the models.

That's why I wanted to discuss the idea instead of just creating a PR for it :)

However, to facilitate its configuration from the command line/yaml, a function could be implemented to parse the information from there (with hydra or argparse) and return it. This way it could be integrated with the code and save all the configuration at once.

For example, something like:

from skrl.utils import get_configuration_from_command_line

config = get_configuration_from_command_line(parser="hydra", config_path="./config/some_file.yaml")

# model config
config["model"]

# agent config
config["agent"]

# trainer config
config["trainer"]

I don't quite get your idea. This still looks like one big config dict to me, just with sub-dicts for every aspect of the training (models, agent, trainer). Also, in your example, how would we use the default configs for the different agents? Would the get_configuration_from_command_line function read the desired agent and prepare config["agent"] accordingly, and the model config for that matter? Where would it get the desired agent from? And what about multi-agent scenarios? Would the config["agent"] dict actually contain a list of dicts, one for each agent? I'm just not sure (yet) how it would fit into skrl's current design. But I have to study hydra anyway this week, so maybe I'll stumble across a solution.

Anyway, what should we do with this PR? I would suggest doing the following, and implementing better configuration management in a new PR:

  1. What about passing the trainer configuration to the agent's init method? This way it is possible to initialize WandB from there and to do it before Tensorboard

Cheers,

Johann

Toni-SM commented 2 years ago

I mentioned the argument parser as an idea... we will have to work on it if we want to implement it in the future... or not...

Regarding this pull request, yes... the approach is to pass the trainer configuration to the agent's init method and initialize WandB there.

Btw, after approving the pre-commit PR, this branch now has conflicts that must be resolved.

juhannc commented 2 years ago

Sounds good! I will change that tomorrow.

Regarding the blocked merge, a simple rebase should fix that too :)

juhannc commented 1 year ago

Hi @Toni-SM,

I changed the way WandB is called; the agent now handles WandB instead of the trainer. I think the PR should be ready now.

Here is some proof that it's working (at least on my machine haha):

[Screenshot: Bildschirmfoto 2022-10-15 um 16 47 06]

❯ /Users/johann/.virtualenvs/skrl-dtxv/bin/python /Users/johann/tmp/skrl/gym_cartpole_dqn.py
/Users/johann/.virtualenvs/skrl-dtxv/lib/python3.9/site-packages/torch/utils/tensorboard/__init__.py:4: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if not hasattr(tensorboard, "__version__") or LooseVersion(
/Users/johann/.virtualenvs/skrl-dtxv/lib/python3.9/site-packages/gym/envs/registration.py:592: UserWarning: WARN: The environment CartPole-v0 is out of date. You should consider upgrading to version `v1`.
  logger.warn(
/Users/johann/.virtualenvs/skrl-dtxv/lib/python3.9/site-packages/gym/core.py:329: DeprecationWarning: WARN: Initializing wrapper in old step API which returns one bool instead of two. It is recommended to set `new_step_api=True` to use new step API. This will be the default behaviour in future.
  deprecation(
/Users/johann/.virtualenvs/skrl-dtxv/lib/python3.9/site-packages/gym/wrappers/step_api_compatibility.py:39: DeprecationWarning: WARN: Initializing environment in old step API which returns one bool instead of two. It is recommended to set `new_step_api=True` to use new step API. This will be the default behaviour in future.
  deprecation(
[skrl:INFO] Environment class: gym.core.Wrapper
[skrl:INFO] Environment wrapper: Gym
wandb: Currently logged in as: juhannc. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.13.4
wandb: Run data is saved locally in /Users/johann/tmp/skrl/wandb/run-20221015_164154-d2xbxrvk
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run 22-10-15_16-41-53-122478_DQN
wandb: ⭐️ View project at https://wandb.ai/juhannc/skrl
wandb: 🚀 View run at https://wandb.ai/juhannc/skrl/runs/d2xbxrvk
/Users/johann/.virtualenvs/skrl-dtxv/lib/python3.9/site-packages/wandb/sdk/lib/import_hooks.py:246: DeprecationWarning: Deprecated since Python 3.4. Use importlib.util.find_spec() instead.
  loader = importlib.find_loader(fullname, path)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:12<00:00, 816.79it/s]
wandb: Waiting for W&B process to finish... (success).
wandb: 
wandb: Run history:
wandb:      Episode / Total timesteps (max) ▃▂▁▂▁▁▁▁▁▁▂▁▂▂▂▂▁▁▂▄▂▂▂▅▂▄▄▃█▆▆█▇▄▃█▇█▇▆
wandb:     Episode / Total timesteps (mean) ▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▄▁▂▂▅▂▄▄▃█▆▆█▇▄▂█▇█▇▆
wandb:      Episode / Total timesteps (min) ▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▂▂▅▂▄▃▃█▆▆█▇▄▂█▇█▇▆
wandb:    Exploration / Exploration epsilon █▇▆▅▅▄▄▃▃▃▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb:                Loss / Q-network loss █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▁▂▂▂
wandb:  Reward / Instantaneous reward (max) ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb: Reward / Instantaneous reward (mean) ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb:  Reward / Instantaneous reward (min) ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb:          Reward / Total reward (max) ▃▂▁▂▁▁▁▁▁▁▂▁▂▂▂▂▁▁▂▄▂▂▂▅▂▄▄▃█▆▆█▇▄▃█▇█▇▆
wandb:         Reward / Total reward (mean) ▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▄▁▂▂▅▂▄▄▃█▆▆█▇▄▂█▇█▇▆
wandb:          Reward / Total reward (min) ▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▂▂▅▂▄▃▃█▆▆█▇▄▂█▇█▇▆
wandb:                Target / Target (max) ▁▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇███
wandb:               Target / Target (mean) ▁▁▁▁▂▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇███
wandb:                Target / Target (min) ▁███████████████████████████████████████
wandb:                          global_step ▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
wandb: 
wandb: Run summary:
wandb:      Episode / Total timesteps (max) 156.0
wandb:     Episode / Total timesteps (mean) 156.0
wandb:      Episode / Total timesteps (min) 156.0
wandb:    Exploration / Exploration epsilon 0.04126
wandb:                Loss / Q-network loss 0.03688
wandb:  Reward / Instantaneous reward (max) 1.0
wandb: Reward / Instantaneous reward (mean) 1.0
wandb:  Reward / Instantaneous reward (min) 1.0
wandb:          Reward / Total reward (max) 156.0
wandb:         Reward / Total reward (mean) 156.0
wandb:          Reward / Total reward (min) 156.0
wandb:                Target / Target (max) 5.09955
wandb:               Target / Target (mean) 4.55282
wandb:                Target / Target (min) 1.0
wandb:                          global_step 10000
wandb: 
wandb: Synced 22-10-15_16-41-53-122478_DQN: https://wandb.ai/juhannc/skrl/runs/d2xbxrvk
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 1 other file(s)
wandb: Find logs at: ./wandb/run-20221015_164154-d2xbxrvk/logs

The MWE I used to test it is here:

import gym

# Import the skrl components to build the RL system
from skrl.utils.model_instantiators import deterministic_model, Shape
from skrl.memories.torch import RandomMemory
from skrl.agents.torch.dqn import DQN, DQN_DEFAULT_CONFIG
from skrl.trainers.torch import SequentialTrainer
from skrl.envs.torch import wrap_env

# Load and wrap the Gym environment.
# Note: the environment version may change depending on the gym version
try:
    env = gym.make("CartPole-v0", new_step_api=False)
except gym.error.DeprecatedEnv as e:
    env_id = [spec.id for spec in gym.envs.registry.all() if spec.id.startswith("CartPole-v")][0]
    print("CartPole-v0 not found. Trying {}".format(env_id))
    env = gym.make(env_id)
env = wrap_env(env)

device = env.device

# Instantiate a RandomMemory (without replacement) as experience replay memory
memory = RandomMemory(memory_size=50000, num_envs=env.num_envs, device=device, replacement=False)

# Instantiate the agent's models (function approximators) using the model instantiator utility
# DQN requires 2 models, visit its documentation for more details
# https://skrl.readthedocs.io/en/latest/modules/skrl.agents.dqn.html#spaces-and-models
models_dqn = {}
models_dqn["q_network"] = deterministic_model(observation_space=env.observation_space,
                                              action_space=env.action_space,
                                              device=device,
                                              clip_actions=False,
                                              input_shape=Shape.OBSERVATIONS,
                                              hiddens=[64, 64],
                                              hidden_activation=["relu", "relu"],
                                              output_shape=Shape.ACTIONS,
                                              output_activation=None,
                                              output_scale=1.0)
models_dqn["target_q_network"] = deterministic_model(observation_space=env.observation_space,
                                                     action_space=env.action_space,
                                                     device=device,
                                                     clip_actions=False,
                                                     input_shape=Shape.OBSERVATIONS,
                                                     hiddens=[64, 64],
                                                     hidden_activation=["relu", "relu"],
                                                     output_shape=Shape.ACTIONS,
                                                     output_activation=None,
                                                     output_scale=1.0)

# Initialize the models' parameters (weights and biases) using a Gaussian distribution
for model in models_dqn.values():
    model.init_parameters(method_name="normal_", mean=0.0, std=0.1)

# Configure and instantiate the agent.
# Only modify some of the default configuration, visit its documentation to see all the options
# https://skrl.readthedocs.io/en/latest/modules/skrl.agents.dqn.html#configuration-and-hyperparameters
cfg_dqn = DQN_DEFAULT_CONFIG.copy()
cfg_dqn["learning_starts"] = 100
cfg_dqn["exploration"]["final_epsilon"] = 0.04
cfg_dqn["exploration"]["timesteps"] = 1500
# log to TensorBoard every 100 timesteps and write checkpoints every 500 timesteps
cfg_dqn["experiment"]["write_interval"] = 100
cfg_dqn["experiment"]["checkpoint_interval"] = 500
cfg_dqn["experiment"]["wandb"] = {
        "enabled": True,
        "project": "skrl",
        "entity": "juhannc",
    }

agent_dqn = DQN(models=models_dqn,
                memory=memory,
                cfg=cfg_dqn,
                observation_space=env.observation_space,
                action_space=env.action_space,
                device=device)

# Configure and instantiate the RL trainer
cfg_trainer = {"timesteps": 10000, "headless": True}
trainer = SequentialTrainer(cfg=cfg_trainer, env=env, agents=agent_dqn)

# start training
trainer.train()

Toni-SM commented 1 year ago

Hi @juhannc

I have grouped the agents' wandb configuration parameters into the wandb_kwargs dictionary to reduce the size of the agent configuration dictionary and allow other arguments to be set.
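
For reference, a minimal sketch of the resulting agent configuration after this change (the exact key names, "wandb" and "wandb_kwargs" under "experiment", are assumed from this discussion; see the skrl documentation for the current API):

from skrl.agents.torch.dqn import DQN_DEFAULT_CONFIG

cfg = DQN_DEFAULT_CONFIG.copy()
cfg["experiment"]["wandb"] = True        # enable Weights & Biases tracking
cfg["experiment"]["wandb_kwargs"] = {    # extra arguments forwarded to wandb.init()
    "project": "skrl",
    "entity": "juhannc",
}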

juhannc commented 1 year ago

Awesome! Looks good to me :)