assume-framework / assume

ASSUME - Agent-based Simulation for Studying and Understanding Market Evolution
https://assume.readthedocs.io

Early Stopping, Learning rate and noise decay #398

Open kim-mskw opened 1 month ago

kim-mskw commented 1 month ago

Implement the best practices from the multi-agent RL community and Stable-Baselines3 in our algorithm. Further analyse the similarities between the PettingZoo multi-agent implementation and the current RL implementation of ASSUME. (https://towardsdatascience.com/multi-agent-deep-reinforcement-learning-in-15-lines-of-code-using-pettingzoo-e0b963c0820b)

mthede commented 6 days ago

Quick Review in Code Bases and Literature

Early Stopping

1. Stable-Baselines3

Three possible callbacks:

2. Standard Protocol for Cooperative MARL

Meta-analysis of evaluation methodologies in cooperative MARL, with proposed recommendations for a standardised performance evaluation protocol. Published at NeurIPS 2022. Summary of protocol here.

--> Fixed number of training timesteps and episodes.

3. BenchMARL (Meta Research)

A proposed framework to deal with the fragmented community standards and reproducibility issues highlighted by the analysis above. It also includes some competitive environments. Published in Journal of Machine Learning Research 2024.

BenchMARL is a Multi-Agent Reinforcement Learning (MARL) training library created to enable reproducibility and benchmarking across different MARL algorithms and environments. Its mission is to present a standardized interface that allows easy integration of new algorithms and environments to provide a fair comparison with existing solutions.

--> Not implemented, callbacks may be customized.

4. PettingZoo

--> Not implemented.

Summary

For comparability and in line with community standards, no early stopping by default:

early_stopping_steps = training_episodes / validation_episodes_interval + 1
early_stopping_threshold = 0

Our current implementation of early stopping is closest to the "no model improvement" callback from SB3. However, suitable default values for steps and threshold are unclear. They should be chosen rather conservatively because the environment and the training are unstable, but early stopping can be useful for experimentation or under time/computational restrictions. So we'd like to keep the options.
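A minimal sketch of the "no model improvement" logic discussed above, for illustration only. The parameter names `early_stopping_steps` and `early_stopping_threshold` come from this thread; the class `EarlyStopper`, its `update` method, and the use of integer division in the default are assumptions and not taken from the ASSUME or SB3 code bases.

```python
class EarlyStopper:
    """Hypothetical helper: stop once the validation reward has failed to
    improve by more than `threshold` for `steps` consecutive validations."""

    def __init__(self, steps: int, threshold: float = 0.0):
        self.steps = steps
        self.threshold = threshold
        self.best = float("-inf")
        self.stale = 0  # consecutive validations without improvement

    def update(self, avg_reward: float) -> bool:
        """Record one validation result; return True if training should stop."""
        if avg_reward > self.best + self.threshold:
            self.best = avg_reward
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.steps


def default_early_stopping_steps(training_episodes: int,
                                 validation_episodes_interval: int) -> int:
    """Default from the summary above (integer division is an assumption)."""
    return training_episodes // validation_episodes_interval + 1
```

With, for example, 100 training episodes and a validation every 10 episodes, the default is 11 steps. That is more than the 10 validations such a run would perform if it validates once per interval, so under that assumption the stop condition should never trigger and early stopping is effectively disabled.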

Future development: could be restructured and generalized as callbacks.

mthede commented 6 days ago

Action Noise (Decay)

1. Action Noise in Off-Policy Deep Reinforcement Learning: Impact on Exploration and Performance

Published by Jakob Hollenstein et al. (2022)

▶️ Recommendation (among others regarding noise type, noise scale and impact factor):

In general ▷ use a scheduler [...] Finally we recommend a scheduled reduction of the action noise impact factor β over the training progress to improve robustness to the action noise configuration.

However, the type of scheduler did not seem to be relevant: the performance of the linear and the logistic scheduler was similar.
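To make the two scheduler shapes concrete, here is a rough sketch of a linear and a logistic decay as functions of training progress in [0, 1]. The exact parametrisation used in the paper is not reproduced here; the function names, the `steepness` and `midpoint` parameters and their default values are assumptions chosen for illustration.

```python
import math


def linear_schedule(progress: float, start: float = 1.0, end: float = 0.0) -> float:
    """Interpolate linearly from `start` (progress 0) to `end` (progress 1)."""
    progress = min(max(progress, 0.0), 1.0)
    return start + (end - start) * progress


def logistic_schedule(progress: float, start: float = 1.0, end: float = 0.0,
                      steepness: float = 10.0, midpoint: float = 0.5) -> float:
    """S-shaped decay: stays near `start` early on, drops around `midpoint`,
    and approaches `end` towards the end of training."""
    progress = min(max(progress, 0.0), 1.0)
    weight = 1.0 / (1.0 + math.exp(-steepness * (progress - midpoint)))
    return start + (end - start) * weight
```

Both functions return the halfway value at `progress = 0.5` with these defaults. They differ mainly in how much noise is kept during the first and last part of training.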

2.1 Parameter Space Noise for Exploration

Published by Matthias Plappert et al. (2018)

2.2 OpenAI Baselines

Implementation of Parameter Space Noise for DDPG. (Image from OpenAI blog article.)

▶️ Quite some overhead, and the OpenAI Baselines repo is no longer developed.

3. NoisyNet: Noisy Networks for Exploration

Published by Fortunato et al. (2019), also followed by a US Patent (2019, 2024) from DeepMind.

▶️ Would need further manual implementation. Seems to be used in other (more recent) publications.

4. PettingZoo

Action noise decay in tutorials with MARL libraries.

4.1 AgileRL

4.2.1 Ray (PettingZoo x DQN):

4.2.2 Ray (in total 17 possible choices of exploration with 5 different general schedulers):

Summary

NoisyNet and Parameter Space Noise are interesting, but it is unclear whether the implementation effort would be justified. In general, a schedule that decays the action noise should be implemented to improve performance.

▶️ For now, a simple (e.g. linear) but effective scheduling of the action noise scale is preferred.

If early stopping is enabled: because the schedule is defined over a fixed number of timesteps/episodes, a warning needs to be generated when training stops early, stating that the noise decay was not completed and that results might therefore still be improved.
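One possible shape for such a warning, as a sketch only. The function name `warn_if_decay_incomplete` and its arguments are hypothetical and not part of ASSUME; it uses only Python's standard `warnings` module.

```python
import warnings


def warn_if_decay_incomplete(episodes_run: int, training_episodes: int) -> bool:
    """Warn when training ended before the fixed-length decay schedule finished.

    Returns True if a warning was issued, False otherwise.
    """
    if training_episodes <= 0:
        raise ValueError("training_episodes must be positive")
    if episodes_run < training_episodes:
        remaining = 1.0 - episodes_run / training_episodes
        warnings.warn(
            f"Training stopped early after {episodes_run} of "
            f"{training_episodes} episodes; about {remaining:.0%} of the noise "
            "decay schedule was not performed, so results might still be improved.",
            RuntimeWarning,
            stacklevel=2,
        )
        return True
    return False
```

Such a check could be called once when the early stopping condition fires, for example `warn_if_decay_incomplete(40, 100)`.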

mthede commented 4 days ago

Learning Rate Decay

Sidenote: Currently Adam optimizer is used - Adam adapts/decays the individual learning rates of parameters automatically. However, an additional scheduling of the learning rate may still improve performance as discussed here.

1.1 Stable-Baselines3

1.2 SB3 Zoo (Training Framework)

2. Ray RL Library Scheduler

3. PyTorch LR Scheduler

4. Learning to Learn Learning-Rate Schedules

Series of conference papers on learning learning rates. Latest publication on a GreedyLR scheduler that ...

... outperforms several state-of-the-art schedulers in terms of accuracy, speed, and convergence. Furthermore, our method is easy to implement, computationally efficient, and requires minimal hyperparameter tuning.

This seems promising but unfortunately no code is provided. It is based on PyTorch's ReduceLROnPlateau scheduler.
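For orientation, the plateau idea that ReduceLROnPlateau is built around can be sketched in plain Python. This is neither GreedyLR (for which no code is available) nor PyTorch's actual implementation; the class name `PlateauLR` and its parameters are hypothetical and heavily simplified.

```python
class PlateauLR:
    """Simplified plateau rule: multiply the learning rate by `factor` once the
    monitored loss has not improved for more than `patience` checks."""

    def __init__(self, lr: float, factor: float = 0.5, patience: int = 2,
                 min_lr: float = 1e-6):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = float("inf")
        self.bad_checks = 0  # consecutive checks without improvement

    def step(self, loss: float) -> float:
        """Feed one monitored loss value and return the (possibly reduced) rate."""
        if loss < self.best:
            self.best = loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
            if self.bad_checks > self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_checks = 0
        return self.lr
```

In contrast to a fixed schedule over episodes, a rule like this reacts to the observed training signal, which is also why it needs a reasonably stable metric to monitor.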

Summary

Learning rate scheduling can be beneficial, also for the Adam optimizer, and can reduce runtimes. Current developments show great potential, but an implementation would need to be derived from the publication. PyTorch offers many schedulers out of the box, but only for the learning rate specifically.

The SB3 implementation provides a general scheduler which can be used for both learning rate and action noise decay. ▶️ As we'd like to implement it anyway for the action noise decay, it can be used for the learning rate as well. Whether to switch to PyTorch's internal LR scheduler can be discussed in the future.
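A sketch of what such a shared scheduler could look like, loosely modelled on the SB3 convention that a schedule is a callable of `progress_remaining` (running from 1 at the start of training to 0 at the end). The helper name `linear_fn` and all numeric values below are illustrative assumptions, not the SB3 or ASSUME implementation.

```python
from typing import Callable

# A schedule maps the remaining training progress (1 -> 0) to a value.
Schedule = Callable[[float], float]


def linear_fn(start: float, end: float, end_fraction: float = 1.0) -> Schedule:
    """Linear interpolation from `start` to `end`, reaching `end` after
    `end_fraction` of training and staying there (end_fraction must be > 0)."""

    def schedule(progress_remaining: float) -> float:
        elapsed = 1.0 - progress_remaining
        if elapsed >= end_fraction:
            return end
        return start + elapsed * (end - start) / end_fraction

    return schedule


# The same helper serves both quantities (values are made up for illustration).
learning_rate = linear_fn(1e-3, 1e-5)
noise_scale = linear_fn(0.2, 0.0, end_fraction=0.8)
```

During training, both callables would be evaluated with the same `progress_remaining` value, for instance `1 - episode / training_episodes`.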