kim-mskw opened 3 months ago
Three possible callbacks:
Meta-analysis of evaluation methodologies in cooperative MARL, with recommendations for a standardised performance evaluation protocol. Published at NeurIPS 2022. Summary of the protocol here.
--> Fixed number of training timesteps and episodes.
A framework proposed to address the fragmented community standards and reproducibility issues highlighted by the analysis above; it also covers some competitive environments. Published in the Journal of Machine Learning Research (2024).
BenchMARL is a Multi-Agent Reinforcement Learning (MARL) training library created to enable reproducibility and benchmarking across different MARL algorithms and environments. Its mission is to present a standardized interface that allows easy integration of new algorithms and environments to provide a fair comparison with existing solutions.
--> Not implemented, callbacks may be customized.
--> Not implemented.
For comparability and in line with community standards, early stopping is disabled by default:
early_stopping_steps = training_episodes / validation_episodes_interval + 1
early_stopping_threshold = 0
Our current implementation of early stopping is closest to the "no model improvement" callback from SB3. However, suitable default values for steps and threshold are unclear. They should be chosen rather conservatively, because both the environment and the training are unstable. Still, early stopping can be useful for experimentation or under time/computational constraints, so we'd like to keep the options.
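To illustrate why the default above effectively disables early stopping, here is a minimal sketch of a "no model improvement" rule, assuming one validation run every `validation_episodes_interval` episodes. The class `NoImprovementStopper` and the helper `default_patience` are hypothetical names for illustration, not ASSUME's or SB3's API. Since there are at most `training_episodes / validation_episodes_interval` validation runs and the first run always counts as an improvement, a patience of that count plus one can never be reached.

```python
import math


class NoImprovementStopper:
    """Hypothetical early-stopping rule: stop once the validation reward has not
    improved by more than `threshold` for `patience` consecutive validation runs."""

    def __init__(self, patience: int, threshold: float = 0.0) -> None:
        self.patience = patience
        self.threshold = threshold
        self.best = -math.inf
        self.runs_without_improvement = 0

    def should_stop(self, validation_reward: float) -> bool:
        # An improvement must exceed the best reward so far by more than `threshold`.
        if validation_reward > self.best + self.threshold:
            self.best = validation_reward
            self.runs_without_improvement = 0
        else:
            self.runs_without_improvement += 1
        return self.runs_without_improvement >= self.patience


def default_patience(training_episodes: int, validation_episodes_interval: int) -> int:
    # Number of validation runs plus one, mirroring
    # early_stopping_steps = training_episodes / validation_episodes_interval + 1
    # (integer division here, so the patience is a whole number of runs).
    return training_episodes // validation_episodes_interval + 1


# Usage: with the default patience, even a steadily worsening reward never stops training.
stopper = NoImprovementStopper(patience=default_patience(100, 5))
stopped = [stopper.should_stop(-float(run)) for run in range(100 // 5)]
print(any(stopped))  # False
```

Lowering `patience` (e.g. to 3) and raising `threshold` turns the same rule into an actual, more aggressive early stop.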
Future development: early stopping could be restructured and generalized as callbacks.
Published by Jakob Hollenstein et al. (2022)
▶️ Recommendation (among others regarding noise type, noise scale and impact factor):
In general ▷ use a scheduler [...] Finally, we recommend a scheduled reduction of the action noise impact factor β over the training progress to improve robustness to the action noise configuration.
However, the type of scheduler did not seem to matter: linear and logistic schedulers performed similarly.
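Both schedule shapes are easy to express as functions of the training progress in [0, 1]. The sketch below assumes β decays from `beta_start` to `beta_end`; the function names, the default values, and the logistic parameters `steepness` and `midpoint` are illustrative assumptions, not the exact schedules used by Hollenstein et al. The logistic curve is rescaled so that both schedules start and end at the same values and differ only in shape.

```python
import math


def _clip01(progress: float) -> float:
    # Keep the training progress inside [0, 1].
    return min(max(progress, 0.0), 1.0)


def linear_beta(progress: float, beta_start: float = 1.0, beta_end: float = 0.0) -> float:
    """Linear decay of the noise impact factor beta over training progress in [0, 1]."""
    p = _clip01(progress)
    return beta_start + (beta_end - beta_start) * p


def logistic_beta(progress: float, beta_start: float = 1.0, beta_end: float = 0.0,
                  steepness: float = 10.0, midpoint: float = 0.5) -> float:
    """Sigmoid-shaped decay, rescaled so it equals beta_start at 0 and beta_end at 1."""
    def sigmoid(x: float) -> float:
        return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))

    p = _clip01(progress)
    s0, s1 = sigmoid(0.0), sigmoid(1.0)
    s = (sigmoid(p) - s0) / (s1 - s0)
    return beta_start + (beta_end - beta_start) * s


for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"{p:.2f}  linear={linear_beta(p):.3f}  logistic={logistic_beta(p):.3f}")
```

The logistic schedule keeps β high for longer and drops it more sharply around `midpoint`; given the reported results, the simpler linear form seems sufficient.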
Published by Matthias Plappert et al. (2018)
Implementation of Parameter Space Noise for DDPG (image from the OpenAI blog article). ▶️ Considerable overhead, and the OpenAI Baselines repo is no longer developed.
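For reference, the core idea of adaptive parameter space noise is compact: perturb the actor's parameters with Gaussian noise, measure how far the perturbed actions drift from the unperturbed ones, and rescale the noise standard deviation towards a target distance in action space. The NumPy sketch below applies this to a toy linear `tanh` policy; the class name, the defaults, and the toy policy are assumptions for illustration. In a full DDPG setup this additionally means maintaining a perturbed copy of the actor network, which is part of the overhead mentioned above.

```python
import numpy as np


class AdaptiveParamNoise:
    """Sketch of adaptive parameter space noise (after Plappert et al., 2018)."""

    def __init__(self, initial_std: float = 0.1, target_action_std: float = 0.1,
                 coefficient: float = 1.01) -> None:
        self.std = initial_std
        self.target_action_std = target_action_std
        self.coefficient = coefficient

    def perturb(self, weights: np.ndarray, rng: np.random.Generator) -> np.ndarray:
        # Add i.i.d. Gaussian noise with the current std to every parameter.
        return weights + rng.normal(0.0, self.std, size=weights.shape)

    def adapt(self, actions: np.ndarray, perturbed_actions: np.ndarray) -> float:
        # Root-mean-square distance between clean and perturbed actions.
        distance = float(np.sqrt(np.mean((actions - perturbed_actions) ** 2)))
        # Shrink the noise if the policy drifts too far, grow it otherwise.
        if distance > self.target_action_std:
            self.std /= self.coefficient
        else:
            self.std *= self.coefficient
        return distance


# Usage with a toy linear tanh policy (hypothetical, for illustration only).
rng = np.random.default_rng(0)
weights = rng.normal(size=(3, 2))
states = rng.normal(size=(32, 3))
noise = AdaptiveParamNoise()
perturbed_weights = noise.perturb(weights, rng)
distance = noise.adapt(np.tanh(states @ weights), np.tanh(states @ perturbed_weights))
print(f"action-space distance={distance:.3f}, new std={noise.std:.4f}")
```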
Published by Fortunato et al. (2019), followed by a US patent from DeepMind (2019, 2024).
▶️ Would require a custom implementation. It appears to be used in other, more recent publications.
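To gauge the implementation effort, here is a minimal NumPy sketch of the forward pass of a NoisyNet linear layer with factorised Gaussian noise, following our reading of Fortunato et al.: the weights are μ + σ ⊙ ε, where ε is the outer product of two noise vectors shaped by f(x) = sgn(x)·√|x|, with μ initialised from U[-1/√p, 1/√p] and σ set to σ0/√p (σ0 = 0.5, p = number of inputs). The class name `NoisyLinear` and its arguments are our own. In a real agent μ and σ would be trainable parameters of a PyTorch module; the training part is omitted here.

```python
import numpy as np


def _f(x: np.ndarray) -> np.ndarray:
    # Noise-shaping function: f(x) = sgn(x) * sqrt(|x|).
    return np.sign(x) * np.sqrt(np.abs(x))


class NoisyLinear:
    """Forward pass of a factorised-Gaussian noisy linear layer (sketch, no training)."""

    def __init__(self, in_features: int, out_features: int, sigma0: float = 0.5,
                 rng: np.random.Generator = None) -> None:
        self.rng = rng if rng is not None else np.random.default_rng()
        bound = 1.0 / np.sqrt(in_features)
        # Mean parameters mu and noise scales sigma (learnable in a real implementation).
        self.w_mu = self.rng.uniform(-bound, bound, size=(out_features, in_features))
        self.b_mu = self.rng.uniform(-bound, bound, size=out_features)
        self.w_sigma = np.full((out_features, in_features), sigma0 / np.sqrt(in_features))
        self.b_sigma = np.full(out_features, sigma0 / np.sqrt(in_features))
        self.reset_noise()

    def reset_noise(self) -> None:
        # Factorised noise: one vector per input and one per output unit.
        eps_in = _f(self.rng.standard_normal(self.w_mu.shape[1]))
        eps_out = _f(self.rng.standard_normal(self.w_mu.shape[0]))
        self.w_eps = np.outer(eps_out, eps_in)
        self.b_eps = eps_out

    def __call__(self, x: np.ndarray, noisy: bool = True) -> np.ndarray:
        if not noisy:
            # Deterministic output using only the mean parameters.
            return x @ self.w_mu.T + self.b_mu
        w = self.w_mu + self.w_sigma * self.w_eps
        b = self.b_mu + self.b_sigma * self.b_eps
        return x @ w.T + b


layer = NoisyLinear(3, 2, rng=np.random.default_rng(0))
x = np.ones((5, 3))
noisy_out = layer(x)   # exploration: output depends on the sampled noise
layer.reset_noise()    # resample noise, e.g. once per environment step
```

The forward pass itself is small; most of the effort would lie in wiring the noise resets and learnable σ into the existing actor and critic networks.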
Action noise decay in tutorials with MARL libraries.
NoisyNet and Parameter Space Noise are interesting, but it is unclear whether the implementation effort would be justified. In general, a decaying action noise schedule should be implemented to improve performance.
▶️ For now, a simple (e.g. linear) but effective schedule for the action noise scale is preferred.
If early stopping is enabled: because the schedule is defined over a fixed number of timesteps/episodes, a warning should be issued that results might improve, since the noise decay was not fully performed.
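A minimal sketch of such a warning, assuming the noise scale decays linearly over the planned number of training episodes. `linear_noise_scale`, `check_noise_decay_on_early_stop`, and the start/end values are hypothetical names and numbers; the message is emitted via Python's standard `warnings` module.

```python
import warnings


def linear_noise_scale(episode: int, training_episodes: int,
                       start: float = 0.2, end: float = 0.01) -> float:
    """Decay the noise scale linearly from `start` (first episode) to `end` (last episode)."""
    fraction = min(episode / max(training_episodes - 1, 1), 1.0)
    return start + (end - start) * fraction


def check_noise_decay_on_early_stop(stopped_at_episode: int, training_episodes: int,
                                    start: float = 0.2, end: float = 0.01) -> float:
    """Warn if training stopped before the noise schedule reached its final value."""
    scale = linear_noise_scale(stopped_at_episode, training_episodes, start, end)
    if stopped_at_episode < training_episodes - 1:
        warnings.warn(
            f"Early stopping at episode {stopped_at_episode} of {training_episodes}: "
            f"action noise decay was not fully performed (scale {scale:.3f} instead of {end}). "
            "Results may improve if training runs for the full number of episodes.",
            RuntimeWarning,
        )
    return scale
```

Returning the noise scale reached at the stopping point makes it easy to log alongside the warning.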
Side note: the Adam optimizer is currently used. Adam adapts the effective step size of each parameter automatically; however, additionally scheduling the learning rate may still improve performance, as discussed here.
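The NumPy sketch below of a single Adam update (Kingma and Ba) makes the point concrete: each parameter's step is normalised by its own running gradient statistics m̂/√v̂, but every step is still multiplied by the single global learning rate `lr`. On the first step the update is roughly `lr` in size for every parameter, regardless of the gradient scale, so the overall step size remains governed by `lr`, which is exactly what a learning rate schedule changes. The function name `adam_step` is ours; the hyperparameter defaults are the common ones (also the defaults of `torch.optim.Adam`).

```python
import numpy as np


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update at step t (1-based); returns (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter normalisation, scaled by the single global learning rate.
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v


# Two parameters with gradients five orders of magnitude apart.
param = np.zeros(2)
grad = np.array([1e-3, 100.0])
param, m, v = adam_step(param, grad, np.zeros(2), np.zeros(2), t=1)
print(param)  # both updates are close to -lr = -1e-3
```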
Series of conference papers on learning learning rates. Latest publication on a GreedyLR scheduler that ...
... outperforms several state-of-the-art schedulers in terms of accuracy, speed, and convergence. Furthermore, our method is easy to implement, computationally efficient, and requires minimal hyperparameter tuning.
This seems promising, but unfortunately no code is provided. It is based on PyTorch's ReduceLROnPlateau scheduler.
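Since GreedyLR builds on ReduceLROnPlateau, here is a simplified pure-Python sketch of the plateau logic itself: the learning rate is multiplied by `factor` once the monitored loss has failed to improve for more than `patience` consecutive checks. It mirrors the basic behaviour of PyTorch's scheduler in `mode="min"` (same defaults `factor=0.1`, `patience=10`) but omits its relative threshold, cooldown, and parameter-group handling. It is not GreedyLR, whose algorithm is not reproduced here, and the class name is hypothetical.

```python
import math


class PlateauLRScheduler:
    """Simplified reduce-on-plateau rule for a single learning rate (mode 'min')."""

    def __init__(self, lr: float, factor: float = 0.1, patience: int = 10,
                 min_lr: float = 0.0) -> None:
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = math.inf
        self.num_bad_checks = 0

    def step(self, loss: float) -> float:
        if loss < self.best:
            self.best = loss
            self.num_bad_checks = 0
        else:
            self.num_bad_checks += 1
        # Reduce once the loss has stagnated for more than `patience` checks.
        if self.num_bad_checks > self.patience:
            self.lr = max(self.lr * self.factor, self.min_lr)
            self.num_bad_checks = 0
        return self.lr


scheduler = PlateauLRScheduler(lr=1.0, factor=0.5, patience=2)
print([scheduler.step(loss) for loss in [1.0, 1.0, 1.0, 1.0, 0.5]])
# [1.0, 1.0, 1.0, 0.5, 0.5]
```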
Learning rate scheduling can be beneficial, also for the Adam optimizer, and can reduce runtimes. Current developments show great potential, but the implementation would need to be derived from the publication. PyTorch offers many schedulers out of the box, but only for the learning rate.
The SB3 implementation provides a general scheduler that can be used for both learning rate and action noise decay. ▶️ Since we'd like to implement it anyway for the action noise decay, it can be used for the learning rate as well. Whether to switch to PyTorch's built-in LR schedulers can be discussed in the future.
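In SB3, a schedule is simply a callable that maps the remaining training progress (1.0 at the start, 0.0 at the end) to a value, which is why one mechanism can drive both the learning rate and the noise scale. The sketch below re-implements a linear schedule in that style, similar in spirit to SB3's `get_linear_fn`, without depending on SB3; `linear_schedule`, `param_groups`, and all numbers are assumptions for illustration. Writing the new value into each entry of `optimizer.param_groups` is the usual way to change the learning rate in PyTorch; here a plain list of dicts stands in for it so the example runs on its own.

```python
from typing import Callable, Dict, List

# A schedule maps progress_remaining (1.0 -> 0.0 over training) to a value.
Schedule = Callable[[float], float]


def linear_schedule(start: float, end: float, end_fraction: float = 1.0) -> Schedule:
    """Interpolate from `start` to `end` over the first `end_fraction` of training."""
    def schedule(progress_remaining: float) -> float:
        elapsed = 1.0 - progress_remaining
        if elapsed >= end_fraction:
            return end
        return start + elapsed * (end - start) / end_fraction
    return schedule


lr_schedule = linear_schedule(3e-4, 1e-5)
noise_schedule = linear_schedule(0.2, 0.0, end_fraction=0.8)

# Stand-in for torch.optim.Optimizer.param_groups (one dict per parameter group).
param_groups: List[Dict[str, float]] = [{"lr": lr_schedule(1.0)}]

total_steps = 10
for step in range(total_steps):
    progress_remaining = 1.0 - step / total_steps
    for group in param_groups:
        group["lr"] = lr_schedule(progress_remaining)
    noise_scale = noise_schedule(progress_remaining)
    # ... act with noise_scale, collect experience, update networks ...
```

Because both values come from the same kind of callable, switching the learning rate to a PyTorch scheduler later would only affect the `param_groups` update, not the noise decay.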
Implement the best practices from the multi-agent RL community and Stable-Baselines3 in our algorithm. Further, analyse the similarities between the PettingZoo multi-agent implementation and the current RL implementation of Assume. (https://towardsdatascience.com/multi-agent-deep-reinforcement-learning-in-15-lines-of-code-using-pettingzoo-e0b963c0820b)