DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License
8.97k stars 1.68k forks

[Feature Request] AlphaZero development #1464

Closed. fede72bari closed this issue 1 year ago

fede72bari commented 1 year ago

🚀 Feature

Include AlphaZero in the library of available RL algorithms, possibly with a maskable-actions option.

Motivation

I am a beginner in reinforcement learning, but I have obtained some interesting results and, like many others, I would like to test different solutions on several environments and RL contexts. I think that AlphaZero could boost performance in many contexts, in terms of both learning level and stability. I would like to propose its implementation as part of Stable Baselines3.

Pitch

Include AlphaZero as one of the built-in RL algorithms.

Alternatives

No response

Additional context

No response

araffin commented 1 year ago

Hello, thanks for your interest. My answer will be similar to https://github.com/DLR-RM/stable-baselines3/issues/579 and https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/issues/43#issuecomment-1278720787: I consider that algorithm to be out of scope for SB3, which focuses on the model-free, single-agent setting. However, if you implement AlphaZero with SB3, I would be happy to link it in the documentation.

verbose-void commented 1 year ago

@araffin I'm curious why you made this decision. Is the main justification that you want to constrain the scope of SB3?

araffin commented 1 year ago

Yes, to keep the core as simple as possible so we can maintain it properly (see the SB3 blog post and paper).