[Feature Request] AlphaZero development

fede72bari commented 1 year ago

🚀 Feature

Include AlphaZero in the library of available RL algorithms, possibly with a maskable-actions option.

Motivation

I am a beginner in reinforcement learning, but I have obtained some interesting results and, like many others, I would like to test different solutions on a range of environments and RL contexts. I think AlphaZero could boost performance in many contexts, in terms of both learning level and stability. I would like to propose its implementation as part of Stable-Baselines3.

Pitch

Include AlphaZero as one of the built-in RL algorithms.

Alternatives

No response

Additional context

No response

Checklist

[X] I have checked that there is no similar issue in the repo

araffin commented 1 year ago

Hello, thanks for your interest. My answer will be similar to https://github.com/DLR-RM/stable-baselines3/issues/579 and https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/issues/43#issuecomment-1278720787: I consider that algorithm to be out of scope for SB3, which focuses on the model-free, single-agent setting. However, if you implement AlphaZero with SB3, I would be happy to link it in the documentation.

verbose-void commented 1 year ago

@araffin I'm curious why you made this decision. Is the main justification that you want to constrain the scope of SB3?

araffin commented 1 year ago

Yes, to keep the core as simple as possible so we can maintain it properly (see the SB3 blog post and paper).

DLR-RM / stable-baselines3