DHDev0 / Stochastic-muzero

PyTorch implementation of Stochastic MuZero for Gym environments. It supports a wide range of action and observation spaces, both discrete and continuous.
GNU General Public License v3.0

Stochastic MuZero

PyTorch implementation of Stochastic MuZero, based on MuZero Unplugged.

This implementation is better described as "Stochastic MuZero Unplugged": setting reanalyze_ratio to 0 is necessary to obtain plain Stochastic MuZero. The original "Stochastic MuZero" paper focuses on online reinforcement learning; as an extension of "MuZero Unplugged," this implementation also covers offline reinforcement learning.
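As a rough illustration, switching reanalyze_ratio to 0 could be scripted as below. This is only a sketch: it assumes the config is JSON and that a `reanalyze_ratio` key appears somewhere inside it. The helper name `set_reanalyze_ratio` and the nested layout of the example are hypothetical, so check the files under config/ for the actual structure.

```python
import json
from typing import Any


def set_reanalyze_ratio(config: Any, value: float) -> int:
    """Set every "reanalyze_ratio" key found in a nested config to `value`.

    Returns the number of keys updated. The key's location is an assumption,
    so the search is recursive rather than tied to one layout.
    """
    updated = 0
    if isinstance(config, dict):
        for key, item in config.items():
            if key == "reanalyze_ratio":
                config[key] = value
                updated += 1
            else:
                updated += set_reanalyze_ratio(item, value)
    elif isinstance(config, list):
        for item in config:
            updated += set_reanalyze_ratio(item, value)
    return updated


# Hypothetical config fragment; the real files may be organised differently.
example = json.loads('{"learning_cycle": {"reanalyze_ratio": 0.5, "epoch": 100}}')
changed = set_reanalyze_ratio(example, 0)
print(changed, example["learning_cycle"]["reanalyze_ratio"])
```

To apply this to a real file, the dictionary would be loaded with `json.load`, updated, and written back with `json.dump`.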

MuZero -> MuZero Unplugged -> Stochastic MuZero

*An update is planned for the release of PyTorch 2.1.


Getting started

Local Installation

PIP dependencies : requirements.txt

git clone https://github.com/DHDev0/Stochastic-muzero.git

cd Stochastic-muzero

pip install -r requirements.txt

If you run into difficulties, refer to the first cell of the Tutorial or use the Dockerfile.

Docker

Build image (build time: 22 min, memory consumption: 8.75 GB):

docker build -t stochastic_muzero .

(do not forget the ending dot)

Start container:

docker run --cpus 2 --gpus 1 -p 8888:8888 stochastic_muzero
#or
docker run --cpus 2 --gpus 1 --memory 2000M -p 8888:8888 stochastic_muzero
#or
docker run --cpus 2 --gpus 1 --memory 2000M -p 8888:8888 --storage-opt size=15g stochastic_muzero

docker run starts JupyterLab at https://localhost:8888/lab?token=token (you need the token), with all the dependencies needed for CPU and GPU (NVIDIA) compute.

Option meaning:
--cpus 2 -> number of CPU cores allocated (2)
--gpus 1 -> number of GPUs allocated (1)
--storage-opt size=15g -> storage capacity allocated (15 GB; not working with Windows WSL)
--memory 2000M -> RAM allocated (2 GB)
-p 8888:8888 -> publish port 8888 for JupyterLab (default port of the Dockerfile)
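If the container is launched from a script rather than typed by hand, the options above can be assembled programmatically. The following is a minimal sketch, not part of the repository: the helper `docker_run_args` and its parameter names are hypothetical, and it only builds the argument list without running Docker.

```python
import shlex
from typing import List, Optional


def docker_run_args(
    image: str = "stochastic_muzero",
    cpus: int = 2,
    gpus: int = 1,
    memory: Optional[str] = None,
    storage_size: Optional[str] = None,
    port: int = 8888,
) -> List[str]:
    """Build the argument list for `docker run` from the options listed above."""
    args = ["docker", "run", "--cpus", str(cpus), "--gpus", str(gpus)]
    if memory is not None:
        args += ["--memory", memory]
    # Publish the same port on the host and in the container.
    args += ["-p", f"{port}:{port}"]
    if storage_size is not None:
        args += ["--storage-opt", f"size={storage_size}"]
    args.append(image)
    return args


cmd = docker_run_args(memory="2000M", storage_size="15g")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; keeping the arguments as a list avoids shell quoting issues.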

Stop the container:

docker stop $(docker ps -q --filter ancestor=stochastic_muzero)

Delete the image:

docker rmi -f stochastic_muzero

Dependency

Language :

Library :

More details at: requirements.txt

Usage

Jupyter Notebook

For a practical example, see the Tutorial.

CLI

Set your config file (example): https://github.com/DHDev0/Stochastic-muzero/blob/main/config/

First, cd to the project folder:

cd Stochastic-muzero

Construct your dataset through experimentation.

python muzero_cli.py human_buffer config/experiment_450_config.json

Training :

python muzero_cli.py train config/experiment_450_config.json

Training with report :

python muzero_cli.py train report config/experiment_450_config.json

Inference (play game with specific model) :

python muzero_cli.py play config/experiment_450_config.json

Training and Inference :

python muzero_cli.py train play config/experiment_450_config.json

Benchmark model :

python muzero_cli.py benchmark config/experiment_450_config.json

Training + Report + Inference + Benchmark :

python muzero_cli.py train report play benchmark config/experiment_450_config.json
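The commands above all follow one pattern: one or more mode keywords followed by the config path. A small wrapper can compose them when experiments are driven from a script. This is a sketch under assumptions: `muzero_cli_args` and `KNOWN_MODES` are hypothetical names, and the mode list contains only the keywords that appear in the commands above.

```python
import sys
from typing import List, Sequence

# Modes that appear in the commands above; the CLI may accept others.
KNOWN_MODES = ("human_buffer", "train", "report", "play", "benchmark")


def muzero_cli_args(modes: Sequence[str], config_path: str) -> List[str]:
    """Compose a muzero_cli.py invocation: modes first, config path last."""
    unknown = [mode for mode in modes if mode not in KNOWN_MODES]
    if not modes or unknown:
        raise ValueError(f"expected modes from {KNOWN_MODES}, got {list(modes)}")
    return [sys.executable, "muzero_cli.py", *modes, config_path]


args = muzero_cli_args(["train", "report"], "config/experiment_450_config.json")
print(" ".join(args[1:]))
```

From the project folder, the list can be run with `subprocess.run(args, check=True)`.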

Features

Core Muzero and Muzero Unplugged features:

Muzero Stochastic new add-on features include:

TODO:

How to make your own custom gym environment?

Refer to the Gym documentation.

Once your custom environment is registered in Gym, you can call it from MuZero.
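As an illustration of the registration step, here is a minimal sketch of a toy environment. Everything specific in it is an assumption: the `CoinFlipEnv` class and the `CoinFlip-v0` id are hypothetical, the code follows the Gym 0.26-style API (`reset` returns `(observation, info)`, `step` returns five values), and it falls back to Gymnasium, which exposes the same interface, if `gym` is not installed. How the environment id is then referenced in the MuZero config is not covered here.

```python
try:
    import gym
    from gym import spaces
except ImportError:  # Assumption: Gymnasium provides the same API.
    import gymnasium as gym
    from gymnasium import spaces

import numpy as np


class CoinFlipEnv(gym.Env):
    """Hypothetical one-step environment: guess the outcome of a coin flip."""

    metadata = {"render_modes": []}

    def __init__(self):
        super().__init__()
        self.action_space = spaces.Discrete(2)
        self.observation_space = spaces.Box(
            low=0.0, high=1.0, shape=(1,), dtype=np.float32
        )

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)  # seeds self.np_random
        return np.zeros(1, dtype=np.float32), {}

    def step(self, action):
        coin = int(self.np_random.integers(2))
        reward = 1.0 if int(action) == coin else 0.0
        obs = np.array([coin], dtype=np.float32)
        # The episode terminates after a single guess.
        return obs, reward, True, False, {}


ENV_ID = "CoinFlip-v0"
# entry_point may also be a "module.path:ClassName" string.
gym.register(id=ENV_ID, entry_point=CoinFlipEnv)

env = gym.make(ENV_ID)
obs, info = env.reset(seed=0)
print(obs.shape)
```

Passing the class itself as `entry_point` keeps the sketch in one file; in a packaged environment, the string form is the more common choice.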

Authors

Subjects

Deep reinforcement learning

License

GPL-3.0 license