This repository contains a re-implementation of the Proximal Policy Optimization (PPO) algorithm, originally sourced from Stable-Baselines3.
The purpose of this re-implementation is to provide insight into the inner workings of the PPO algorithm in these environments: `LunarLander-v2` and `CartPole-v1`.
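Since the goal is to expose PPO's inner workings, here is a minimal pure-Python sketch of PPO's clipped surrogate objective for a single sample. The function name is illustrative; the default `clip_eps=0.2` mirrors Stable-Baselines3's default `clip_range=0.2`.

```python
import math

def ppo_clip_objective(log_prob_new, log_prob_old, advantage, clip_eps=0.2):
    # Probability ratio r = pi_new(a|s) / pi_old(a|s), computed from log-probs.
    ratio = math.exp(log_prob_new - log_prob_old)
    # Clip the ratio into [1 - eps, 1 + eps] ...
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    # ... and take the pessimistic (minimum) of the two surrogate terms.
    return min(ratio * advantage, clipped * advantage)
```

PPO maximizes the mean of this quantity over a minibatch (in practice the training loss is its negative), which stops the policy from moving too far from the one that collected the data.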
## Installation

```shell
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install stable-baselines3[extra]==2.2.1
pip install swig
pip install gymnasium
pip install gymnasium[box2d]
```
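For context on what the training loop computes internally: SB3-style PPO estimates advantages with Generalized Advantage Estimation (GAE). A minimal pure-Python sketch, assuming `values` carries one extra bootstrap entry; the defaults `gamma=0.99` and `lam=0.95` match SB3's `gamma` and `gae_lambda` defaults.

```python
def compute_gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.

    Expects len(values) == len(rewards) + 1, where the final entry is the
    bootstrap value of the state after the last step.
    """
    advantages = [0.0] * len(rewards)
    last_adv = 0.0
    # Sweep backwards so each advantage accumulates discounted future TD errors.
    for t in reversed(range(len(rewards))):
        nonterminal = 0.0 if dones[t] else 1.0
        # One-step TD error; the bootstrap term is zeroed at episode ends.
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        last_adv = delta + gamma * lam * nonterminal * last_adv
        advantages[t] = last_adv
    return advantages
```

These advantages feed directly into the clipped objective above, typically after normalization across the batch.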
## Usage

Edit `main.py` to select the environment you wish to train on (`LunarLander-v2` or `CartPole-v1`), then run:

```shell
python main.py
```

To evaluate a trained agent, run the test script (as of now, it loads my best model for both LunarLander-v2 and CartPole-v1):

```shell
python test.py
```

You can also select the environment from the command line:

```shell
python main.py --game 'LunarLander-v2'
```

To load a saved model as well:

```shell
python main.py --game 'LunarLander-v2' --model 'model.pt'
```
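The flags shown above suggest `main.py` parses its arguments roughly like this. This is a sketch, not the actual code: the flag names come from the commands, but the defaults and help strings are assumptions.

```python
import argparse

def build_parser():
    # Hypothetical reconstruction of main.py's CLI; only the flag names
    # (--game, --model) are taken from the README commands.
    parser = argparse.ArgumentParser(description="Train PPO on a Gymnasium environment")
    parser.add_argument("--game", default="CartPole-v1",
                        help="Gymnasium environment id, e.g. LunarLander-v2")
    parser.add_argument("--model", default=None,
                        help="path to a saved model (.pt) to load before training")
    return parser

args = build_parser().parse_args(["--game", "LunarLander-v2", "--model", "model.pt"])
```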
## TODO

- Support for the `CarRacing-v2` environment

## License

This repository includes parts of code that have been adapted from the Stable Baselines library (https://github.com/DLR-RM/stable-baselines3) for educational purposes only. The original code is the property of its respective owners and is subject to their licensing terms.
I do not claim any ownership, copyright, or proprietary rights over the code obtained from Stable Baselines. The use of this code in this repository is solely for educational and learning purposes, and any commercial use or distribution is subject to the original licensing terms provided by Stable Baselines.
The original Stable Baselines code is licensed under the MIT License, and any use of their code in this repository is also subject to the terms of the MIT License.