YYCAAA / V-MPO_Lunarlander

A simple implementation of V-MPO, as proposed in https://arxiv.org/abs/1909.12238
MIT License

VMPO-PyTorch

A minimal PyTorch implementation of V-MPO (On-Policy Maximum a Posteriori Policy Optimization) for OpenAI Gym environments.

Modified from nikhilbarhate99/PPO-PyTorch

Usage

Dependencies

Trained and tested on:

Python 3.6
PyTorch 1.0
NumPy 1.15.3
gym 0.10.8
Pillow 5.3.0
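
To compare a local environment against the versions listed above, something like the following sketch may help. The helper names (`installed_version`, `version_mismatches`, `report`) are hypothetical and not part of this repository. The sketch assumes each package exposes a `__version__` attribute, and it treats a version as matching when it equals the tested one or extends it with further dot components.

```python
import importlib

# Versions listed above as "trained and tested on" (import name -> version).
# Note that Pillow is imported under the name PIL.
TESTED_VERSIONS = {
    "torch": "1.0",
    "numpy": "1.15.3",
    "gym": "0.10.8",
    "PIL": "5.3.0",
}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it is unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def version_mismatches(installed, tested=TESTED_VERSIONS):
    """Return (name, tested, installed) tuples for versions that differ.

    Assumption: "1.0.1" counts as matching the tested version "1.0",
    while "1.10.0" does not.
    """
    mismatches = []
    for name, want in tested.items():
        have = installed.get(name)
        if have is None or not (have == want or have.startswith(want + ".")):
            mismatches.append((name, want, have))
    return mismatches


def report():
    """Print one line per package that is missing or differs from the list."""
    found = {name: installed_version(name) for name in TESTED_VERSIONS}
    for name, want, have in version_mismatches(found):
        print(f"{name}: tested with {want}, found {have or 'not installed'}")
```

Calling `report()` prints nothing when every package matches. A mismatch does not necessarily mean the code will fail; it only indicates a setup that differs from the one listed here.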

References