daisatojp / mpo

PyTorch Implementation of the Maximum a Posteriori Policy Optimisation
GNU General Public License v3.0
70 stars, 19 forks

Make more informative #8

Closed by daisatojp 3 years ago

daisatojp commented 3 years ago

Currently, I don't think this code is useful for people learning MPO. Many parts of the code can be misleading about the correspondence between the theory and the implementation. Fix this.