daisatojp / mpo

PyTorch Implementation of the Maximum a Posteriori Policy Optimisation
GNU General Public License v3.0
70 stars, 19 forks

Retrace Algorithm #1

Closed. daisatojp closed this issue 4 years ago

daisatojp commented 4 years ago

Use the Retrace algorithm (paper) for policy evaluation.
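For reference, here is a minimal sketch of how Retrace targets could be computed for a single trajectory. It is plain Python and not code from this repository. The function name `retrace_targets` and its argument layout are hypothetical, chosen only for illustration. It assumes the caller has already evaluated the critic at the taken actions (`q_taken`), the expected critic value under the target policy at each next state (`v_next`), and the target and behaviour probabilities of the taken actions (`pi_probs`, `mu_probs`).

```python
def retrace_targets(rewards, q_taken, v_next, pi_probs, mu_probs,
                    gamma=0.99, lam=1.0):
    """Compute Retrace targets for one trajectory (illustrative sketch).

    rewards[t]  : r_t
    q_taken[t]  : Q(x_t, a_t) from the current critic
    v_next[t]   : E_{a ~ pi} Q(x_{t+1}, a); use 0.0 if x_{t+1} is terminal
    pi_probs[t] : pi(a_t | x_t), target policy probability
    mu_probs[t] : mu(a_t | x_t), behaviour policy probability (must be > 0)
    """
    T = len(rewards)
    targets = [0.0] * T
    # carry holds c_{t+1} * (Q_ret_{t+1} - Q(x_{t+1}, a_{t+1})).
    # It is zero past the end of the trajectory, so the last step
    # bootstraps from v_next alone.
    carry = 0.0
    for t in reversed(range(T)):
        # Q_ret_t = r_t + gamma * (E_pi Q(x_{t+1}, .) + carry)
        targets[t] = rewards[t] + gamma * (v_next[t] + carry)
        # Truncated importance weight c_t = lam * min(1, pi / mu)
        c_t = lam * min(1.0, pi_probs[t] / mu_probs[t])
        carry = c_t * (targets[t] - q_taken[t])
    return targets


if __name__ == "__main__":
    # On-policy data, lam = 1, critic initialised to zero:
    # the targets reduce to discounted Monte Carlo returns.
    print(retrace_targets(
        rewards=[1.0, 1.0, 1.0],
        q_taken=[0.0, 0.0, 0.0],
        v_next=[0.0, 0.0, 0.0],
        pi_probs=[0.5, 0.5, 0.5],
        mu_probs=[0.5, 0.5, 0.5],
        gamma=0.5,
    ))
```

Two limiting cases make a quick sanity check for an implementation along these lines: with on-policy data, `lam=1.0`, and a zero critic, the targets should equal the discounted returns; with all truncated weights equal to zero, they should collapse to one-step targets `r_t + gamma * v_next[t]`.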

daisatojp commented 4 years ago

I observed that it is unstable. Did I fail to implement it correctly?

daisatojp commented 4 years ago

I gave up.