Closed by StanislavVolodarskiy 7 years ago
TODO: Distributed version of Proximal Policy Optimization (PPO), i.e., TRPO with a penalty on the KL divergence instead of a constraint, where each subproblem is solved with L-BFGS
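For reference, a minimal single-process sketch of one KL-penalized subproblem solved with L-BFGS. It is not the code merged for this issue. The state-independent softmax policy, the toy data, and the names `penalized_loss`, `solve_subproblem`, and `beta` are all assumptions made for illustration. The distributed part (splitting samples across workers) is not shown.

```python
import numpy as np
from scipy.optimize import minimize


def softmax(logits):
    # Numerically stable softmax over a 1-D vector of logits.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()


def penalized_loss(theta, theta_old, actions, advantages, beta):
    # Assumption: the policy is a softmax over logits `theta` that ignores state.
    pi_old = softmax(theta_old)
    pi_new = softmax(theta)
    # Importance-weighted surrogate: mean of (pi_new / pi_old) * advantage.
    ratio = pi_new[actions] / pi_old[actions]
    surrogate = np.mean(ratio * advantages)
    # KL(pi_old || pi_new) enters as a penalty instead of a hard constraint.
    kl = np.sum(pi_old * (np.log(pi_old) - np.log(pi_new)))
    # minimize() minimizes, so return the negated penalized objective.
    return -(surrogate - beta * kl)


def solve_subproblem(theta_old, actions, advantages, beta=1.0):
    # L-BFGS-B without bounds behaves as plain L-BFGS; gradients are
    # approximated by finite differences because no `jac` is supplied.
    result = minimize(
        penalized_loss,
        theta_old,
        args=(theta_old, actions, advantages, beta),
        method="L-BFGS-B",
    )
    return result.x


# Toy batch (hypothetical data): three actions, four sampled transitions.
theta_old = np.zeros(3)
actions = np.array([0, 1, 2, 0])
advantages = np.array([1.0, -1.0, 0.5, 1.0])

theta_new = solve_subproblem(theta_old, actions, advantages, beta=1.0)
print("old policy:", softmax(theta_old))
print("new policy:", softmax(theta_new))
```

A larger `beta` keeps the new policy closer to the old one. In a distributed setting each worker would presumably evaluate the loss on its own shard of samples, but that is a guess about the design rather than something stated in this issue.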
Merged to v2 branch with trpo: 93261ef3433a95573308526ee27c9de4cea23a75
v2
trpo