deeplearninc / relaax

Reinforcement Learning framework to facilitate development and use of scalable RL algorithms and applications

Distributed PPO with L-BFGS algorithm #12

StanislavVolodarskiy closed this issue 7 years ago

StanislavVolodarskiy commented 7 years ago

TODO: Implement a distributed version of PPO (Proximal Policy Optimization, i.e. TRPO with a penalty on the KL divergence instead of a constraint), where each subproblem is solved with L-BFGS.
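To make the "penalty instead of a constraint" idea concrete, here is a minimal single-process sketch of one such subproblem. It is not relaax code: the toy single-state softmax policy, the advantage values, and the helper names `penalized_loss` and `ppo_lbfgs_step` are all assumptions made for illustration, and SciPy's `L-BFGS-B` method stands in for whatever L-BFGS implementation the framework would use.

```python
import numpy as np
from scipy.optimize import minimize


def softmax(logits):
    # Numerically stable softmax over action logits.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()


def penalized_loss(theta, pi_old, adv, beta):
    """Negative KL-penalized surrogate and its gradient w.r.t. theta.

    Hypothetical toy setup: one state, discrete actions, policy = softmax(theta).
    """
    pi = softmax(theta)
    # E_{a ~ pi_old}[pi(a) / pi_old(a) * A(a)], computed exactly over actions.
    surrogate = np.dot(pi, adv)
    # KL(pi_old || pi) acts as a soft penalty instead of TRPO's hard constraint.
    kl = np.sum(pi_old * (np.log(pi_old) - np.log(pi)))
    objective = surrogate - beta * kl
    # Analytic gradient of the objective for a softmax policy.
    grad = pi * (adv - surrogate) + beta * (pi_old - pi)
    # SciPy minimizes, so return the negated objective and gradient.
    return -objective, -grad


def ppo_lbfgs_step(theta_old, adv, beta=1.0):
    # Solve one penalized subproblem with L-BFGS, starting from the old policy.
    pi_old = softmax(theta_old)
    result = minimize(
        penalized_loss,
        theta_old,
        args=(pi_old, adv, beta),
        jac=True,
        method="L-BFGS-B",
    )
    return result.x


# Illustrative (made-up) advantages for three actions.
theta_old = np.zeros(3)
adv = np.array([1.0, 0.0, -1.0])
theta_new = ppo_lbfgs_step(theta_old, adv, beta=1.0)
pi_old, pi_new = softmax(theta_old), softmax(theta_new)
```

In this sketch a larger `beta` keeps `pi_new` closer to `pi_old`, which is the role the KL constraint plays in TRPO. A distributed version would presumably aggregate the loss and gradient across workers before each L-BFGS update, but that part is not shown here.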

4SkyNet commented 7 years ago

Merged into the v2 branch along with TRPO: 93261ef3433a95573308526ee27c9de4cea23a75