LuEE-C / PPO-Keras

My implementation of the Proximal Policy Optimisation algorithm using Keras as a backend

get_reward #1

Closed Krsnadeva closed 6 years ago

Krsnadeva commented 6 years ago

Hey OctThe16th! I've been looking for a good implementation of PPO in Keras (I am not very 'math notation' savvy, so it is hard for me to follow straight from the papers), and I think you nailed it, but I failed to understand your call to:

`r = self.get_reward(i, len(tmp_batch[0]))`

since `get_reward` does not seem to be implemented. What am I missing?
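For context, a helper like `get_reward` in this kind of batched PPO loop typically computes the discounted return from a given step to the end of the collected batch. A minimal sketch of such a helper, assuming a per-step reward buffer and a hypothetical discount factor `GAMMA` (neither the signature nor the constant is taken from the repo):

```python
import numpy as np

GAMMA = 0.99  # hypothetical discount factor, not taken from the repo


def get_reward(rewards, start, length):
    """Discounted return from step `start` to the end of the batch.

    `rewards` is a list/array of per-step rewards; this signature is
    illustrative and may not match the repo's actual method.
    """
    return float(np.sum([rewards[t] * GAMMA ** (t - start)
                         for t in range(start, length)]))
```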

I am trying to run a PPO agent in a real-time environment of my own, not Gym. I already have other agents running (vanilla PG), but I could not figure out a way to compile the model, so I was using a `K.function` for training; I think that is why my `layer.get_weights()` calls return empty. I do now, thanks to you! Cheers.
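On the `K.function` point: one common way to keep everything inside a compiled Keras model is to pass the advantage and the old policy output in as extra inputs and close over them in a custom loss, so training runs through `model.fit`/`train_on_batch` and `layer.get_weights()` stays populated. A minimal sketch under those assumptions (the shapes, layer sizes, and clipping constant below are placeholders, not the repo's values):

```python
import keras.backend as K
from keras.layers import Input, Dense
from keras.models import Model

CLIP_EPS = 0.2  # hypothetical PPO clipping parameter


def ppo_loss(advantage, old_prediction):
    # Clipped surrogate objective; the extra tensors are fed in as
    # additional model inputs and captured by this closure.
    def loss(y_true, y_pred):
        prob = K.sum(y_true * y_pred, axis=-1, keepdims=True)
        old_prob = K.sum(y_true * old_prediction, axis=-1, keepdims=True)
        ratio = prob / (old_prob + 1e-10)
        clipped = K.clip(ratio, 1 - CLIP_EPS, 1 + CLIP_EPS)
        return -K.mean(K.minimum(ratio * advantage, clipped * advantage))
    return loss


# Placeholder dimensions: 4-dimensional state, 2 discrete actions.
state = Input(shape=(4,))
advantage = Input(shape=(1,))
old_prediction = Input(shape=(2,))

x = Dense(64, activation='tanh')(state)
policy = Dense(2, activation='softmax')(x)

model = Model(inputs=[state, advantage, old_prediction], outputs=policy)
model.compile(optimizer='adam', loss=ppo_loss(advantage, old_prediction))
# Training through model.fit / train_on_batch keeps the weights tracked,
# so layer.get_weights() returns the trained values.
```

With this setup, the advantages and old predictions are simply passed alongside the states in the input list when calling `model.fit` or `train_on_batch`.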

LuEE-C commented 6 years ago

Oops, you're right, just pushed a fix and a few changes, hope that helps!

Krsnadeva commented 6 years ago

Awesome Luis! Thanks, will check it out.

Cheers and a great new year!
