Khev opened this issue 5 years ago
On line 204 we call the function self.transform_reward(), which transforms the contents of the reward array into the discounted returns; hope that clarifies.
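For anyone reading along, here is a minimal sketch of what such a transform_reward might do. This is not the repo's exact code; the discount factor GAMMA and the flat per-episode list self.reward are assumptions for illustration.

```python
GAMMA = 0.99  # assumed discount factor

def transform_reward(self):
    # Walk backwards through the episode so each entry becomes the
    # discounted return: r_t + GAMMA * r_{t+1} + GAMMA^2 * r_{t+2} + ...
    running_return = 0.0
    for t in reversed(range(len(self.reward))):
        running_return = self.reward[t] + GAMMA * running_return
        self.reward[t] = running_return
```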
Ah ya, that makes sense. Thanks!
Also, I noticed you didn't use target networks for the critic. Did you observe any instability in the learning as a result? Just curious!
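In case it helps the discussion, a common way to add one is a Polyak (soft) update of a separate target critic. The sketch below is purely illustrative and not from this repo; the names self.critic / target_critic and the rate TAU are hypothetical, and it assumes Keras models so get_weights()/set_weights() are available.

```python
TAU = 0.005  # soft-update rate (assumed)

def soft_update_target(critic, target_critic, tau=TAU):
    # new_target = tau * online + (1 - tau) * old_target, weight by weight
    online_w = critic.get_weights()
    target_w = target_critic.get_weights()
    target_critic.set_weights(
        [tau * w + (1.0 - tau) * tw for w, tw in zip(online_w, target_w)]
    )
```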
Hi there, thanks for sharing your code. I think there's an error on line 280 of main.py:
_critic_loss = self.critic.fit([obs], [reward], batch_size=BATCHSIZE, shuffle=True, epochs=EPOCHS, verbose=False)
Shouldn't the critic be fitting to the discounted_returns instead of the raw rewards? That is, the line should read:
_critic_loss = self.critic.fit([obs], [discounted_returns], batch_size=BATCHSIZE, shuffle=True, epochs=EPOCHS, verbose=False)