LuEE-C / PPO-Keras

My implementation of the Proximal Policy Optimisation algorithm using Keras as a backend

get_batch waiting until complete episode? #5

Closed WillNichols726 closed 5 years ago

WillNichols726 commented 5 years ago

In get_batch, it looks like the function won't return until an episode is complete. My environment has extremely long episodes (2e5 timesteps), so this is a huge bottleneck for me. Do you see any reason I can't just update my models whenever the buffer fills?

WillNichols726 commented 5 years ago

Never mind, I'm dumb. I forgot we need the complete episode to compute the discounted reward.
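
(For context: the discounted return at step t depends on every reward that comes after t, which is why the batch has to wait for the terminal state. A minimal sketch of the standard backward pass; the helper name and NumPy usage are illustrative, not this repo's actual code:)

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """G[t] = r[t] + gamma * G[t+1], computed right to left.

    Every G[t] depends on all rewards after step t, so the buffer
    cannot simply be cut off mid-episode without biasing the returns.
    """
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```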

LuEE-C commented 5 years ago

You can still stop early if you use a forward window to approximate the discounted reward. Say you have a gamma of 0.99, a batch size of 1024, and a forward window of 512: run the environment for 1536 steps, compute the discounted rewards for all of them, then discard the last 512 steps. The returns you keep are almost identical to the full-episode values, since the weight on the first truncated reward is at most 0.99 ** 512 ≈ 0.006, i.e. about 0.6%. So while you do lose some experience, it is still far better than waiting out an extremely long episode.
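
(A minimal sketch of that windowing trick, with gamma, batch size, and window taken from the comment above; the helper itself is hypothetical, not code from this repo:)

```python
import numpy as np

GAMMA = 0.99
BATCH_SIZE = 1024
FORWARD_WINDOW = 512  # GAMMA ** 512 ≈ 0.006, so truncation bias is ~0.6%

def windowed_batch_returns(rewards, gamma=GAMMA, window=FORWARD_WINDOW):
    """Discount over the whole (batch + window) buffer, then drop the
    trailing `window` steps whose returns are biased by the cutoff.

    For every kept step, the first missing future reward carries a
    discount weight of at most gamma ** window, so the kept returns
    are nearly identical to the full-episode values.
    """
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns[:-window]

# e.g. collect BATCH_SIZE + FORWARD_WINDOW = 1536 reward samples,
# then keep the 1024 usable discounted returns:
# usable = windowed_batch_returns(reward_buffer)
```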