coreylynch / async-rl

Tensorflow + Keras + OpenAI Gym implementation of 1-step Q Learning from "Asynchronous Methods for Deep Reinforcement Learning"

t_max = 32 #7


etienne87 commented 8 years ago

Hello,

In the A3C paper they state t_max = 5. Is there any reason you set it to 32?

Actually, I don't really understand why the batch size should be so small. Why shouldn't we use traditional batch sizes of 128 or more frames? Wouldn't that make learning stronger?
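For reference, this is how I understand the current per-thread loop: t_max doubles as the batch size, since each actor thread collects t_max transitions and then does one gradient update over them. A rough sketch (helper names like `choose_action` and `apply_gradients` are placeholders, not the repo's actual functions):

```python
# Rough sketch of the per-thread accumulate-then-update loop.
# choose_action / apply_gradients are placeholder names, not the repo's API.
t_max = 32  # transitions collected per thread before one gradient update

state = env.reset()
while True:
    batch = []
    for _ in range(t_max):
        action = choose_action(state)                  # e.g. epsilon-greedy on Q(state, .)
        next_state, reward, done, _ = env.step(action)
        batch.append((state, action, reward, next_state, done))
        state = env.reset() if done else next_state
    apply_gradients(batch)  # t_max is effectively the per-thread batch size
```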

ppwwyyxx commented 8 years ago

I used a larger batch size (128) in my A3C implementation instead of 5, and it works quite well. I don't think there is any reason the batch size should be small. But that doesn't mean t_max should be large: with a large t_max, training gets less stable in my experiments.

etienne87 commented 8 years ago

I don't understand how the batch can be large and t_max small. You need to accumulate frames during t_max steps before doing a backprop, right?

ppwwyyxx commented 8 years ago

In one backprop you can accumulate more frames from different simulators, but each simulator still only produces a 5-step temporal difference every time. Having a larger batch size should theoretically stabilize training.
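Roughly something like this, as a sketch (`simulators` and `rollout_5_steps` are placeholder names, not real code from any repo):

```python
# Sketch: keep t_max small per simulator, but batch across many simulators.
# simulators / rollout_5_steps / apply_gradients are placeholder names.
t_max = 5    # short temporal-difference horizon per simulator
n_sims = 26  # 26 simulators x 5 steps ~= a 128-frame batch

while True:
    batch = []
    for sim in simulators[:n_sims]:
        batch.extend(rollout_5_steps(sim))  # each segment is only 5 steps long
    apply_gradients(batch)                  # one large update over ~130 frames
```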

etienne87 commented 8 years ago

Ah! Agreed. So we should share a common replay memory across all threads and recompute the forward pass before the large batch update?

ppwwyyxx commented 8 years ago

The forward pass doesn't really need to be recomputed. A delay in the target value is acceptable, similar to the idea of the target network in DQN.
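As a sketch of what I mean (assuming Keras models named `q_net` and `target_net`; the sync interval is made up):

```python
import numpy as np

GAMMA = 0.99
SYNC_EVERY = 10000  # assumed sync interval, tune as needed

def td_targets(rewards, next_states, dones):
    # Targets come from a periodically synced copy of the network, so the
    # forward pass that produces them may lag behind the online network.
    q_next = target_net.predict(next_states)  # shape: (batch, n_actions)
    return rewards + GAMMA * (1.0 - dones) * q_next.max(axis=1)

# Every SYNC_EVERY training steps:
#     target_net.set_weights(q_net.get_weights())
```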

LinZichuan commented 8 years ago

How can you run the project normally? I tried to run it but got an error, see #12. @ppwwyyxx @etienne87