Open · etienne87 opened this issue 8 years ago
I used a larger batch size (128) in my A3C implementation instead of 5 and it works quite well. I don't think there is any reason the batch size should be small. But that doesn't mean t_max should be large; with a large t_max, training gets less stable in my experiments.
I don't understand how the batch can be large while t_max stays small. Don't you need to accumulate frames for t_max steps before doing a backprop?
In one backprop you can accumulate frames from many different simulators, but each simulator still only produces a 5-step temporal-difference rollout at a time. A larger batch size should, in theory, stabilize training.
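Concretely, the large batch is just many short rollouts stacked together before a single gradient update. Here is a minimal Python sketch of that pooling step (not the code from this repo); `sim.rollout()` and its fields are hypothetical, illustrative names:

```python
import numpy as np

T_MAX = 5          # rollout length per simulator, as in the A3C paper
BATCH_SIZE = 128   # transitions accumulated before one backprop
GAMMA = 0.99

def n_step_returns(rewards, bootstrap_value, gamma=GAMMA):
    """Discounted returns for one T_MAX-step rollout, bootstrapped
    from the value estimate of the last state."""
    returns, R = [], bootstrap_value
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    return list(reversed(returns))

def collect_batch(simulators, policy_value_fn):
    """Gather 5-step rollouts from many simulators until BATCH_SIZE
    transitions are available, then hand them all to one update."""
    states, actions, targets = [], [], []
    while len(states) < BATCH_SIZE:
        for sim in simulators:
            # Hypothetical helper: runs the policy for T_MAX steps and
            # returns states, actions, rewards and a bootstrap value.
            rollout = sim.rollout(T_MAX, policy_value_fn)
            R = n_step_returns(rollout.rewards, rollout.last_value)
            states.extend(rollout.states)
            actions.extend(rollout.actions)
            targets.extend(R)
    return np.array(states), np.array(actions), np.array(targets)
```

Each rollout still uses a 5-step return; only the number of rollouts processed per gradient step grows.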
Ah, agreed. So we should share a common replay memory across all threads, and recompute the forward pass before the large batch update?
The forward pass doesn't really need to be recomputed. A delay in the target value is acceptable, similar to the idea of the target network in DQN.
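In other words, the bootstrapped target can be computed with whatever parameters the actor had when the rollout was collected and reused as-is at update time. A rough sketch of that idea, using a hypothetical `rollout` container (illustrative names only):

```python
# The bootstrap value V(s_{t+T}) comes from the (possibly older) weights the
# actor used when generating the rollout; the learner does not rerun the
# forward pass, so targets may be slightly stale -- analogous in spirit to
# DQN's target network.
def make_training_tuples(rollout, value_fn, gamma=0.99):
    R = value_fn(rollout.states[-1])            # bootstrap from last state
    tuples = []
    for s, a, r in reversed(list(zip(rollout.states[:-1],
                                     rollout.actions,
                                     rollout.rewards))):
        R = r + gamma * R
        tuples.append((s, a, R))                # target R is fixed from here on
    return list(reversed(tuples))
```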
How do you run the project normally? I tried to run it but got an error, see #12. @ppwwyyxx @etienne87
Hello,
In the A3C paper they state t_max = 5; is there any reason you set it to 32?
Actually, I don't really understand why the batch size should be so small. Why not use traditional batch sizes of 128 or more frames? Shouldn't that make learning stronger?