JuliaReinforcementLearning / ReinforcementLearning.jl

A reinforcement learning package for Julia
https://juliareinforcementlearning.org

Asynchronous Methods for Deep Reinforcement Learning #142

Closed · norci closed this 1 year ago

norci commented 3 years ago

See: https://arxiv.org/abs/1602.01783. It describes RL methods that work without replay memory, such as n-step Q-learning and A3C.

findmyway commented 3 years ago

This needs a little bit of work on a general parameter server. I'm not aware of any implementations in Julia yet (correct me if I'm wrong).

norci commented 3 years ago

I think a parameter server is not necessary for A3C. There's a simple implementation: https://github.com/MorvanZhou/pytorch-A3C. It uses multiple processes to run the agents (is multi-threading not supported in Python?). The parameters of the global actor-critic are kept in shared memory, so the workers can sync with them.
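The shared-memory scheme above can be sketched in plain Julia with threads instead of processes. This is only an illustrative sketch, not ReinforcementLearning.jl API: `GLOBAL_PARAMS`, `worker!`, and `fake_grad` are hypothetical names, and the random vector stands in for a real policy/value gradient.

```julia
# Sketch: A3C-style workers sharing one global parameter vector.
# All names here are illustrative, not part of the package.
const GLOBAL_PARAMS = zeros(8)      # the "global network" parameters
const PARAM_LOCK = ReentrantLock()  # guards updates to GLOBAL_PARAMS

fake_grad(n) = randn(n) .* 0.01     # placeholder for a computed gradient

function worker!(steps)
    local_params = copy(GLOBAL_PARAMS)   # each worker keeps a local copy
    for _ in 1:steps
        g = fake_grad(length(local_params))
        lock(PARAM_LOCK) do
            GLOBAL_PARAMS .-= g                   # asynchronous update in place
            copyto!(local_params, GLOBAL_PARAMS)  # sync local copy from global
        end
    end
end

tasks = [Threads.@spawn worker!(100) for _ in 1:4]
foreach(wait, tasks)
```

With threads the "shared memory" is just the process heap, so the lock replaces the explicit shared-memory tensors used in the Python implementation.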

It seems the only difference between A3C and a multi-threaded env + a single A2C agent is that A3C updates the parameters inside each worker and then syncs with the global parameters, while A2C with a multi-threaded env updates the parameters in batches.
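For contrast, the A2C side of that comparison can be sketched as follows: the threaded envs only collect experience, and a single learner applies one synchronous, batched update. Again a hypothetical sketch with illustrative names (`params`, `returns`), where each worker's number stands in for a rollout's return.

```julia
# Sketch: A2C-style, multi-threaded envs + one central batched update.
params = zeros(8)
n_envs = 4
returns = Vector{Float64}(undef, n_envs)

# Workers only produce experience; no worker touches `params`.
Threads.@threads for i in 1:n_envs
    returns[i] = float(i)    # stand-in for one env's rollout return
end

# One synchronous gradient step from the whole batch:
batch_grad = fill(sum(returns) / n_envs, length(params))
params .-= 0.01 .* batch_grad
```

The structural difference is exactly where the update happens: per worker under a lock (A3C) versus once per batch on the main thread (A2C).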

I think A3C is suitable for distributed training, for complex envs and models. A2C + multi-threaded env should be used on a single node. Julia has a very simple distributed computing interface, so this should be easy to implement.
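The distributed interface referred to here is presumably the stdlib `Distributed` module. A minimal sketch of how rollouts could be farmed out to worker processes, assuming a hypothetical `rollout` function in place of a real episode:

```julia
# Sketch: spreading rollouts over worker processes with the Distributed stdlib.
using Distributed
addprocs(2)                        # spawn two local worker processes

@everywhere rollout(seed) = seed^2 # stand-in for running one episode

results = pmap(rollout, 1:4)       # distribute the rollouts across workers
```

`pmap` returns results in input order, so the learner can consume them as an ordered batch; a real A3C port would additionally ship gradients back for asynchronous application.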

findmyway commented 3 years ago

I think a parameter server is not necessary for A3C.

Yeah, I admit it's not a requirement. I mean a lot of things can be simpler if we have one 😆

The concepts are very simple, but making this kind of algorithm work well across multiple machines still takes a lot of work. That said, for educational purposes, I think a workable example of A3C on one machine across multiple threads/processes is still meaningful.

By the way, you may also be interested in https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/issues/87