Closed: terryzhao127 closed this issue 3 years ago
Hey, you guys! I am really impressed with how concise your code is for implementing distributed RL algorithms like A3C. And I am very interested in whether this framework supports training with multiple distributed learners (e.g. more than one GPU)! For example, training PPO with 200 actors and 4 learners.

Another question: how do you balance the speed of production (actors) and consumption (learner)? I didn't see any way to adjust it in the A2C code. The actors just keep producing new training data:
https://github.com/PaddlePaddle/PARL/blob/ab1eb893a9f0ac1d5238c6a5277ea7e7c6cd1fdf/examples/A2C/train.py#L117
And the learner keeps consuming from that message queue:
https://github.com/PaddlePaddle/PARL/blob/ab1eb893a9f0ac1d5238c6a5277ea7e7c6cd1fdf/examples/A2C/train.py#L138
I suppose you balance that by changing the number of actors. Is that right?
Hi guikarist, thanks for your interest in PARL.
It's fine to implement a multi-learner RL algorithm on top of our framework. As for the second question, there are two ways to adjust the rate at which samples are produced: 1) change the number of actors (as you said); 2) change the number of steps each actor runs per sampling round (see actor.py).
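For intuition, here is a minimal, self-contained sketch of how those two knobs set the production rate the learner sees. This is not PARL's actual implementation: `ACTOR_NUM`, `SAMPLE_BATCH_STEPS`, and the loop bodies are made up for illustration.

```python
# Illustrative producer/consumer balance (not PARL's code).
import queue
import threading
import time

ACTOR_NUM = 4            # knob 1: more actors -> more batches per second
SAMPLE_BATCH_STEPS = 20  # knob 2: more steps -> bigger (but fewer) batches

sample_queue = queue.Queue(maxsize=8)  # bounded queue applies backpressure

def actor_loop(actor_id):
    while True:
        # Pretend each env step takes ~1 ms; a real actor steps an env here.
        time.sleep(0.001 * SAMPLE_BATCH_STEPS)
        batch = [f"actor{actor_id}-step{i}" for i in range(SAMPLE_BATCH_STEPS)]
        sample_queue.put(batch)  # blocks when the learner falls behind

def learner_loop():
    consumed = 0
    while consumed < 100:
        batch = sample_queue.get()
        consumed += len(batch)
        # A real learner would run a gradient update on `batch` here.
    print(f"learner consumed {consumed} transitions")

for i in range(ACTOR_NUM):
    threading.Thread(target=actor_loop, args=(i,), daemon=True).start()
learner_loop()
```

Because the queue is bounded, actors block when the learner falls behind, so throughput is capped by whichever side is slower.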
If you would like to use multiple GPUs, you can reimplement agent.py so that training runs on multiple GPUs.
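As a rough sketch of that direction (assuming the Paddle 2.x dynamic-graph data-parallel API; PARL's agent.py is not written this way, and `PolicyModel` is a made-up stand-in for the model the agent holds):

```python
# Data-parallel multi-GPU training sketch with Paddle 2.x.
# Launch with: python -m paddle.distributed.launch --gpus "0,1" train.py
import paddle
import paddle.nn as nn

paddle.distributed.init_parallel_env()  # one process per GPU

class PolicyModel(nn.Layer):  # hypothetical stand-in for the agent's model
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(128, 4)

    def forward(self, obs):
        return self.fc(obs)

model = paddle.DataParallel(PolicyModel())  # syncs gradients across GPUs
opt = paddle.optimizer.Adam(parameters=model.parameters())

for step in range(10):
    obs = paddle.randn([32, 128])  # stand-in for a sampled batch
    loss = model(obs).mean()
    loss.backward()                # gradients are allreduced here
    opt.step()
    opt.clear_grad()
```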
Thanks for your reply! I think we need Horovod to implement a multi-learner training pattern with PARL.
Horovod is an excellent framework for running multiple learners. I think PARL is compatible with it. Feel free to contact us if you run into any problems :)
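For example, a minimal multi-learner loop with Horovod might look like the sketch below. This assumes the learner network is in PyTorch (one of the frameworks Horovod ships bindings for; it has no PaddlePaddle binding), and the model, data, and hyperparameters are placeholders:

```python
# Multi-learner sketch with Horovod's PyTorch binding.
# Launch with e.g.: horovodrun -np 4 python learner.py
import horovod.torch as hvd
import torch

hvd.init()
torch.cuda.set_device(hvd.local_rank())  # one GPU per learner process

model = torch.nn.Linear(128, 4).cuda()   # stand-in for the policy/value net
optimizer = torch.optim.Adam(model.parameters(),
                             lr=1e-3 * hvd.size())  # usual Horovod lr scaling

# Average gradients across all learners on every step.
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters())

# Make sure all learners start from identical weights and optimizer state.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)

for step in range(100):
    # Each learner would pull its own batch from its share of the actors;
    # random tensors stand in for real transitions here.
    obs = torch.randn(32, 128).cuda()
    target = torch.randn(32, 4).cuda()
    loss = torch.nn.functional.mse_loss(model(obs), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # gradients are allreduced across learners here
```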