PaddlePaddle / PARL

A high-performance distributed training framework for Reinforcement Learning
https://parl.readthedocs.io/
Apache License 2.0
3.26k stars · 820 forks

Does PARL support multi-GPU training in distributed RL setting? #613

Closed · terryzhao127 closed this 3 years ago

terryzhao127 commented 3 years ago

Hey, you guys! I am really impressed with your concise code for implementing distributed RL algorithms like A3C. I am very interested in whether this framework supports training with multiple distributed learners (e.g. more than one GPU). For example, training PPO with 200 actors and 4 learners.

terryzhao127 commented 3 years ago

Another question: how do you balance the speed of production (Actor) and consumption (Learner)? I didn't see any way to adjust it in the A2C code. The actor just keeps producing new training data:
https://github.com/PaddlePaddle/PARL/blob/ab1eb893a9f0ac1d5238c6a5277ea7e7c6cd1fdf/examples/A2C/train.py#L117
and the learner keeps consuming from that message queue:
https://github.com/PaddlePaddle/PARL/blob/ab1eb893a9f0ac1d5238c6a5277ea7e7c6cd1fdf/examples/A2C/train.py#L138
I suppose you balance that by changing the number of actors. Is that right?
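
For context, the pattern I have in mind is a plain producer/consumer queue. Here is a minimal generic sketch (ordinary Python threads and `queue.Queue`, not PARL code; all names and numbers are made up for illustration):

```python
import queue
import threading
import time

# Hypothetical knobs (not PARL parameters): more actors or more steps per
# actor raises the production rate; the learner's update time sets the
# consumption rate.
NUM_ACTORS = 4
STEPS_PER_ACTOR = 20

sample_queue = queue.Queue(maxsize=100)  # bounded queue applies back-pressure

def actor_loop(actor_id):
    while True:
        batch = [f"sample-{actor_id}-{i}" for i in range(STEPS_PER_ACTOR)]
        sample_queue.put(batch)          # blocks when the queue is full
        time.sleep(0.01)                 # stand-in for environment stepping

def learner_loop():
    while True:
        batch = sample_queue.get()       # blocks when the queue is empty
        time.sleep(0.05)                 # stand-in for one gradient update
        sample_queue.task_done()

for i in range(NUM_ACTORS):
    threading.Thread(target=actor_loop, args=(i,), daemon=True).start()
threading.Thread(target=learner_loop, daemon=True).start()

time.sleep(1)
print("queue depth after 1s:", sample_queue.qsize())
```

With a bounded queue the actors simply block when the learner falls behind, so the effective knobs are the number of actors, how many steps each actor collects per batch, and the learner's update time.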

TomorrowIsAnOtherDay commented 3 years ago

Hi guikarist, thanks for your interest in PARL.

It's OK to implement a multi-learner RL algorithm on top of our framework. For the second question, there are two ways to adjust the rate at which samples are produced: 1) adjust the number of actors (as you said); 2) adjust the number of steps each actor runs per batch (see actor.py).

If you would like to use multiple GPUs, you can reimplement agent.py so that the learner trains on multiple GPUs.
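
As a rough illustration only (this is not PARL's actual agent.py; it assumes Paddle 2.x dynamic-graph APIs, and `obs`/`target` are hypothetical batch tensors), a data-parallel learn step across several GPUs could look like:

```python
# Sketch: Paddle 2.x dynamic-graph data parallelism. Each GPU holds a model
# replica and gradients are averaged across devices during backward().
import paddle
import paddle.distributed as dist

dist.init_parallel_env()                      # one process per GPU

model = paddle.nn.Sequential(
    paddle.nn.Linear(4, 64),
    paddle.nn.ReLU(),
    paddle.nn.Linear(64, 2),
)
model = paddle.DataParallel(model)            # wraps the replica for all-reduce
opt = paddle.optimizer.Adam(learning_rate=3e-4, parameters=model.parameters())

def learn(obs, target):
    # obs/target stand in for a batch assembled from the actors' data
    pred = model(obs)
    loss = paddle.nn.functional.mse_loss(pred, target)
    loss.backward()                           # gradients synchronized here
    opt.step()
    opt.clear_grad()
    return float(loss)
```

Each process would then be launched per GPU, e.g. with `python -m paddle.distributed.launch --gpus "0,1,2,3" train.py`.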

terryzhao127 commented 3 years ago

Thanks for your reply! I think we need Horovod to implement a multi-learner training pattern with PARL.

TomorrowIsAnOtherDay commented 3 years ago

Horovod is an excellent framework for running multiple learners, and I think PARL is compatible with it. Feel free to contact us if you run into any problems :)
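
For reference, a minimal Horovod-style multi-learner sketch (shown here with the PyTorch backend purely for illustration; this is not PARL code, and `obs`/`target` are hypothetical batch tensors):

```python
# Sketch: each learner process owns one GPU; gradients are averaged with
# ring all-reduce inside Horovod's DistributedOptimizer.
import torch
import horovod.torch as hvd

hvd.init()
torch.cuda.set_device(hvd.local_rank())       # bind this process to one GPU

model = torch.nn.Linear(4, 2).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters())

# start all learners from identical weights and optimizer state
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)

def learn(obs, target):
    # obs/target stand in for a batch pulled from this learner's actors
    loss = torch.nn.functional.mse_loss(model(obs), target)
    optimizer.zero_grad()
    loss.backward()                           # all-reduce happens here
    optimizer.step()
    return loss.item()
```

Each learner process would be started with something like `horovodrun -np 4 python train.py`, one GPU per process.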