gemcollector / RL-ViGen

This is the repo of "RL-ViGen: A Reinforcement Learning Benchmark for Visual Generalization"
MIT License

About parallel training #6

Open RayYoh opened 11 months ago

RayYoh commented 11 months ago

Hello authors, thanks for your excellent work; it is really helpful for the community. I am confused about how to achieve parallel training (this is not actually an issue with this repo). For example, the image size in the dm_control suite is about 84×84, and when I train one experiment, the GPU memory used is quite small but GPU utilization is high. If I manually train several seeds at the same time, every training process becomes slow. So my question is: how can I achieve parallel training to accelerate the process (multiprocessing?)

gemcollector commented 11 months ago

Hi there! I apologize for the delayed response. Visual RL must learn representations of visual images, so it relies heavily on the GPU, and high GPU utilization is therefore inevitable. A straightforward solution is to run different seeds on separate GPU cards so that they don't interfere with each other, especially when CPU usage isn't high. The current bottleneck isn't the interaction with the environment, so for Visual RL tasks I think multi-processing won't address the high GPU-util issue. Still, I appreciate your suggestion!
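The "one seed per GPU card" suggestion above could be sketched as a small launcher that round-robins seeds across the available GPUs via `CUDA_VISIBLE_DEVICES`. Note this is only an illustration: `train.py` and its `seed=` flag are placeholders, not the actual entry point of this repo.

```python
import itertools

# Hypothetical sketch: assign each training seed its own GPU so the runs
# don't contend for a single card. `train.py` and its arguments are
# placeholders; substitute your real training command.
def launch_commands(seeds, n_gpus):
    """Round-robin seeds across GPU ids; return the shell commands to run."""
    cmds = []
    for seed, gpu in zip(seeds, itertools.cycle(range(n_gpus))):
        # Restricting CUDA_VISIBLE_DEVICES pins each process to one card.
        cmds.append(f"CUDA_VISIBLE_DEVICES={gpu} python train.py seed={seed} &")
    return cmds

if __name__ == "__main__":
    for cmd in launch_commands(seeds=[0, 1, 2, 3], n_gpus=2):
        print(cmd)
```

In practice you would pass these commands to `subprocess.Popen` or a shell script; the key point is that each process sees exactly one GPU, so high utilization on one card doesn't slow the others down.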