dbolya / yolact

A simple, fully convolutional model for real-time instance segmentation.
MIT License

Will DDP improve gpu utilization? #438

Open voidrank opened 4 years ago

voidrank commented 4 years ago

Hi @dbolya ,

I'm training ResNet-50 on 4 GPUs, and GPU utilization is very low. When I train on 1 GPU, however, utilization reaches up to 60%. I'm planning to switch to distributed data parallel (DDP) to improve GPU utilization. What do you suggest? Do you think DDP is a feasible way to improve training speed?

voidrank commented 4 years ago

I applied DDP and saw a significant boost in training speed and GPU utilization. I hope this helps you and anyone else facing speed issues.

feiyuhuahuo commented 4 years ago

@voidrank Do you know why DDP runs faster than DP? Any instructions? Thanks.

voidrank commented 4 years ago

@feiyuhuahuo The reason is a bit complicated. DDP uses multiple processes (typically one per GPU), while DP uses multiple threads inside a single process. Python's multithreading is limited by the global interpreter lock (GIL), which lets only one thread execute Python code at a time, and this contention hurts efficiency. Separate processes each have their own interpreter and do not share that lock, so DDP can reach higher GPU utilization and better efficiency. If you search for "distributed data parallel", there are a lot of introductions online.