Open voidrank opened 4 years ago
I applied DDP and saw a significant boost in training speed and GPU utilization. I hope this helps you and anyone else who is facing speed issues.
@voidrank Do you know why DDP runs faster than DP? Any instructions? Thanks.
@feiyuhuahuo The reason is a little complicated. DDP uses multiple processes, while DP uses multiple threads in a single process. Python's multithreading has a well-known limitation: the global interpreter lock (GIL) is shared across all threads, so only one thread can execute Python bytecode at a time. This degrades efficiency. Python multiprocessing doesn't have this issue, because each process has its own interpreter and its own lock, so DDP may reach higher GPU utilization and better efficiency. You can google "distributed data parallel"; there are a lot of introductions online.
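To make the thread vs. process point concrete, here is a minimal sketch that uses only the Python standard library, not PyTorch. It runs the same CPU-bound function once with threads and once with processes. The names `busy_sum`, `run_threads`, `run_processes`, and `timed` are made up for this illustration, and the workload is only a stand-in for the Python-side overhead of a training loop, so treat it as an analogy for DP vs. DDP rather than a benchmark of either.

```python
import threading
import time
import multiprocessing as mp


def busy_sum(n):
    """CPU-bound pure-Python loop; the running thread holds the GIL."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_threads(n, workers):
    """Run busy_sum in `workers` threads that all share one GIL."""
    results = [None] * workers

    def target(idx):
        results[idx] = busy_sum(n)

    threads = [threading.Thread(target=target, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def run_processes(n, workers):
    """Run busy_sum in `workers` processes, each with its own interpreter."""
    with mp.Pool(processes=workers) as pool:
        return pool.map(busy_sum, [n] * workers)


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    n, workers = 1_000_000, 4  # arbitrary sizes chosen for the illustration
    thread_results, thread_time = timed(run_threads, n, workers)
    process_results, process_time = timed(run_processes, n, workers)
    # Both strategies compute the same values; only the wall time differs.
    assert thread_results == process_results
    print(f"threads:   {thread_time:.2f}s")
    print(f"processes: {process_time:.2f}s")
```

On a multi-core machine with a standard CPython build, the process version usually finishes faster for work like this, because the threads take turns holding the lock. Exact timings depend on the hardware, the Python version, and the process start-up cost, so none are quoted here.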
Hi @dbolya ,
I'm training ResNet50 on 4 GPUs, and the GPU utilization is very low. However, when I train it on 1 GPU, the GPU utilization can reach 60%. I'm planning to use distributed data parallel (DDP) to improve GPU utilization. What do you suggest? Do you think DDP is a feasible way to improve training speed?