chainer / chainermn

ChainerMN: Scalable distributed deep learning with Chainer
https://chainer.org
MIT License

Non-Blocking Methodology on ChainerMN #291

Closed fengyuan14 closed 6 years ago

fengyuan14 commented 6 years ago

Hello, we ran some experiments with a non-blocking methodology on ChainerMN. Put simply, the methodology works like this:

<Iter-(n)>
  (Layer-1) wait for the async allreduce from the last iteration
  -> (Layer-1) forward computation
  -> (Layer-2) wait for the async allreduce from the last iteration
  -> (Layer-2) forward computation
  -> ...
  -> (Layer-2) backward computation
  -> (Layer-2) send async allreduce request
  -> (Layer-1) backward computation
  -> (Layer-1) send async allreduce request
<Iter-(n+1)>
  (Layer-1) wait for the async allreduce from the last iteration
  -> (Layer-1) forward computation
  -> (Layer-2) wait for the async allreduce from the last iteration
  -> (Layer-2) forward computation
  -> ...

In other words, each layer's gradient allreduce overlaps with the backward computation of the earlier layers and, in the next iteration, with the forward computation of the layers that precede it, instead of blocking until all gradients have been reduced.
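To make the schedule above concrete, here is a minimal single-process sketch in Python. It is not ChainerMN code: `allreduce_mean`, `LayerwiseOverlapTrainer`, and the one-scalar-parameter-per-layer model are hypothetical, and a thread pool stands in for non-blocking MPI requests (a real implementation would use something like mpi4py's `Comm.Iallreduce` followed by `Request.Wait` per layer).

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List, Optional, Tuple


def allreduce_mean(per_rank_grads: List[float]) -> float:
    """Hypothetical stand-in for an MPI allreduce: averages one layer's
    gradient over the simulated ranks."""
    return sum(per_rank_grads) / len(per_rank_grads)


class LayerwiseOverlapTrainer:
    """Toy model: one scalar parameter per layer, replicated on every rank."""

    def __init__(self, num_layers: int, num_ranks: int, lr: float = 0.1):
        self.params = [1.0] * num_layers
        self.num_ranks = num_ranks
        self.lr = lr
        # At most one in-flight allreduce per layer, issued during backward.
        self.pending: List[Optional[Future]] = [None] * num_layers
        self.pool = ThreadPoolExecutor(max_workers=num_layers)
        self.log: List[Tuple[str, int]] = []

    def _wait_and_update(self, layer: int) -> None:
        fut = self.pending[layer]
        if fut is not None:
            avg_grad = fut.result()  # block only on this layer's request
            self.params[layer] -= self.lr * avg_grad
            self.pending[layer] = None
            self.log.append(("wait", layer))

    def iteration(self) -> None:
        # Forward: before each layer runs, wait for (and apply) only that
        # layer's reduced gradient from the previous iteration.
        for layer in range(len(self.params)):
            self._wait_and_update(layer)
            self.log.append(("fwd", layer))
        # Backward: last layer first; start each layer's allreduce right away
        # so it overlaps with the backward pass of the earlier layers.
        for layer in reversed(range(len(self.params))):
            # Toy gradient: rank r sees (r + 1) * param.
            grads = [(r + 1) * self.params[layer] for r in range(self.num_ranks)]
            self.log.append(("bwd", layer))
            self.pending[layer] = self.pool.submit(allreduce_mean, grads)
            self.log.append(("send", layer))

    def finish(self) -> None:
        # Drain the requests issued by the final iteration, then stop the pool.
        for layer in range(len(self.params)):
            self._wait_and_update(layer)
        self.pool.shutdown()
```

Because each layer's update is applied lazily, just before that layer's forward pass in the next iteration, the requests from the last iteration must be drained explicitly (`finish`) before the parameters are read.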

Compared with the blocking approach (the existing methodology), the non-blocking approach gave us a significant improvement. Here are the results on ResNet-50. Test environment: 16 nodes (Intel skx-8180), batch size 128, 10GB bandwidth, IMPI (Intel MPI). Blocking scalability is 66.72%; non-blocking scalability is 92.4%. Scalability is calculated as iterations-per-sec-on-MultiNode / iterations-per-sec-on-SingleNode.
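Assuming the batch size of 128 is per node (weak scaling, which is what makes an iterations-per-second ratio a meaningful efficiency), each multi-node iteration processes 16 times as many samples, so the reported ratios translate into an effective throughput speedup over a single node. A small sketch; the function names are hypothetical:

```python
def scalability(iters_per_sec_multi: float, iters_per_sec_single: float) -> float:
    """The ratio used in this issue: multi-node over single-node iterations/sec."""
    return iters_per_sec_multi / iters_per_sec_single


def effective_speedup(scalability_ratio: float, num_nodes: int) -> float:
    """Throughput gain over one node, ASSUMING a fixed per-node batch
    (weak scaling), so one multi-node iteration covers num_nodes x the samples."""
    return scalability_ratio * num_nodes


for name, ratio in [("blocking", 0.6672), ("non-blocking", 0.924)]:
    print(f"{name}: {effective_speedup(ratio, 16):.2f}x")
# prints:
# blocking: 10.68x
# non-blocking: 14.78x
```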

Do you have any plans to implement the non-blocking methodology? Alternatively, we could share a patch and discuss it further.

keisukefukuda commented 6 years ago

@arthuryuan1987, thank you for raising this issue, and thank you for your contribution!

It's great to hear that you implemented a layer-wise computation-communication overlap feature and obtained a significant performance improvement.

Actually, the feature is within our scope, but we have no concrete short-term plan, because the overlapping feature is not as straightforward as it looks.

As you may know, ChainerMN already has a "double-buffering" feature. Although it comes at the cost of some accuracy degradation, it is general and works with any network.
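The accuracy cost comes from staleness: with double buffering, the update applied in one iteration uses the gradient computed in the previous iteration, whose allreduce ran concurrently with the current computation. In ChainerMN this is enabled through the `double_buffering=True` option of `chainermn.create_multi_node_optimizer`. The sketch below is only a toy model of the staleness effect on f(w) = w²/2, not ChainerMN's implementation; the function names and the choice to skip the update on the very first step are assumptions.

```python
def sgd_sync(w0: float, lr: float, steps: int) -> float:
    """Blocking baseline: each step waits for its own reduced gradient."""
    w = w0
    for _ in range(steps):
        g = w  # gradient of f(w) = 0.5 * w**2
        w -= lr * g
    return w


def sgd_double_buffered(w0: float, lr: float, steps: int) -> float:
    """Toy double buffering: the step applies the gradient from the previous
    step (its allreduce is assumed to have overlapped with this step)."""
    w = w0
    prev_g = None  # gradient whose allreduce is "in flight"
    for _ in range(steps):
        g = w
        if prev_g is not None:  # assumption: no update on the first step
            w -= lr * prev_g
        prev_g = g
    return w


print(sgd_sync(1.0, 0.5, 2), sgd_double_buffered(1.0, 0.5, 2))
# prints: 0.25 0.5
```

After the same number of steps, the stale-gradient variant is further from the optimum, which is the kind of accuracy trade-off described above; layer-wise overlap avoids it because each layer still waits for its own current gradient.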

In fact, ChainerMN has been integrated into Chainer (see https://github.com/chainer/chainer/pull/5226) and will be integrated with Chainer more tightly. That may open up more opportunities to support such corner cases with more advanced Chainer features, and we will start considering the feature then.

Thanks,

fengyuan14 commented 6 years ago

Thanks for your detailed explanation. We will take a deeper dive to get a complete picture. Thanks a lot.

keisukefukuda commented 6 years ago

I'm closing the issue for now. Feel free to reopen it if you would like to discuss this further. Thanks!