chainer / chainermn

ChainerMN: Scalable distributed deep learning with Chainer
https://chainer.org
MIT License

Reduce CUDA kernel launch in BN #276

Closed: okuta closed this 6 years ago

okuta commented 6 years ago

Currently, MultiNodeBatchNormalizationFunction performs 4 multiply and 2 add operations, each as a separate CUDA kernel launch. This PR fuses them into a single kernel launch.
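The following is a minimal pure-Python sketch of what this kind of fusion changes. It does not reproduce the PR's actual expression, which is not shown here. The formula and the names `gy`, `x_hat`, `coeff`, `shift`, `gamma`, `inv_std`, and `scale` are hypothetical, chosen only to match the operation count quoted above (4 multiplies, 2 adds). Each full pass over the arrays stands in for one CUDA kernel launch. On a real GPU, the fused version would typically be written as a custom elementwise kernel, for example with CuPy's `cupy.ElementwiseKernel`.

```python
from typing import Dict, List

Vec = List[float]

# Counts simulated kernel launches: one per full pass over the data.
STATS: Dict[str, int] = {"launches": 0}


def _mul(a: Vec, b: Vec) -> Vec:
    """One elementwise multiply = one simulated launch + one temporary."""
    STATS["launches"] += 1
    return [p * q for p, q in zip(a, b)]


def _add(a: Vec, b: Vec) -> Vec:
    """One elementwise add = one simulated launch + one temporary."""
    STATS["launches"] += 1
    return [p + q for p, q in zip(a, b)]


def unfused(gy: Vec, x_hat: Vec, coeff: Vec, shift: Vec,
            gamma: Vec, inv_std: Vec, scale: Vec) -> Vec:
    # Hypothetical expression:
    #   (gy + x_hat * coeff + shift) * gamma * inv_std * scale
    # evaluated as six separate passes (4 multiplies, 2 adds).
    t1 = _mul(x_hat, coeff)
    t2 = _add(gy, t1)
    t3 = _add(t2, shift)
    t4 = _mul(t3, gamma)
    t5 = _mul(t4, inv_std)
    return _mul(t5, scale)


def fused(gy: Vec, x_hat: Vec, coeff: Vec, shift: Vec,
          gamma: Vec, inv_std: Vec, scale: Vec) -> Vec:
    # Same expression and evaluation order, but a single pass:
    # no intermediate arrays and only one simulated launch.
    STATS["launches"] += 1
    return [
        (g + xh * cf + sh) * gm * iv * sc
        for g, xh, cf, sh, gm, iv, sc
        in zip(gy, x_hat, coeff, shift, gamma, inv_std, scale)
    ]


if __name__ == "__main__":
    args = ([1.0, 2.0], [0.5, -1.0], [2.0, 2.0], [0.0, 1.0],
            [1.0, 1.0], [0.5, 0.5], [2.0, 2.0])
    STATS["launches"] = 0
    a = unfused(*args)
    print(a, STATS["launches"])  # [2.0, 1.0] 6
    STATS["launches"] = 0
    b = fused(*args)
    print(b, STATS["launches"])  # [2.0, 1.0] 1
```

The fused expression evaluates in the same order as the unfused chain, so both produce bit-identical results. The fused version also cuts six launches to one and avoids materializing five intermediate arrays. For memory-bound elementwise operations, that reduction in memory traffic is often as important as the saved launch overhead.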

keisukefukuda commented 6 years ago

Replaced by #282