Maratyszcza / NNPACK

Acceleration package for neural networks on multi-core CPUs
BSD 2-Clause "Simplified" License

What is the rationale behind using F(6x6, 3x3) for Winograd tile size? #122

ghost closed this issue 6 years ago

ghost commented 6 years ago

Could you please explain the rationale behind choosing F(6x6, 3x3) as the current Winograd tile size for 2D convolutions? In the original publication on the Winograd transformation, the tile sizes chosen for GPUs are smaller, F(2x2, 3x3). CPUs have more memory to deal with bigger tile sizes. But are there any performance advantages for smaller filter sizes such as 3x3?

Maratyszcza commented 6 years ago

For a kernel of size KxK and a Winograd tile of size TxT, we have to do TxT multiply-adds per channel, but we get only (T-K+1)x(T-K+1) outputs. Thus, the larger the tile, the fewer multiply-adds we do per output. E.g. for F(6x6, 3x3) we do accumulations on 8x8 tiles (64 elements), and then transform them into 6x6 output tiles (36 elements). This reduces the efficiency of Winograd from the theoretical 1 multiply-add per output to 64 / 36 ≈ 1.78 multiply-adds per output. If NNPACK used F(2x2, 3x3) tiles, it would do 4x4 / 2x2 = 4 multiply-adds per output. Increasing the Winograd tile beyond 8x8 would save further computation, but the results would become drastically less accurate.
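
To make the arithmetic above concrete, here is a minimal Python sketch that computes the multiply-adds per output for a few tile sizes. The function names are hypothetical and are not part of NNPACK. It counts only the element-wise multiply-accumulate stage described above and ignores the cost of the input, kernel, and output transforms. The direct-convolution figure of KxK multiply-adds per output per channel is included only as a baseline for comparison.

```python
def winograd_macs_per_output(output_tile: int, kernel: int) -> float:
    """Multiply-adds per output, per channel, for F(m x m, K x K).

    The transformed tile is T x T with T = m + K - 1, and it yields
    m x m outputs, so the ratio is (T * T) / (m * m).
    """
    input_tile = output_tile + kernel - 1  # T = m + K - 1
    return (input_tile * input_tile) / (output_tile * output_tile)


def direct_macs_per_output(kernel: int) -> int:
    """Baseline: a direct K x K convolution does K * K multiply-adds per output."""
    return kernel * kernel


if __name__ == "__main__":
    K = 3
    print(f"direct {K}x{K}: {direct_macs_per_output(K)} multiply-adds per output")
    for m in (2, 4, 6):
        T = m + K - 1
        ratio = winograd_macs_per_output(m, K)
        print(f"F({m}x{m}, {K}x{K}): {T}x{T} tile, {ratio:.2f} multiply-adds per output")
```

For K = 3 this gives 4 for F(2x2, 3x3), 2.25 for F(4x4, 3x3), and about 1.78 for F(6x6, 3x3), against 9 for a direct 3x3 convolution. The ratio keeps shrinking as the tile grows, which is the saving described above. The sketch says nothing about the accuracy loss that limits larger tiles.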