Winograd F(4x4, 5x5) - Githubissues

Here are transforms for Winograd F(4x4, 5x5). That means a 5x5 kernel with a 4x4 output tile. I imagine the code should be a relatively simple adaptation of F(6x6, 3x3), because both algorithms use 8x8 input tiles.

It uses 8x8/(4x4)=4 multiplies / output which gives it a max theoretical speedup of 5x5/4 = 6.25.

AT =
⎡1  1  1   1  1   8  8   0⎤
⎢                         ⎥
⎢0  1  -1  2  -2  4  -4  0⎥
⎢                         ⎥
⎢0  1  1   4  4   2  2   0⎥
⎢                         ⎥
⎣0  1  -1  8  -8  1  -1  1⎦

G =
⎡ 1      0     0      0      0  ⎤
⎢                               ⎥
⎢-2/9  -2/9   -2/9  -2/9   -2/9 ⎥
⎢                               ⎥
⎢-2/9   2/9   -2/9   2/9   -2/9 ⎥
⎢                               ⎥
⎢1/90  1/45   2/45  4/45   8/45 ⎥
⎢                               ⎥
⎢1/90  -1/45  2/45  -4/45  8/45 ⎥
⎢                               ⎥
⎢4/45  2/45   1/45  1/90   1/180⎥
⎢                               ⎥
⎢4/45  -2/45  1/45  -1/90  1/180⎥
⎢                               ⎥
⎣ 0      0     0      0      1  ⎦

BT =
⎡1   0    -21/4    0    21/4     0    -1  0⎤
⎢                                          ⎥
⎢0   1      1    -17/4  -17/4    1    1   0⎥
⎢                                          ⎥
⎢0   -1     1    17/4   -17/4   -1    1   0⎥
⎢                                          ⎥
⎢0  1/2    1/4   -5/2   -5/4     2    1   0⎥
⎢                                          ⎥
⎢0  -1/2   1/4    5/2   -5/4    -2    1   0⎥
⎢                                          ⎥
⎢0   2      4    -5/2    -5     1/2   1   0⎥
⎢                                          ⎥
⎢0   -2     4     5/2    -5    -1/2   1   0⎥
⎢                                          ⎥
⎣0   -1     0    21/4     0    -21/4  0   1⎦

Maratyszcza / NNPACK

Winograd F(4x4, 5x5) #12