NervanaSystems / neon

Intel® Nervana™ reference deep learning framework committed to best performance on all hardware
http://neon.nervanasys.com/docs/latest
Apache License 2.0

More accurate Winograd/Cook/Toom F(4x4, 3x3) transforms #224

Closed · andravin closed this 8 years ago

andravin commented 8 years ago

Here are some more accurate transform matrices for the Winograd F(4x4, 3x3) kernel.

They are about 4X as accurate as the original transforms, and within a factor of 2 of the accuracy of direct convolution.

AT (the inverse transform) takes a couple more flops to apply than the original, because some of the +/-1 values were replaced with rational numbers. I think the other transforms use the same number of flops.


AT =
⎡ 1    1          1          1      1     0 ⎤
⎢ 0    7/10      -7/10       3/2   -3/2   0 ⎥
⎢ 0    49/100     49/100     9/4    9/4   0 ⎥
⎣ 0    343/1000  -343/1000  27/8  -27/8   1 ⎦

G =
⎡  400/441      0          0    ⎤
⎢ -625/1078  -125/308   -25/88  ⎥
⎢ -625/1078   125/308   -25/88  ⎥
⎢   25/198     25/132    25/88  ⎥
⎢   25/198    -25/132    25/88  ⎥
⎣    0          0         1     ⎦

BT =
⎡ 441/400    0         -137/50    0       1   0 ⎤
⎢   0      -63/40       -9/4      7/10    1   0 ⎥
⎢   0       63/40       -9/4     -7/10    1   0 ⎥
⎢   0     -147/200     -49/100    3/2     1   0 ⎥
⎢   0      147/200     -49/100   -3/2     1   0 ⎥
⎣   0      441/400       0      -137/50   0   1 ⎦
andravin commented 8 years ago

This transform appears to be the same or slightly more accurate:

AT =
⎡ 1    1      1      1      1     0 ⎤
⎢ 0   √2/2  -√2/2    √2    -√2    0 ⎥
⎢ 0   1/2    1/2     2      2     0 ⎥
⎣ 0   √2/4  -√2/4   2√2   -2√2    1 ⎦

G =
⎡  1      0      0   ⎤
⎢ -2/3  -√2/3  -1/3  ⎥
⎢ -2/3   √2/3  -1/3  ⎥
⎢  1/6   √2/6   1/3  ⎥
⎢  1/6  -√2/6   1/3  ⎥
⎣  0      0      1   ⎦

BT =
⎡ 1    0      -5/2    0      1   0 ⎤
⎢ 0  -√2      -2      √2/2   1   0 ⎥
⎢ 0   √2      -2     -√2/2   1   0 ⎥
⎢ 0  -√2/2   -1/2     √2     1   0 ⎥
⎢ 0   √2/2   -1/2    -√2     1   0 ⎥
⎣ 0    1       0     -5/2    0   1 ⎦

I had posted a variation of these matrices earlier that scaled some of the columns of AT and rows of G, but that seemed to hurt accuracy slightly.
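Since this second set contains √2, floating-point entry of the matrices already rounds them; the transforms can instead be checked exactly in symbolic form. A SymPy sketch of mine for the 1D case F(4, 3), y = Aᵀ[(G g) ⊙ (Bᵀ d)]:

```python
from sympy import Matrix, Rational, sqrt, symbols, simplify

s = sqrt(2)
AT = Matrix([
    [1, 1,    1,    1,     1,    0],
    [0, s/2, -s/2,  s,    -s,    0],
    [0, Rational(1, 2), Rational(1, 2), 2, 2, 0],
    [0, s/4, -s/4,  2*s,  -2*s,  1],
])
G = Matrix([
    [1, 0, 0],
    [Rational(-2, 3), -s/3, Rational(-1, 3)],
    [Rational(-2, 3),  s/3, Rational(-1, 3)],
    [Rational(1, 6),   s/6, Rational(1, 3)],
    [Rational(1, 6),  -s/6, Rational(1, 3)],
    [0, 0, 1],
])
BT = Matrix([
    [1,  0,   Rational(-5, 2), 0,    1, 0],
    [0, -s,   -2,              s/2,  1, 0],
    [0,  s,   -2,             -s/2,  1, 0],
    [0, -s/2, Rational(-1, 2), s,    1, 0],
    [0,  s/2, Rational(-1, 2), -s,   1, 0],
    [0,  1,   0,   Rational(-5, 2),  0, 1],
])

d = Matrix(symbols('d0:6'))   # 6 symbolic data values
g = Matrix(symbols('g0:3'))   # 3 symbolic filter taps

m = (G * g).multiply_elementwise(BT * d)
y = AT * m

# Direct 1D valid correlation, 4 outputs.
ref = Matrix([sum(d[i + k] * g[k] for k in range(3)) for i in range(4)])
print(simplify(y - ref))      # expect the zero vector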

scott-gray commented 8 years ago

This one is done and pending merge for release.

buttercutter commented 5 years ago

Could anyone advise how to obtain this improved version of Winograd convolution from the original version, which is described by equation (7) in the paper?

andravin commented 5 years ago

Hi @promach

I generated the above transforms using an early version of the winCNN software, now available here: https://github.com/andravin/wincnn

Matrix AT is a Vandermonde matrix, and the "polynomial roots" used by winCNN are just the numbers in the second row, leaving off the last column.

Since these matrices were released, others have researched possibly better transforms. Please refer to https://github.com/dmlc/tvm/pull/3553
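The "polynomial roots" remark can be checked directly: the rows of AT are the powers 0 through 3 of the points (0, 7/10, -7/10, 3/2, -3/2), plus a final column for the point at infinity. A small sketch of mine with exact fractions:

```python
from fractions import Fraction

points = [0, Fraction(7, 10), Fraction(-7, 10),
          Fraction(3, 2), Fraction(-3, 2)]

# Row k holds points**k; the extra column handles the point at infinity.
AT = [[p**k for p in points] + [1 if k == 3 else 0] for k in range(4)]

for row in AT:
    print([str(x) for x in row])
# ['1', '1', '1', '1', '1', '0']
# ['0', '7/10', '-7/10', '3/2', '-3/2', '0']
# ['0', '49/100', '49/100', '9/4', '9/4', '0']
# ['0', '343/1000', '-343/1000', '27/8', '-27/8', '1']
```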

buttercutter commented 5 years ago

@andravin

I do not understand why the proposed Toom-Cook Algorithm 1 in the paper "Error Analysis and Improving the Accuracy of Winograd Convolution for Deep Neural Networks" does not need a polynomial interpolation stage?

buttercutter commented 4 years ago

Matrix AT is a Vandermonde matrix, and the "polynomial roots" used by winCNN are just the numbers in the second row, leaving off the last column.

@andravin What exactly do you mean by "polynomial roots", and why are they located in the second row?