Question about the result of F(4x4, 3x3).

Hi!

I ran the F(2x2, 3x3) example provided in this repo, and the Winograd convolution algorithm gives the correct result.

I then tried F(4x4, 3x3) by changing the three transform matrices and setting m to 4. However, the result is wrong when the input H/W is larger than 6. If you could provide an F(4x4, 3x3) example, I would appreciate it very much. Thank you!

BTW, I think the number of tiles per channel, T, should be computed in the code as ceil((H_in - r + 1)/m).
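
To make the formula concrete, here is a minimal sketch. The helper name `tiles_per_dim` is mine and not from the repo, and I am assuming a stride-1 convolution without padding, counting tiles along one spatial dimension:

```python
import math

def tiles_per_dim(h_in: int, m: int, r: int) -> int:
    """Tiles along one dimension for F(m x m, r x r).

    Assumes a stride-1 'valid' convolution, so the output has
    h_in - r + 1 rows, and each tile produces m of them.
    """
    h_out = h_in - r + 1
    return math.ceil(h_out / m)

if __name__ == "__main__":
    # A 6x6 input with F(4x4, 3x3) has a 4x4 output, so one tile suffices.
    print(tiles_per_dim(6, m=4, r=3))
    # A 10x10 input has an 8x8 output, so two tiles are needed per dimension.
    print(tiles_per_dim(10, m=4, r=3))
```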

I have dug into the code, and the mistake comes from how the overlapping tiles are indexed. In lines 74 and 75, r - 1 works for F(2x2, 3x3) only because it happens to equal the tile stride, m = 2. For F(4x4, 3x3) it is wrong: the tile stride is 4, which equals m (the output tile size), while r - 1 is still 2. So I think this line should be changed from vH = tH * (r - 1) to vH = tH * m.
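
Here is a small standalone check of this reasoning. It is only a sketch and not the repo's code: the function names are hypothetical, and each tile is computed with a plain direct correlation instead of the Winograd transforms, so it isolates the tile-offset question alone. Input tiles have size m + r - 1 and start at `t * step`.

```python
import numpy as np

def direct_corr2d(x, w):
    """Plain 'valid' cross-correlation, used as the reference result."""
    r = w.shape[0]
    h_out, w_out = x.shape[0] - r + 1, x.shape[1] - r + 1
    out = np.zeros((h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            out[i, j] = np.sum(x[i:i + r, j:j + r] * w)
    return out

def tiled_corr2d(x, w, m, step):
    """Tile-by-tile correlation; `step` is the offset between input tiles.

    Stand-in for F(m x m, r x r): each (m + r - 1)-sized input tile yields
    an m x m output tile, computed directly here (no Winograd transforms).
    """
    r = w.shape[0]
    alpha = m + r - 1                       # input tile size
    h_out, w_out = x.shape[0] - r + 1, x.shape[1] - r + 1
    t_h, t_w = -(-h_out // m), -(-w_out // m)   # ceil division
    # Zero-pad so that every tile is full-sized.
    xp = np.zeros((t_h * m + r - 1, t_w * m + r - 1))
    xp[:x.shape[0], :x.shape[1]] = x
    out = np.zeros((t_h * m, t_w * m))
    for th in range(t_h):
        for tw in range(t_w):
            v_h, v_w = th * step, tw * step     # the offset in question
            tile = xp[v_h:v_h + alpha, v_w:v_w + alpha]
            out[th * m:(th + 1) * m, tw * m:(tw + 1) * m] = direct_corr2d(tile, w)
    return out[:h_out, :w_out]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((10, 10))
    w = rng.standard_normal((3, 3))
    ref = direct_corr2d(x, w)
    r = w.shape[0]
    print("m=2, step=r-1:", np.allclose(tiled_corr2d(x, w, 2, r - 1), ref))
    print("m=4, step=r-1:", np.allclose(tiled_corr2d(x, w, 4, r - 1), ref))
    print("m=4, step=m:  ", np.allclose(tiled_corr2d(x, w, 4, 4), ref))
```

With a 6x6 input there is only one tile per dimension, so the offset never matters, which would explain why the error only shows up for larger inputs.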
