Open zwxandy opened 1 year ago
Hi!
I ran the F(2x2, 3x3) example provided in this repo, and the Winograd convolution result is correct.
I then tried F(4x4, 3x3) by replacing the three transform matrices and setting m to 4. However, the result is wrong whenever the input H/W is larger than 6. If you could provide an F(4x4, 3x3) example, I would appreciate it very much. Thank you!
BTW, I think the number of tiles per channel T should be computed as `ceil((H_in - r + 1)/m)` in the code.
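To make the proposed formula concrete, here is a minimal sketch of the tile count along one spatial axis. The function name `tiles_per_axis` is made up for illustration and is not from the repo; for a 2D input, the per-channel count would be the product of the counts along H and W.

```python
import math

def tiles_per_axis(size_in: int, m: int, r: int) -> int:
    """Number of output tiles of size m along one axis (hypothetical helper)."""
    size_out = size_in - r + 1          # valid convolution output size
    return math.ceil(size_out / m)      # last tile may be partially filled

# F(4x4, 3x3): an input of 6 needs one tile, an input of 8 needs two.
print(tiles_per_axis(6, m=4, r=3))
print(tiles_per_axis(8, m=4, r=3))
```

This is consistent with the symptom above: for m = 4 and r = 3, any input up to 6 fits in a single tile, so tile placement only starts to matter for larger inputs.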
I have dug into the code and found that the mistake comes from how the overlap between tiles is handled. Adjacent input tiles overlap by r - 1, but they start m apart. In lines 74 and 75, for F(2x2, 3x3), `r - 1` happens to be right because the tile stride is also 2. But for F(4x4, 3x3), `r - 1` is wrong because the stride is 4, which equals m (the output tile size). So I think this line should be changed to `vH = tH * m` instead of `vH = tH * (r - 1)`.
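A small 1D check of the stride argument, as a sketch under stated assumptions: it uses plain direct correlation inside each tile rather than the Winograd transforms, so it only tests where tiles start, not the transform matrices. The function `tiled_correlate_1d` is hypothetical and not taken from the repo.

```python
import numpy as np

def tiled_correlate_1d(x, g, m, stride):
    """Correlate x with g tile by tile; each tile yields m outputs.

    `stride` is the distance between the starts of adjacent input tiles.
    """
    r = len(g)
    alpha = m + r - 1                      # input tile length
    n_out = len(x) - r + 1                 # valid output length
    n_tiles = -(-n_out // m)               # ceil(n_out / m)
    # Zero-pad on the right so the last tile is always full length.
    pad = max(0, (n_tiles - 1) * stride + alpha - len(x))
    xp = np.pad(x, (0, pad))
    out = np.zeros(n_tiles * m)
    for t in range(n_tiles):
        v = t * stride                     # tile start, like vH in the issue
        tile = xp[v:v + alpha]
        out[t * m:(t + 1) * m] = np.correlate(tile, g, mode="valid")
    return out[:n_out]

x = np.arange(10.0)
g = np.array([1.0, 2.0, 3.0])              # r = 3
ref = np.correlate(x, g, mode="valid")

print(np.allclose(tiled_correlate_1d(x, g, m=4, stride=4), ref))  # stride = m
print(np.allclose(tiled_correlate_1d(x, g, m=4, stride=2), ref))  # stride = r - 1
print(np.allclose(tiled_correlate_1d(x, g, m=2, stride=2), ref))  # m == r - 1
```

With m = 4, only the stride-m variant matches the reference. With m = 2 the two choices coincide, which would explain why F(2x2, 3x3) worked with `r - 1`.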