kikoland closed this issue 6 years ago
If output_channels_subblock_size <= 2, then kernel20 and kernel21 are never initialized. They are used in accumulating output2, but output2 is not written to the output. This is an optimization to avoid initializing kernel20/kernel21 or having extra conditionals in the code. It relies on undefined behavior, so a compiler could generate wrong code if it tries to be too smart.
Thanks! I'm not sure I fully understand how a standard compiler reacts to that. It crashes with the Visual Studio internal compiler (but the fix seems to work, and that's enough for my own tests).
A less costly fix would probably be to zero-initialize all kernelXY variables.
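To make the zero-initialization idea concrete, here is a minimal sketch in C. It is not the library's actual code: the function name conv1x1_upto4_sketch, the parameter layout (2 input channels, up to 4 output channels, kernel stored as [output channel][input channel]) and the accumulate-into-output behavior are all assumptions for illustration. Every kernelXY starts at zero, so the unconditional multiply-adds never read an indeterminate value, and only the stores are guarded by the subblock size.

```c
#include <stddef.h>

/* Hypothetical scalar 1x1 convolution micro-kernel (illustration only).
 * input:  2 channels, each image_size floats, stored back to back
 * kernel: [output channel][input channel], 2 floats per output channel
 * output: output_channels_subblock_size channels (1..4), accumulated into */
static void conv1x1_upto4_sketch(
    size_t output_channels_subblock_size,
    size_t image_size,
    const float *input,
    const float *kernel,
    float *output)
{
    const float *input0 = input;
    const float *input1 = input + image_size;

    /* Zero-initialize every kernelXY so that rows beyond the subblock
     * size hold a defined value instead of an indeterminate one. */
    float kernel00 = 0.0f, kernel01 = 0.0f;
    float kernel10 = 0.0f, kernel11 = 0.0f;
    float kernel20 = 0.0f, kernel21 = 0.0f;
    float kernel30 = 0.0f, kernel31 = 0.0f;

    /* Load only the rows that actually exist in the kernel buffer. */
    kernel00 = kernel[0];
    kernel01 = kernel[1];
    if (output_channels_subblock_size > 1) { kernel10 = kernel[2]; kernel11 = kernel[3]; }
    if (output_channels_subblock_size > 2) { kernel20 = kernel[4]; kernel21 = kernel[5]; }
    if (output_channels_subblock_size > 3) { kernel30 = kernel[6]; kernel31 = kernel[7]; }

    for (size_t i = 0; i < image_size; i++) {
        const float x0 = input0[i];
        const float x1 = input1[i];

        /* All four sums are computed unconditionally; unused ones are 0. */
        const float acc0 = kernel00 * x0 + kernel01 * x1;
        const float acc1 = kernel10 * x0 + kernel11 * x1;
        const float acc2 = kernel20 * x0 + kernel21 * x1;
        const float acc3 = kernel30 * x0 + kernel31 * x1;

        /* Only valid output channels are touched, so no address outside
         * the output buffer is ever formed or written. */
        output[i] += acc0;
        if (output_channels_subblock_size > 1) output[1 * image_size + i] += acc1;
        if (output_channels_subblock_size > 2) output[2 * image_size + i] += acc2;
        if (output_channels_subblock_size > 3) output[3 * image_size + i] += acc3;
    }
}
```

The extra cost over the uninitialized version is eight register initializations per call, outside the pixel loop, which is why this is likely cheaper than adding conditionals around the accumulation itself.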
Hi and thanks for your great library.
I've been playing with Pixelwise (1x1) convolutions.
It seems there is an issue when the output block size is not a multiple of 4. This happens, for example, when building logits at the end of inference (input 512xWxH, output 2xWxH).
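For reference, this is roughly how a partial subblock arises. The helper below is a sketch with a hypothetical name (subblock_size) and assumes output channels are processed in chunks of at most 4; it is not taken from the library.

```c
#include <stddef.h>

/* Hypothetical helper: number of output channels in the subblock that
 * starts at channel `start`, when `total` channels are processed in
 * chunks of at most `max_subblock` channels. */
static size_t subblock_size(size_t total, size_t start, size_t max_subblock)
{
    const size_t remaining = total - start;
    return remaining < max_subblock ? remaining : max_subblock;
}
```

With 2 output channels and a maximum of 4, the only subblock has size 2, so the code paths for output channels 2 and 3 run without valid data.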
Here is an illustration of the problematic call to the function (scalar case; I haven't explored SIMD or other variants yet):
Then kernel20 (and kernel30) is not initialized but is used later in:
Moreover, output2 and output3 will access wrong memory addresses.
Would the following fix solve the issue? It is not very efficient, though...