In the SSE version of ShuffleChannel, the last channel is shuffled starting from an offset of half the pack granularity. This causes a buffer overflow on the load from ptr1 in the last iteration of the for loop.
For example, consider the case (elempack == 4) && (_group == 2 && channels % _group != 0) in the AVX512 optimization.
Initially, ptr1 can be accessed in the range [ptr1, ptr1+4*size), and after ptr1 += 2; the accessible range shrinks to [ptr1, ptr1+4*size-2).
However, by the last iteration of the for loop ptr1 has advanced by 4*(size-1), so the load into _p1 covers [ptr1+4*(size-1), ptr1+4*size) relative to the offset pointer. That is 2 floats past the end of the valid range, which is a buffer overflow.
Since this causes both an out-of-bounds read (ptr1) and an out-of-bounds write (outptr), it could lead to incorrect model results.
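To make the arithmetic above concrete, here is a minimal C++ model of the read pattern of the last-channel loop. It assumes, as described above, that ptr1 is advanced by half a pack before the loop, moves forward by one pack per pixel, and issues a full elempack-wide load every iteration. The function names are hypothetical; this is not ncnn code and it does not use SIMD intrinsics.

```cpp
#include <cstddef>

// Hypothetical model of the last-channel loop (not ncnn code):
// ptr1 starts elempack / 2 floats into the channel, advances by elempack
// floats per pixel, and every iteration reads elempack floats.

// Returns one past the highest float index read, relative to the channel start.
std::size_t last_channel_read_end(std::size_t size, std::size_t elempack)
{
    const std::size_t offset = elempack / 2; // e.g. ptr1 += 2 when elempack == 4
    std::size_t end = 0;
    for (std::size_t i = 0; i < size; i++)
    {
        const std::size_t begin = offset + i * elempack; // ptr1 at iteration i
        end = begin + elempack;                           // full-width load
    }
    return end;
}

// Returns how many floats are read past the end of a channel that holds
// elempack * size valid floats (0 if the reads stay in bounds).
std::size_t last_channel_overread(std::size_t size, std::size_t elempack)
{
    const std::size_t end = last_channel_read_end(size, elempack);
    const std::size_t valid = elempack * size;
    return end > valid ? end - valid : 0;
}
```

With elempack == 4 and size == 16, last_channel_read_end returns 66 for a 64-float channel, so 2 floats are read past the end. In this model the overread is always elempack / 2 floats (2, 4 and 8 for elempack 4, 8 and 16), independent of size.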
context
x86
https://github.com/Tencent/ncnn/blob/master/src/layer/x86/shufflechannel_x86.cpp#L117
https://github.com/Tencent/ncnn/blob/master/src/layer/x86/shufflechannel_x86.cpp#L373
https://github.com/Tencent/ncnn/blob/master/src/layer/x86/shufflechannel_x86.cpp#L608
arm
https://github.com/Tencent/ncnn/blob/master/src/layer/arm/shufflechannel_arm.cpp#L118
https://github.com/Tencent/ncnn/blob/master/src/layer/arm/shufflechannel_arm.cpp#L365
https://github.com/Tencent/ncnn/blob/master/src/layer/arm/shufflechannel_arm.cpp#L599
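One possible way to keep the last iteration in bounds at these locations is to read only the half pack that actually remains in the channel. The sketch below models this with scalar std::vector accesses instead of SSE/NEON intrinsics. The function name and the `safe` flag are hypothetical, and this is not the patch from the PR. Because std::vector::at() throws std::out_of_range on an invalid index, the unsafe mode reproduces the overflow as an exception.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical sketch (not ncnn code): collects the values that the
// last-channel loop would load from ptr1, one elempack-wide "load" per pixel,
// starting elempack / 2 floats into the channel. channel.at() throws
// std::out_of_range on any read past the end.
// When safe is true, the final pixel reads only the elempack / 2 floats that
// remain in the channel and fills the other lanes with 0.
std::vector<float> gather_last_channel_loads(const std::vector<float>& channel,
                                             std::size_t size, std::size_t elempack,
                                             bool safe)
{
    const std::size_t half = elempack / 2;
    std::vector<float> lanes;
    lanes.reserve(size * elempack);
    for (std::size_t i = 0; i < size; i++)
    {
        const std::size_t begin = half + i * elempack; // ptr1 at pixel i
        const bool last = (i + 1 == size);
        // Number of floats actually read from the channel at this pixel.
        const std::size_t count = (safe && last) ? half : elempack;
        for (std::size_t k = 0; k < elempack; k++)
            lanes.push_back(k < count ? channel.at(begin + k) : 0.f);
    }
    return lanes;
}
```

For a 64-float channel with size == 16 and elempack == 4, the unsafe mode throws on the final pixel, while the safe mode returns lanes ending in 62, 63, 0, 0. In intrinsics, the narrower final read could be a 2-float load such as `_mm_loadl_pi` on SSE or `vld1_f32` on NEON, or the last pixel could be peeled into scalar code. Either way, the main loop keeps its full-width loads.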
how to reproduce
more
I will open a PR with a patch for this :)