The last channel handling logic in the SSE version of ShuffleChannel cause buffer-overflow

context

In SSE version of ShuffleChannel, the last channel is shuffled from the offset with the half of the granularity. Here, it causes buffer-overflow at the load of the ptr1 at the last iteration of the for loop.

For example, in the case of (elempack == 4) && (_group == 2 && channels % _group != 0) in AVX512 optimization,

The ptr1 initially can be accessed for the range [ptr1, ptr1+4*size), and the range is reduced into [ptr1, ptr1+4*size-2) after ptr1 += 2;. However, at the last iteration of the for loop, it loads [ptr1+4*size, ptr1+4*(size+1)) to _p1, which leads to buffer-overflow.

Since it causes both buffer-overflow read (ptr1) and buffer-overflow write (outptr), it could lead to incorrect result of the model.

{
      const float* ptr0 = bottom_blob.channel(channels_per_group);
      const float* ptr1 = bottom_blob.channel(channels_per_group * 2);
      float* outptr = top_blob.channel(channels_per_group * 2);

      ptr1 += 2;

      for (int i = 0; i < size; i++)
      {
          __m128 _p0 = _mm_loadu_ps(ptr0);
          __m128 _p1 = _mm_loadu_ps(ptr1);

          __m128 _lo = _mm_unpacklo_ps(_p0, _p1);

          _mm_storeu_ps(outptr, _lo);

          ptr0 += 4;
          ptr1 += 4;
          outptr += 4;
      }
  }

x86 https://github.com/Tencent/ncnn/blob/master/src/layer/x86/shufflechannel_x86.cpp#L117 https://github.com/Tencent/ncnn/blob/master/src/layer/x86/shufflechannel_x86.cpp#L373 https://github.com/Tencent/ncnn/blob/master/src/layer/x86/shufflechannel_x86.cpp#L608

arm https://github.com/Tencent/ncnn/blob/master/src/layer/arm/shufflechannel_arm.cpp#L118 https://github.com/Tencent/ncnn/blob/master/src/layer/arm/shufflechannel_arm.cpp#L365 https://github.com/Tencent/ncnn/blob/master/src/layer/arm/shufflechannel_arm.cpp#L599

how to reproduce

Build with SSE in x86 or arm
./test_shufflechannel

I will open a PR of the patch for this:)

Tencent / ncnn

The last channel handling logic in the SSE version of ShuffleChannel cause buffer-overflow #5734

context

how to reproduce

more