CpuGemmConv2d operator has incorrect setting for INT8_NHWC Data.

Hi!

I have just configured the NEConvolutionLayer with NHWC_QASYMM8_SIGNED Tensor, and I have found there are two issues here:

All the INT8 NHWC Convolution Operator would not skip reshaping kernel after GEMM, which does not make any sense, as GEMMOUTPUT equals to the output tensor of layer in terms of continuous data within memory under the layout of NHWC.
1x1 ConvolutionKernel would still call Im2Col kernel, which does not make sense. 1x1 Kernel under NHWC layout should be simple GEMM after weight reshaping.

The previous two problems are caused by skip_im_col_info() function in CpuGemmConv2d.cpp, and the actual problem is in the lower part of CpuGemmLowpMatrixMultiplyCore::Validate().

3*.The reshaping kernel itself has been poorly implemented under NHWC layout in kernel level, as the value assignment is implemented value by value, which could be easily optimized with MEMCPY() row by row, just simply setting WindowDim::x as (0,1,1) and set the corresponding iterator.

I have modified this problem myself in CpuGemmConv2d.cpp and I have tested with random numbers, the result is mathematically correct which I have verified, I suppose this issue should be fixed, otherwise the performance for this setting would be greatly affected.

ARM-software / ComputeLibrary

CpuGemmConv2d operator has incorrect setting for INT8_NHWC Data. #1039