ARM-software / ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.

NEConvolution3x3 function output not correct #833

Closed smiglan closed 4 years ago

smiglan commented 4 years ago

Output of 'strings libarm_compute.so | grep arm_compute_version':

arm_compute_version=v20.05 Build options: {'pmu': '1', 'estate': 'auto', 'embed_kernels': '1', 'arch': 'armv7a', 'opencl': '0', 'neon': '1', 'build_dir': 'linux-armv7a-neon-debug', 'debug': '1', 'standalone': '0', 'extra_link_flags': '', 'validation_tests': '0', 'examples': '1', 'asserts': '1', 'mali': '1', 'benchmark_tests': '0', 'compiler_cache': 'ccache', 'extra_cxx_flags': '', 'os': 'linux', 'Werror': '1', 'benchmark_examples': '1'} Git hash=6a7771e460abeac7d401d6d38a0fcf0a0d2c3cbe

Platform: Raspberry Pi 3 Model B

Operating System: "Raspbian GNU/Linux 9 (stretch)"

Problem description: I am trying a simple NEConvolution3x3 example, and the output after the convolution is the same as the input.

What I am trying to achieve is:

  1. Create an array and fill it with some dummy data
  2. Convert the array to a tensor
  3. Perform convolution on the Tensor with a 3x3 filter

The input tensor (with dummy data) is:

     0  17  34  51  68  85
    13  30  47  64  81  98
    26  43  60  77  94 111
    39  56  73  90 107 124
    52  69  86 103 120 137
    65  82  99 116 133 150

The kernel used for the convolution is:

    1 2 1
    2 4 2
    1 2 1

The output is:

    30  47  64  81
    43  60  77  94
    56  73  90 107
    69  86 103 120

The code for the problem is here: https://github.com/smiglan/Arm-Compute/blob/master/neon_convolution

I am just starting out with the Arm Compute Library and C++ and would appreciate any help with this issue. Thanks.

morgolock commented 4 years ago

Hi @smiglan

Your code looks good. Could you please try changing BorderMode::UNDEFINED to BorderMode::CONSTANT?

What's the golden/reference output?

Hope this helps.

smiglan commented 4 years ago

Hello @morgolock

Changing UNDEFINED to CONSTANT shows the same problem. The output is:

     5  16  28  41  54  47
    14  30  47  64  81  69
    23  43  60  77  94  79
    33  56  73  90 107  88
    43  69  86 103 120  98
    37  58  71  83  96  78

Only the outside border values change; the middle sub-array is the same as the input matrix. The output of the MATLAB function conv2(input, kernel) with zero padding and 'same' output size (which matches my hand calculation as well) is:

     90  256  460  664  868  753
    224  480  752 1024 1296 1108
    380  688  960 1232 1504 1264
    536  896 1168 1440 1712 1420
    692 1104 1376 1648 1920 1576
    597  932 1136 1340 1544 1260
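The two 6x6 results above can be compared with a plain C++ reference. This is a minimal sketch and not Compute Library code: `convolve3x3_zero_border` and `make_thread_input` are hypothetical helper names, and the scale of 16 is an assumption (the sum of the kernel coefficients) that happens to be consistent with the numbers posted in this thread. With scale 1 the sketch produces the unnormalised conv2 sums; with scale 16 and truncating integer division it produces the values reported for BorderMode::CONSTANT.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Plain C++ reference, not Compute Library code: 3x3 convolution with a
// constant (zero) border, followed by integer division by `scale`.
std::vector<int> convolve3x3_zero_border(const std::vector<uint8_t> &src,
                                         int width, int height,
                                         const int (&kernel)[9], int scale)
{
    assert(static_cast<int>(src.size()) == width * height);
    assert(scale > 0);
    std::vector<int> dst(src.size());
    for(int y = 0; y < height; ++y)
    {
        for(int x = 0; x < width; ++x)
        {
            int sum = 0;
            for(int ky = -1; ky <= 1; ++ky)
            {
                for(int kx = -1; kx <= 1; ++kx)
                {
                    const int sy = y + ky;
                    const int sx = x + kx;
                    // Pixels outside the image count as zero.
                    if(sy < 0 || sy >= height || sx < 0 || sx >= width)
                    {
                        continue;
                    }
                    sum += kernel[(ky + 1) * 3 + (kx + 1)] * src[sy * width + sx];
                }
            }
            dst[y * width + x] = sum / scale; // integer division truncates
        }
    }
    return dst;
}

// The 6x6 dummy input from this thread: value = 17 * column + 13 * row.
std::vector<uint8_t> make_thread_input()
{
    std::vector<uint8_t> src(36);
    for(int y = 0; y < 6; ++y)
    {
        for(int x = 0; x < 6; ++x)
        {
            src[y * 6 + x] = static_cast<uint8_t>(17 * x + 13 * y);
        }
    }
    return src;
}
```

For example, calling `convolve3x3_zero_border(make_thread_input(), 6, 6, kernel, 1)` with the 1 2 1 / 2 4 2 / 1 2 1 kernel gives 90 as the first element, and the same call with scale 16 gives 90 / 16 = 5 after truncation.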

I also wrote a similar example for NEGaussian3x3, and I have the same problem.

The input tensor (with dummy data) is:

     0  17  34  51  68  85
    13  30  47  64  81  98
    26  43  60  77  94 111
    39  56  73  90 107 124
    52  69  86 103 120 137
    65  82  99 116 133 150

The output is:

     7  20  37  54  71  84
    17  30  47  64  81  93
    30  43  60  77  94 106
    43  56  73  90 107 119
    56  69  86 103 120 132
    66  78  95 112 129 142

As you can see, the middle sub-array remains the same.

The correct output should be:

     2.8549  16.5740  31.7634  46.9527  62.1421  67.4774
    13.4260  30.0000  47.0000  64.0000  81.0000  85.7517
    25.0414  43.0000  60.0000  77.0000  94.0000  97.3671
    36.6568  56.0000  73.0000  90.0000 107.0000 108.9825
    48.2723  69.0000  86.0000 103.0000 120.0000 120.5979
    52.2721  71.8818  87.0712 102.2606 117.4500 116.8946

The reference code is here: https://github.com/smiglan/Arm-Compute/blob/master/neon_gaussian

As there is a problem in both functions, I assume the problem might be in how I am converting my dummy C++ array to a tensor. But when I print the tensor, its contents are the same as my array, so I am not sure what the issue might be.

Thanks

morgolock commented 4 years ago

Hi @smiglan

Could you please try replacing NEConvolution3x3 with NEGaussian3x3?

https://github.com/ARM-software/ComputeLibrary/blob/master/arm_compute/runtime/NEON/functions/NEGaussian3x3.h

Hope this helps.

morgolock commented 4 years ago

Hi @smiglan

Your code is okay and the output is correct.

Take for example the element (1,1), which is 30 in both the input and the output. Its 3x3 neighbourhood in the input is:

    0   17  34
    13  30  47
    26  43  60

The correct output is:

    (0*1 + 17*2 + 34*1 + 13*2 + 30*4 + 47*2 + 26*1 + 43*2 + 60*1) / 16 = 480 / 16 = 30

So the Gaussian kernel is actually computing the correct result.

For more information please take a look at: https://www.khronos.org/registry/OpenVX/specs/1.0/html/d6/d58/group__group__vision__function__gaussian__image.html
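The interior values survive because the dummy data is a linear ramp (each value equals 17 * column + 13 * row), and a symmetric kernel whose coefficients are divided by their sum leaves such a ramp unchanged away from the border. The sketch below illustrates this in plain C++; it is not Compute Library code, the helper names are hypothetical, and the checkerboard pattern is only an assumed contrast case that is not part of the original thread.

```cpp
#include <cstdint>
#include <vector>

// Counts interior pixels whose value changes under the 1-2-1 / 2-4-2 / 1-2-1
// kernel divided by 16. Plain C++ illustration, not Compute Library code.
int count_changed_interior(const std::vector<uint8_t> &img, int width, int height)
{
    static const int kernel[9] = {1, 2, 1, 2, 4, 2, 1, 2, 1};
    int changed = 0;
    for(int y = 1; y < height - 1; ++y)
    {
        for(int x = 1; x < width - 1; ++x)
        {
            int sum = 0;
            for(int ky = -1; ky <= 1; ++ky)
            {
                for(int kx = -1; kx <= 1; ++kx)
                {
                    sum += kernel[(ky + 1) * 3 + (kx + 1)] * img[(y + ky) * width + (x + kx)];
                }
            }
            if(sum / 16 != img[y * width + x])
            {
                ++changed;
            }
        }
    }
    return changed;
}

// Linear ramp as used in the thread: value = 17 * column + 13 * row.
std::vector<uint8_t> make_ramp(int width, int height)
{
    std::vector<uint8_t> img(width * height);
    for(int y = 0; y < height; ++y)
    {
        for(int x = 0; x < width; ++x)
        {
            img[y * width + x] = static_cast<uint8_t>(17 * x + 13 * y);
        }
    }
    return img;
}

// Assumed contrast case: a checkerboard alternating between 0 and 200.
std::vector<uint8_t> make_checkerboard(int width, int height)
{
    std::vector<uint8_t> img(width * height);
    for(int y = 0; y < height; ++y)
    {
        for(int x = 0; x < width; ++x)
        {
            img[y * width + x] = ((x + y) % 2 == 0) ? 200 : 0;
        }
    }
    return img;
}
```

On the 6x6 ramp no interior pixel changes, whereas on the 6x6 checkerboard all 16 interior pixels are smoothed to 100, so the filter's effect only becomes visible once the input is not a plain ramp.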

You could also try randomizing your input values; the output will then differ much more visibly from the input:

    for(unsigned int h = 0; h < height; h++)
    {
        for(unsigned int w = 0; w < width; w++)
        {
            src_data[h * width + w] = static_cast<uint8_t>(rand() % 256);
        }
    }
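If reproducible test data is preferred over rand(), a seeded generator from the standard <random> header is one option. This is a minimal sketch under that assumption; `make_random_image` is a hypothetical helper name and is not part of the Compute Library or of the original example.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Fills a width x height buffer with pseudo-random bytes. The same seed
// yields the same data within one build, which helps when comparing runs.
std::vector<uint8_t> make_random_image(unsigned int width, unsigned int height, unsigned int seed)
{
    std::mt19937 rng(seed);
    // uint8_t is not a permitted type for uniform_int_distribution, so draw
    // ints in [0, 255] and narrow them afterwards.
    std::uniform_int_distribution<int> dist(0, 255);
    std::vector<uint8_t> data(static_cast<std::size_t>(width) * height);
    for(auto &value : data)
    {
        value = static_cast<uint8_t>(dist(rng));
    }
    return data;
}
```

The returned vector can then be copied into the source buffer in place of the loop above.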

Please reopen the issue if you need more help.