Closed by alvoron 6 months ago
Let me provide parameters:
Precision: FP32
Source tensor shape: [2,2,2,2]
Source tensor: 5 4 1 4 9 2 5 6 7 8 1 6 1 8 1 6
Indices: 0 1
Output tensor shape: [1,1,2,2]
Output tensor provided by ACL: 3.5 4 5.5 5.5
Expected output tensor: 5.5 5.5 2 5.5, according to numpy:
import numpy as np
data = np.array([[[[5,4],[1,4]],[[9,2],[5,6]]],[[[7,8],[1,6]],[[1,8],[1,6]]]],dtype=np.float32)
axes = np.array([0,1], dtype=np.int64)
print(np.mean(data, axis=tuple(axes), keepdims=True))
NEReduceMean gives correct results if the indices are 1 2 3 (the result is 4.5 4.75); however, it gives incorrect results for indices 0 1 3: the ACL result is 3.75 5.5, while the numpy result is 5.5 3.75.
The issue is reproducible on Raspberry Pi as well.
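The numpy reference values quoted above can be cross-checked without numpy. What follows is a minimal sketch, assuming the 16 values are stored row-major in the order listed under "Source tensor". `reduce_mean` is a hypothetical helper written for this illustration; it is not an ACL or numpy function.

```python
from itertools import product

def reduce_mean(flat, shape, axes):
    """Mean over `axes` of a row-major tensor given as a flat list.

    Returns the reduced values as a flat list, ordered by the kept axes.
    """
    rank = len(shape)
    # Row-major strides: the last axis changes fastest.
    strides = [1] * rank
    for i in range(rank - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    kept = [a for a in range(rank) if a not in axes]
    count = 1
    for a in axes:
        count *= shape[a]
    out = []
    for kept_idx in product(*(range(shape[a]) for a in kept)):
        total = 0.0
        for red_idx in product(*(range(shape[a]) for a in axes)):
            index = [0] * rank
            for a, i in zip(kept, kept_idx):
                index[a] = i
            for a, i in zip(axes, red_idx):
                index[a] = i
            total += flat[sum(i * s for i, s in zip(index, strides))]
        out.append(total / count)
    return out

data = [5, 4, 1, 4, 9, 2, 5, 6, 7, 8, 1, 6, 1, 8, 1, 6]
shape = (2, 2, 2, 2)  # row-major order, as in the numpy snippet

print(reduce_mean(data, shape, (0, 1)))     # [5.5, 5.5, 2.0, 5.5]
print(reduce_mean(data, shape, (1, 2, 3)))  # [4.5, 4.75]
print(reduce_mean(data, shape, (0, 1, 3)))  # [5.5, 3.75]
```

These match the numpy figures in the report, so the reference side of the comparison appears sound; the open question is how the same axes are expressed on the ACL side.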
Hi @alvoron
I've just tried the shapes and data you shared with TensorFlow and I get the same results as ACL (3.75 5.5) for axis 0:
>>> x = tf.constant([ [5. ,4.] , [1., 4.] , [ 9., 2.], [5., 6.] , [7. , 8.] , [ 1., 6.] , [1., 8.], [1., 6.]])
>>> tf.reduce_mean(x,0)
<tf.Tensor: shape=(2,), dtype=float32, numpy=array([3.75, 5.5 ], dtype=float32)>
>>> tf.reduce_mean(x,1)
<tf.Tensor: shape=(8,), dtype=float32, numpy=array([4.5, 2.5, 5.5, 5.5, 7.5, 3.5, 4.5, 3.5], dtype=float32)>
It would help if you could share a standalone test reproducing the problem; see the test below:
#include "arm_compute/core/Types.h"
#include "arm_compute/runtime/NEON/NEFunctions.h"
#include "utils/Utils.h"
#include "tests/SimpleTensor.h"
#include "arm_compute/runtime/Tensor.h"
#include "utils/TypePrinter.h"
using namespace std;
using namespace arm_compute;
using namespace arm_compute::test;

int main()
{
    NEReduceMean reduce_f;
    Tensor inputt;
    Tensor outputt;
    inputt.allocator()->init(TensorInfo(TensorShape(2, 2, 2, 2), 1, DataType::F32));
    float data[] = { 5., 4., 1., 4., 9., 2., 5., 6., 7., 8., 1., 6., 1., 8., 1., 6. };
    inputt.allocator()->import_memory(data);
    Coordinates axis(0);
    std::cout << "axis " << axis << std::endl;
    reduce_f.configure(&inputt, axis, true, &outputt);
    outputt.allocator()->allocate();
    cout << "input " << std::endl;
    inputt.print(std::cout);
    reduce_f.run();
    cout << "\noutput " << std::endl;
    outputt.print(std::cout);
}
When executed:
LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./mean
axis 0
input
5 4
1 4
9 2
5 6
7 8
1 6
1 8
1 6
output
4.5
2.5
5.5
5.5
7.5
3.5
4.5
3.5
Hope this helps
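The eight values printed by the test above are the same as the `tf.reduce_mean(x,1)` result, not the `tf.reduce_mean(x,0)` one. A small sketch of why, assuming that ACL dimension 0 is the innermost (contiguous) dimension of the buffer; that assumption is consistent with the printed output but is stated here as an assumption, not as a quote from the ACL documentation.

```python
data = [5, 4, 1, 4, 9, 2, 5, 6, 7, 8, 1, 6, 1, 8, 1, 6]

# Assumption: dimension 0 is the innermost one and has size 2, so reducing it
# averages each pair of adjacent values in the buffer.
innermost = 2
pair_means = [sum(data[i:i + innermost]) / innermost
              for i in range(0, len(data), innermost)]
print(pair_means)  # [4.5, 2.5, 5.5, 5.5, 7.5, 3.5, 4.5, 3.5]

# For contrast: averaging everything except the innermost dimension, which is
# what reducing axis 0 of the [8, 2] TensorFlow view computes.
column_means = [sum(data[w::innermost]) / (len(data) // innermost)
                for w in range(innermost)]
print(column_means)  # [3.75, 5.5]
```

Under that assumption, an axis index counted from the outermost dimension (numpy, TensorFlow) and one counted from the innermost dimension (ACL `TensorShape`) name different dimensions unless the index is converted first.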
Indeed, there is no issue on the ACL side. I observed the problem because of an incorrect axis transformation (NCHW -> NHWC) on my side.
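For readers hitting the same problem, here is a hedged sketch of one way to convert reduction axes given in NCHW order into dimension indices for an NHWC tensor described as `TensorShape(C, W, H, N)`. The helper names are hypothetical and this is not the code used in the original report; the reversed dimension order (dimension 0 is the innermost) is an assumption based on the `TensorShape(C, W, H, N)` convention mentioned in the problem description.

```python
def nchw_axis_to_nhwc(axis):
    """Index of the same logical dimension after permuting NCHW to NHWC."""
    return "NHWC".index("NCHW"[axis])

def to_acl_dim(axis, rank=4):
    """Assumed ACL convention: dimensions are listed innermost first."""
    return rank - 1 - axis

def nchw_axes_to_acl_nhwc(axes):
    """NCHW reduction axes -> dimension indices for TensorShape(C, W, H, N)."""
    return sorted(to_acl_dim(nchw_axis_to_nhwc(a)) for a in axes)

print(nchw_axes_to_acl_nhwc([0, 1]))     # [0, 3]  (C and N)
print(nchw_axes_to_acl_nhwc([1, 2, 3]))  # [0, 1, 2]  (C, W and H)
print(nchw_axes_to_acl_nhwc([0, 1, 3]))  # [0, 1, 3]  (C, W and N)
```

Both steps matter: the layout permutation changes which position a logical dimension occupies, and the innermost-first ordering then reverses the index. Skipping either step reduces over the wrong dimensions even though the kernel itself is correct.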
Output of 'strings libarm_compute.so | grep arm_compute_version': arm_compute_version=v23.02 Build options: {'neon': '1', 'opencl': '0', 'openmp': '0', 'cppthreads': '1', 'examples': '0', 'Werror': '0', 'gemm_tuner': '0', 'reference_openmp': '0', 'validation_tests': '0', 'benchmark_tests': '0', 'data_layout_support': 'all', 'build_dir': '/thirdparty/ComputeLibrary', 'install_dir': '/thirdparty/ComputeLibrary/install', 'arch': 'armv8.2-a', 'debug': '1', 'asserts': '1', 'logging': '1', 'os': 'macos', 'build': 'native', 'compiler_prefix': '/usr/bin/', 'extra_cxx_flags': '-fPIC -fsigned-char -ffunction-sections -fdata-sections -fdiagnostics-show-option -Wundef -Wreturn-type -Wunused-variable -Wswitch -Wno-macro-redefined -Wno-undef -Wno-missing-declarations -fvisibility-inlines-hidden -Wall -Wno-unknown-pragmas -fvisibility=internal -mcpu=native -Wno-undef -Wno-error=return-stack-address'} Git hash=b'f8f7ede7a01eb5cd9d06060b4d2f2d1404d93f29'
Platform: Apple M1
Operating System: macOS 12.6
Problem description: NEReduceMean passes 100% of tests for the NCHW layout but only about 20% of tests for the NHWC layout because of accuracy issues. Does NEReduceMean expect the same data layout for NHWC as other operations, i.e. TensorShape(C, W, H, N)? Are there any other differences between the NCHW and NHWC layouts with regard to NEReduceMean?