ARM-software / armnn

Arm NN ML Software. The code here is a read-only mirror of https://review.mlplatform.org/admin/repos/ml/armnn
https://developer.arm.com/products/processors/machine-learning/arm-nn
MIT License

UnitTests fails with segmentation fault on Odroid-XU4 (Mali - T628) #696

Closed vn218 closed 1 year ago

vn218 commented 1 year ago

I built Arm NN for my Odroid XU4 (aarch32) by following the Arm NN build-tool tutorial. Running UnitTests fails with a segmentation fault.

odroid@odroid:~/Desktop/aarch32_build$ export LD_LIBRARY_PATH=.;./UnitTests 
[doctest] doctest version is "2.4.6"
[doctest] run with "--help" for options
===============================================================================
/home/arm-user/source/armnn/src/armnn/test/optimizations/FuseActivationTests.cpp:878:
TEST SUITE: Optimizer
TEST CASE:  FuseReLUIntoConvFloat32GpuAccTest

/home/arm-user/source/armnn/src/armnn/test/optimizations/FuseActivationTests.cpp:878: FATAL ERROR: test case CRASHED: SIGSEGV - Segmentation violation signal

===============================================================================
[doctest] test cases:    403 |    402 passed | 1 failed | 2284 skipped
[doctest] assertions: 772095 | 772095 passed | 0 failed |
[doctest] Status: FAILURE!
Segmentation fault

I am running Ubuntu MATE 22.04.

james-conroy-arm commented 1 year ago

Hi @vn218 ,

Many thanks for raising this issue. Did you build Arm NN using the build-tool with or without Docker? On a remote machine or on the Odroid board itself?

Since there is a segfault, there may be other failing tests that never get to execute. To help us narrow down the issue, could you please provide the output for the following UnitTests flags:

Run FuseReLUIntoConvFloat32GpuAccTest exclusively: export LD_LIBRARY_PATH=.;./UnitTests --test-case=FuseReLUIntoConvFloat32GpuAccTest

Run all tests except FuseReLUIntoConvFloat32GpuAccTest: export LD_LIBRARY_PATH=.;./UnitTests --test-case-exclude=FuseReLUIntoConvFloat32GpuAccTest

Run all test suites except Optimizer tests: export LD_LIBRARY_PATH=.;./UnitTests --test-suite-exclude=Optimizer

Run all tests except those on the GpuAcc backend: export LD_LIBRARY_PATH=.;./UnitTests --test-case-exclude=*GpuAcc*

Thanks, James
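
The four filter runs above can be collected in one pass with a small wrapper. This is only a sketch: the `run_filter` helper, the `UNITTESTS` variable and the log file names are hypothetical, not part of Arm NN. It relies on the fact that a shell reports a process killed by SIGSEGV with exit status 139 (128 + 11).

```shell
#!/bin/sh
# Sketch: run UnitTests once per doctest filter and report how each run ended.
# UNITTESTS and the log file names are assumptions; adjust them to your build directory.
UNITTESTS="${UNITTESTS:-./UnitTests}"

run_filter() {
    # $1 = short label used for the log file, remaining arguments = doctest flags
    label="$1"
    shift
    LD_LIBRARY_PATH=. "$UNITTESTS" "$@" > "unittests_${label}.log" 2>&1
    status=$?
    # A process terminated by SIGSEGV (signal 11) is reported as 128 + 11 = 139.
    if [ "$status" -eq 139 ]; then
        echo "$label: SIGSEGV"
    else
        echo "$label: exit $status"
    fi
}
```

For example, `run_filter no_gpuacc '--test-case-exclude=*GpuAcc*'` writes the full output to `unittests_no_gpuacc.log` and prints a one-line summary. Quoting the pattern keeps the shell from expanding `*` against files in the current directory.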

vn218 commented 1 year ago

I built using the build-tool with Docker on a remote virtual Ubuntu machine. Here are the outputs for those tests:

odroid@odroid:~/Desktop/aarch32_build$ export LD_LIBRARY_PATH=.;./UnitTests --test-case=FuseRELUIntoConvFloat32GpuAccTest
[doctest] doctest version is "2.4.6"
[doctest] run with "--help" for options
===============================================================================
/home/arm-user/source/armnn/src/armnn/test/optimizations/FuseActivationTests.cpp:878:
TEST SUITE: Optimizer
TEST CASE:  FuseReLUIntoConvFloat32GpuAccTest

/home/arm-user/source/armnn/src/armnn/test/optimizations/FuseActivationTests.cpp:878: FATAL ERROR: test case CRASHED: SIGSEGV - Segmentation violation signal

===============================================================================
[doctest] test cases: 1 | 0 passed | 1 failed | 2686 skipped
[doctest] assertions: 4 | 4 passed | 0 failed |
[doctest] Status: FAILURE!
Segmentation fault

odroid@odroid:~/Desktop/aarch32_build$ export LD_LIBRARY_PATH=.;./UnitTests --test-case-exclude=FuseRELUIntoConvFloat32GpuAccTest
[doctest] doctest version is "2.4.6"
[doctest] run with "--help" for options
===============================================================================
/home/arm-user/source/armnn/src/armnn/test/optimizations/FuseActivationTests.cpp:886:
TEST SUITE: Optimizer
TEST CASE:  FuseReLUIntoDWConvFloat32GpuAccTest

/home/arm-user/source/armnn/src/armnn/test/optimizations/FuseActivationTests.cpp:886: FATAL ERROR: test case CRASHED: SIGSEGV - Segmentation violation signal

===============================================================================
[doctest] test cases:    403 |    402 passed | 1 failed | 2284 skipped
[doctest] assertions: 772095 | 772095 passed | 0 failed |
[doctest] Status: FAILURE!
Segmentation fault

odroid@odroid:~/Desktop/aarch32_build$ export LD_LIBRARY_PATH=.;./UnitTests --test-suite-exclude=Optimizer
[doctest] doctest version is "2.4.6"
[doctest] run with "--help" for options
===============================================================================
/home/arm-user/source/armnn/src/armnn/test/optimizations/ReduceMultipleAxesTests.cpp:285:
TEST SUITE: Optimizer_ReduceMultipleAxesGpu
TEST CASE:  ReduceSumWithTwoAxesGpuAccTest

/home/arm-user/source/armnn/src/armnn/test/optimizations/ReduceMultipleAxesTests.cpp:285: FATAL ERROR: test case CRASHED: SIGSEGV - Segmentation violation signal

===============================================================================
[doctest] test cases:    340 |    339 passed | 1 failed | 2347 skipped
[doctest] assertions: 770802 | 770802 passed | 0 failed |
[doctest] Status: FAILURE!
Segmentation fault

odroid@odroid:~/Desktop/aarch32_build$ export LD_LIBRARY_PATH=.;./UnitTests --test-case-exclude=*GpuAcc*
[doctest] doctest version is "2.4.6"
[doctest] run with "--help" for options
===============================================================================
/home/arm-user/source/armnn/src/backends/aclCommon/test/MemCopyTests.cpp:89:
TEST CASE:  CopyBetweenNeonAndGpu

/home/arm-user/source/armnn/src/backends/aclCommon/test/MemCopyTests.cpp:89: FATAL ERROR: test case CRASHED: SIGSEGV - Segmentation violation signal

===============================================================================
[doctest] test cases:    444 |    443 passed | 1 failed | 2243 skipped
[doctest] assertions: 772326 | 772326 passed | 0 failed |
[doctest] Status: FAILURE!
Segmentation fault

james-conroy-arm commented 1 year ago

Thanks for that, @vn218. It looks like many tests are failing on the GPU.

Could you please try this more specific GPU exclusion filter: export LD_LIBRARY_PATH=.;./UnitTests --test-case-exclude=*Gpu*

You could also try adding Mali to your LD_LIBRARY_PATH, but this may not be the exact location on your device (this works on an aarch64 Odroid N2+ board): export LD_LIBRARY_PATH=.:/usr/share/mali; ./UnitTests

James

vn218 commented 1 year ago

The output:

odroid@odroid:~/Desktop/aarch32_build$ export LD_LIBRARY_PATH=.;./UnitTests --test-case-exclude=*GPU*
[doctest] doctest version is "2.4.6"
[doctest] run with "--help" for options
Warning: Timed out waiting on profiling service activation for 3000.12 ms
Warning: Timed out waiting on profiling service activation for 3000.11 ms
Warning: Timed out waiting on profiling service activation for 3000.11 ms
===============================================================================
/home/arm-user/source/armnn/src/backends/cl/test/ClCreateWorkloadTests.cpp:226:
TEST SUITE: CreateWorkloadCl
TEST CASE:  CreateBatchNormalizationFloatNchwWorkload

/home/arm-user/source/armnn/src/backends/cl/test/ClCreateWorkloadTests.cpp:226: FATAL ERROR: test case CRASHED: SIGSEGV - Segmentation violation signal

===============================================================================
[doctest] test cases:    537 |    536 passed | 1 failed | 2150 skipped
[doctest] assertions: 773224 | 773224 passed | 0 failed |
[doctest] Status: FAILURE!
Segmentation fault

Adding Mali to LD_LIBRARY_PATH didn't help.

james-conroy-arm commented 1 year ago

Please try: export LD_LIBRARY_PATH=.;./UnitTests --test-case-exclude=*GPU*,*CL*

What are the contents of /usr/share/mali/ ?

vn218 commented 1 year ago

Adding --test-case-exclude=*Cl* gives the same output as before; however, adding --test-suite-exclude=*Cl* passes:

odroid@odroid:~/Desktop/aarch32_build$ export LD_LIBRARY_PATH=.;./UnitTests --test-case-exclude=*GPU*,*Cl* --test-suite-exclude=*Cl*
[doctest] doctest version is "2.4.6"
[doctest] run with "--help" for options
Warning: Timed out waiting on profiling service activation for 3000.11 ms
Warning: Timed out waiting on profiling service activation for 3000.11 ms
Warning: Timed out waiting on profiling service activation for 3000.11 ms
Warning: Timed out waiting on profiling service activation for 3000.11 ms
Warning: Timed out waiting on profiling service activation for 3000.11 ms
Warning: Timed out waiting on profiling service activation for 3000.11 ms
===============================================================================
[doctest] test cases:   1592 |   1592 passed | 0 failed | 1095 skipped
[doctest] assertions: 780221 | 780221 passed | 0 failed |
[doctest] Status: SUCCESS!
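
The difference between the two flags comes down to which name the pattern is compared against: the crashing test case `CreateBatchNormalizationFloatNchwWorkload` has no "Cl" in its own name, but its suite `CreateWorkloadCl` does. A rough illustration follows, assuming case-insensitive wildcard matching (doctest's default, as far as I know). The `matches` helper is hypothetical and only approximates the filter with shell globbing.

```shell
#!/bin/sh
# Sketch: emulate a case-insensitive wildcard filter with shell globbing.
# This approximates how the filters behave; it is not doctest's own code.
matches() {
    # $1 = test case or suite name, $2 = wildcard pattern
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    pattern=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    case "$name" in
        $pattern) return 0 ;;   # unquoted so that * acts as a wildcard
        *) return 1 ;;
    esac
}

# The test case name does not match *Cl*, so --test-case-exclude keeps it.
matches CreateBatchNormalizationFloatNchwWorkload '*Cl*' && echo "case excluded" || echo "case kept"
# The suite name does match *Cl*, so --test-suite-exclude drops the whole suite.
matches CreateWorkloadCl '*Cl*' && echo "suite excluded" || echo "suite kept"
```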

Contents of /usr/share/mali:

odroid@odroid:~/Desktop/aarch32_build$ ls -R /usr/share/mali/
/usr/share/mali/:
headers  pkgconfig

/usr/share/mali/headers:
CL  CL_1_2  CL_2_0  EGL  GLES  GLES2  GLES3  KHR

/usr/share/mali/headers/CL:
cl.h        cl_egl.h  cl_gl.h      cl_platform.h
cl_d3d10.h  cl_ext.h  cl_gl_ext.h  opencl.h

/usr/share/mali/headers/CL_1_2:
CL

/usr/share/mali/headers/CL_1_2/CL:
cl.h        cl_d3d11.h              cl_egl.h  cl_gl.h      cl_platform.h
cl_d3d10.h  cl_dx9_media_sharing.h  cl_ext.h  cl_gl_ext.h  opencl.h

/usr/share/mali/headers/CL_2_0:
CL

/usr/share/mali/headers/CL_2_0/CL:
cl.h    cl_d3d10.h  cl_dx9_media_sharing.h  cl_ext.h  cl_gl_ext.h    opencl.h
cl.hpp  cl_d3d11.h  cl_egl.h                cl_gl.h   cl_platform.h

/usr/share/mali/headers/EGL:
egl.h  eglext.h  eglplatform.h

/usr/share/mali/headers/GLES:
egl.h  gl.h  glext.h  glplatform.h

/usr/share/mali/headers/GLES2:
gl2.h  gl2ext.h  gl2platform.h

/usr/share/mali/headers/GLES3:
gl3.h  gl31.h  gl32.h  gl3platform.h

/usr/share/mali/headers/KHR:
khrplatform.h

/usr/share/mali/pkgconfig:
egl.pc  glesv1.pc  glesv1_cm.pc  glesv2.pc  glesv3.pc  opencl.pc

vn218 commented 1 year ago

Also, when I try to run a tflite model (yolov5s) with the delegate using tflite-runtime 2.5, it fails with a segfault for all combinations of backends.

vn218 commented 1 year ago

@james-conroy-arm Hi! Is there any update?

james-conroy-arm commented 1 year ago

Hi @vn218 ,

The Mali driver on your device may not be set up correctly; it's difficult for me to say.

Albeit on a different Odroid board (N2+) and aarch64, these are the contents of /usr/share/mali:

ls -l /usr/share/mali
total 36804
lrwxrwxrwx 1 root root       11 Jul  7  2020 libEGL.so -> libEGL.so.1
lrwxrwxrwx 1 root root       13 Jul  7  2020 libEGL.so.1 -> libEGL.so.1.4
lrwxrwxrwx 1 root root       10 Jul  7  2020 libEGL.so.1.4 -> libMali.so
lrwxrwxrwx 1 root root       17 Jul  7  2020 libGLESv1_CM.so -> libGLESv1_CM.so.1
lrwxrwxrwx 1 root root       19 Jul  7  2020 libGLESv1_CM.so.1 -> libGLESv1_CM.so.1.1
lrwxrwxrwx 1 root root       10 Jul  7  2020 libGLESv1_CM.so.1.1 -> libMali.so
lrwxrwxrwx 1 root root       14 Jul  7  2020 libGLESv2.so -> libGLESv2.so.2
lrwxrwxrwx 1 root root       16 Jul  7  2020 libGLESv2.so.2 -> libGLESv2.so.2.0
lrwxrwxrwx 1 root root       10 Jul  7  2020 libGLESv2.so.2.0 -> libMali.so
-rw-r--r-- 1 root root 37684056 Jul  7  2020 libMali.so
lrwxrwxrwx 1 root root       14 Jul  7  2020 libOpenCL.so -> libOpenCL.so.1
lrwxrwxrwx 1 root root       10 Jul  7  2020 libOpenCL.so.1 -> libmali.so
lrwxrwxrwx 1 root root       10 Jul  7  2020 libmali.so -> libMali.so

Could you check the contents of /usr/lib/aarch64-linux-gnu/ to see if a libMali.so can be found?

If so, before running unit tests you could also try running: ln -s /usr/lib/aarch64-linux-gnu/libMali.so /lib/aarch64-linux-gnu/libOpenCL.so
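
Since the driver location differs between images (on a 32-bit image the multiarch directory would typically be arm-linux-gnueabihf rather than aarch64-linux-gnu), it may be easier to search for the library first. A minimal sketch, where the `find_cl_libs` helper and the directories passed to it are assumptions:

```shell
#!/bin/sh
# Sketch: list candidate OpenCL/Mali driver libraries under the given directories.
# Which directories to search is an assumption; driver packages differ between boards.
find_cl_libs() {
    for dir in "$@"; do
        # Skip directories that do not exist on this image.
        [ -d "$dir" ] || continue
        # -name matches regular files and symlinks alike, so links such as
        # libOpenCL.so -> libMali.so are listed too.
        find "$dir" \( -name 'libOpenCL.so*' -o -name 'libMali.so*' -o -name 'libmali.so*' \) 2>/dev/null
    done
}
```

A call such as `find_cl_libs /usr/lib /lib /usr/share/mali` prints every match. If only libMali.so turns up, the symlink above can be pointed at whatever path is reported.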

Would you mind raising a separate issue for the yolov5s segfault, with steps to reproduce, please? We will look into it for you.

Thanks, James

vn218 commented 1 year ago

@james-conroy-arm Thanks! Linking libMali.so and libOpenCL.so worked (libMali.so was at a different location). But now I am getting a new set of errors: UnitTests.log

However, all the DelegateUnitTests are passing, so I guess that is enough for my use case.

yolov5s is still not working; I've raised a separate issue for it.

FrancisMurtagh-arm commented 1 year ago

Closing as original issue resolved, and new issue raised.