ARM-software / armnn

Arm NN ML Software. The code here is a read-only mirror of https://review.mlplatform.org/admin/repos/ml/armnn
https://developer.arm.com/products/processors/machine-learning/arm-nn
MIT License

Segment fault issue (double free) when exit program using armnn delegate with opencl option (-DARMCOMPUTECL) #612

Closed jylee256 closed 2 years ago

jylee256 commented 2 years ago

Hi, I found this fault after building armnn and ACL with the ARMCOMPUTECL option and OpenCL turned on. (It did not happen until I turned on the OpenCL options.) When I run the unit tests, they run to the end, and then the program exits with the error "double free or corruption (!prev)".

/usr/bin/lib32-armnn-21.05 # ./UnitTests
Running 4300 test cases...
....
*** 33 failures are detected in the test module "UnitTests"
double free or corruption (!prev)
Aborted
[2]+ Segmentation fault

The same thing happens when I run the TensorFlow Lite benchmark tool (https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/benchmark) with the CpuAcc or GpuAcc backend, for example:

$ benchmark_model --graph=mobilenet_v2_1.0_224_quantized_1_default_1.tflite --external_delegate_path="/usr/lib/libarmnnDelegate.so.24" --external_delegate_options="backends:CpuAcc"
...
External delegate path: [/usr/lib/libarmnnDelegate.so.24]
External delegate options: [backends:CpuAcc]
Loaded model mobilenet_v2_1.0_224_quantized_1_default_1.tflite
...
Inference timings in us: Init: 68920252, First inference: 554055, Warmup (avg): 554055, Inference (avg): 422225
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Peak memory footprint (MB): init=43.0742 overall=50.2422
~ExternalDelegateWrapper
~ExternalDelegateWrapper end
[Thread 0xf4d8b420 (LWP 9855) exited]
[Thread 0xf558c420 (LWP 9854) exited]
...
Thread 1 "benchmark_model" received signal SIGSEGV, Segmentation fault.
__GI___libc_free (mem=0x1253ed0) at malloc.c:3104
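The benchmark output above shows the inference timings being printed before the signal arrives. A minimal Python sketch of how a wrapper script could tell "inference finished, then the process died on exit" apart from an ordinary failure is given below. The helper names `describe_exit` and `run_and_classify` are hypothetical and not part of armnn or the benchmark tool; the only assumptions are a POSIX system (where `subprocess` reports death by signal as a negative return code) and that the tool prints the text "Inference timings", as in the log above.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Translate a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, a negative return code is the negated signal number.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal %d" % -returncode
        return "killed by " + name
    return "exited with status %d" % returncode


def run_and_classify(cmd):
    """Run cmd; return (inference_finished, exit_description).

    inference_finished is True if the text "Inference timings" appeared
    in the output, which is the marker seen in the benchmark log.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    finished = "Inference timings" in (proc.stdout + proc.stderr)
    return finished, describe_exit(proc.returncode)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit("usage: classify_exit.py COMMAND [ARGS...]")
    done, status = run_and_classify(sys.argv[1:])
    print("inference finished: %s; process %s" % (done, status))
```

Saved under a hypothetical name such as `classify_exit.py`, it could be run as `python3 classify_exit.py benchmark_model --graph=... --external_delegate_path=... --external_delegate_options="backends:CpuAcc"`. A result of "inference finished: True" together with "killed by SIGSEGV" or "killed by SIGABRT" would match the behaviour reported here.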

When I debug it with gdb, the failure appears to occur at program exit, after all inference has completed successfully, and it is the same "double free or corruption" error as in the unit test run.

(gdb) bt
#0  __libc_do_syscall () at libc-do-syscall.S:48
#1  0xf7a4212c in __GI___libc_read (nbytes=4, buf=0xf75ebb70, fd=3) at ../sysdeps/unix/sysv/linux/read.c:26
#2  __GI___libc_read (fd=3, buf=0xf75ebb70, nbytes=4) at ../sysdeps/unix/sysv/linux/read.c:24
#3  0xf7fb2eba in ?? () from /lib/libSegFault.so
#4  <signal handler called>
#5  __GI___libc_free (mem=0x1253ed0) at malloc.c:3104
#6  0xf69e1196 in std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::_M_erase(std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*) () from /usr/lib/libarmnn.so.25
#7  0xf69e11be in std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::~map() () from /usr/lib/libarmnn.so.25
#8  0xf79dcc08 in __run_exit_handlers (status=0, listp=0xf7aa732c <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
#9  0xf79dccca in __GI_exit (status=<optimized out>) at exit.c:139
#10 0xf79ccbe8 in __libc_start_main (main=0x412028 <main>, argc=4, argv=0xfffef504, init=<optimized out>, fini=0x452401 <__libc_csu_fini>, rtld_fini=0xf7fd0ec5 <_dl_fini>, stack_end=0xfffef504) at libc-start.c:342
#11 0x00412358 in _start ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
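In the backtrace, the `~map()` destructor of a `std::map<std::string, std::string>` inside libarmnn.so.25 is called from glibc's `__run_exit_handlers`, which suggests the faulting `free` happens while an object is being torn down at process exit rather than during inference. A small Python sketch for picking these two facts out of a saved backtrace follows. It assumes gdb's native format, in which frame lines start with `#N`; the function names are hypothetical, and the sample text is an abbreviated copy of frames from this issue (the `<...>` and `...` are abbreviations made here, not gdb output).

```python
import re

# Matches gdb frame lines of the form "#N ... from /path/to/library.so".
FRAME_WITH_LIB = re.compile(r"^#(\d+)\s.*\sfrom (\S+)$")


def crashed_during_exit(backtrace):
    """Return True if glibc's exit-handler runner appears in any frame."""
    return any("__run_exit_handlers" in line
               for line in backtrace.splitlines())


def frames_by_library(backtrace):
    """Return (frame_number, library_path) for frames that name a library."""
    result = []
    for line in backtrace.splitlines():
        match = FRAME_WITH_LIB.match(line.strip())
        if match:
            result.append((int(match.group(1)), match.group(2)))
    return result


# Abbreviated frames taken from the backtrace in this issue.
SAMPLE = """\
#3  0xf7fb2eba in ?? () from /lib/libSegFault.so
#5  __GI___libc_free (mem=0x1253ed0) at malloc.c:3104
#7  0xf69e11be in std::map<...>::~map() () from /usr/lib/libarmnn.so.25
#8  0xf79dcc08 in __run_exit_handlers (status=0, ...) at exit.c:108
"""

if __name__ == "__main__":
    print("crash during exit handlers:", crashed_during_exit(SAMPLE))
    for number, library in frames_by_library(SAMPLE):
        print("frame #%d comes from %s" % (number, library))
```

This only classifies the trace; it does not identify which object in libarmnn owns the map or why it is freed twice.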

I am using armnn v21.05, and the build flags I added are:

-DARMCOMPUTECL=1 -DFLATC=${STAGING_BINDIR_NATIVE}/flatc -DOPENCL_INCLUDE=${STAGING_INCDIR}

Inference itself has no problem and completes successfully; the problem occurs only when the program exits. I also found a similar issue here, so I don't think it is a mistake on my side: https://github.com/ARM-software/armnn/issues/55

Could you help with this issue, please?

morgolock commented 2 years ago

Hi @jylee256

Could you please share the details of the toolchain used to build ArmNN & ACL?

Have you experienced the same issue on different devices?

Have you experienced the same problem with v21.11?

jylee256 commented 2 years ago

@morgolock We use the toolchain 'arm-starfishmllib32-linux-gnueabi-gcc (GCC) 9.3.0', which is probably based on arm-linux-gnueabi-gcc. I couldn't test on other devices, but the problem did not occur on the same device when I turned off the ARMCOMPUTECL option. It also occurred with ARMCOMPUTECL turned on even when I modified ClRegistryInitialize.cpp (src/backends/cl/ClRegistryInitialize.cpp) so that the CL backend was not registered in the backend registry.

jylee256 commented 2 years ago

Hi @morgolock After upgrading to v21.11, the problem no longer occurs, so I'll close this issue now. Thanks.

Zibri commented 1 year ago

@jylee256 where can I find "arm-starfishmllib32-linux-gnueabi-gcc" ?