ARM-software / armnn

Arm NN ML Software. The code here is a read-only mirror of https://review.mlplatform.org/admin/repos/ml/armnn
https://developer.arm.com/products/processors/machine-learning/arm-nn
MIT License
1.17k stars 309 forks

ArmNN TfLite delegate segfaults on Android #717

Closed brtal closed 1 year ago

brtal commented 1 year ago

Using ArmNN 22.11, I'm seeing segfaults on a bunch of networks with both the nightly tflite benchmark tool and our Android app using TfLite 2.11.

I'm using the pre-built Android 29 package from the main GitHub page. (Note that the other Android packages seem to contain the ArmNN-based NNAPI service, rather than the delegate like the Android 29 package.)

I experience the same problems with older ArmNN releases.

I've tried this on multiple Android 12 and 13 phones.

A few notable points:

CpuRef:

$ LD_LIBRARY_PATH=$(pwd) ./android_aarch64_benchmark_model --graph=model_float32.tflite --external_delegate_path=./libarmnnDelegate.so --external_delegate_options="backends:CpuRef"
STARTING!
Log parameter values verbosely: [0]
Graph: [model_float32.tflite]
External delegate path: [./libarmnnDelegate.so]
External delegate options: [backends:CpuRef]
Loaded model model_float32.tflite
INFO: Initialized TensorFlow Lite runtime.
INFO: TfLiteArmnnDelegate: Created TfLite ArmNN delegate.
EXTERNAL delegate created.
VERBOSE: Replacing 5 node(s) with delegate (TfLiteArmNnDelegate) node, yielding 1 partitions for the whole graph.
Explicitly applied EXTERNAL delegate, and the model graph will be completely executed by the delegate.
The input model file size (MB): 10.5242
Initialized session in 276.742ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=4 first=154499 curr=138922 min=138516 max=154499 avg=142629 std=6854

Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=50 first=138316 curr=138343 min=138104 max=141755 avg=138811 std=679

Inference timings in us: Init: 276742, First inference: 154499, Warmup (avg): 142629, Inference (avg): 138811
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=79.2812 overall=79.2812
Aborted 

Note that this runs, but aborts at the end.

CpuAcc:

$ LD_LIBRARY_PATH=$(pwd) ./android_aarch64_benchmark_model --graph=model_float32.tflite --external_delegate_path=./libarmnnDelegate.so --external_delegate_options="backends:CpuAcc"
STARTING!
Log parameter values verbosely: [0]
Graph: [model_float32.tflite]
External delegate path: [./libarmnnDelegate.so]
External delegate options: [backends:CpuAcc]
Loaded model model_float32.tflite
INFO: Initialized TensorFlow Lite runtime.
INFO: TfLiteArmnnDelegate: Created TfLite ArmNN delegate.
EXTERNAL delegate created.
terminating
Aborted 

This one crashes in a different way: it aborts immediately after the delegate is created.

ExecuteNetwork with CpuAcc:

$ LD_LIBRARY_PATH=$(pwd) ./ExecuteNetwork -m model_float32.tflite -c CpuAcc                                                              
Warning: No input files provided, input tensors will be filled with 0s.
Info: ArmNN v31.0.0
Info: Initialization time: 26.27 ms.
Info: Shutdown time: 0.29 ms.
Fatal: Invalid dimension index: 2 (number of dimensions is 2) at function CheckDimensionIndex [/devenv/armnn/src/armnn/Tensor.cpp:279]
Aborted 

I'm not sure whether ExecuteNetwork is expected to abort at the end; I see this even when I run ExecuteNetwork --help. Nevertheless, it is at least able to catch the exception and print the error.

And finally, when run from our Android app in native code:

abort 0x0000007da130fba8
<unknown> 0x0000007a2d83b308
<unknown> 0x0000007a2d83b534
std::__terminate(void (*)()) 0x0000007a3ac650fc
__cxxabiv1::call_terminate(bool, _Unwind_Exception *) 0x0000007a3ac79570
__cxxabiv1::scan_eh_tab(__cxxabiv1::scan_results &, _Unwind_Action, bool, _Unwind_Exception *, _Unwind_Context *) 0x0000007a3ac79514
__gxx_personality_v0(int, _Unwind_Action, uint64_t, _Unwind_Exception *, _Unwind_Context *) 0x0000007a3ac78e00
<unknown> 0x0000007a2d853194
__cxa_throw 0x0000007a2d83a8b0
armnnDelegate::VisitUnidirectionalSequenceLstmOperator(armnnDelegate::DelegateData &, TfLiteContext *, TfLiteNode *, int, int) 0x0000007a458ca958
armnnDelegate::Delegate::IdentifyOperatorsToDelegate(TfLiteContext *) 0x0000007a458cbdf4
armnnDelegate::DoPrepare(TfLiteContext *, TfLiteDelegate *) 0x0000007a458cbb78
tflite::Subgraph::ModifyGraphWithDelegate(TfLiteDelegate *) 0x0000007a3a64eb08
tflite::Interpreter::ModifyGraphWithDelegateImpl(TfLiteDelegate *) 0x0000007a3a665cd8
...

The process crashes inside VisitUnidirectionalSequenceLstmOperator, presumably calling into CheckDimensionIndex somehow.

model_float32.tflite.zip

I assume that the segfault seen in benchmark_tool and our app is the same root cause. It's not clear if the error that surfaces in ExecuteNetwork is the same, since you're using your own parser/runtime, and not tflite.

Something curious is going on in the upper parts of the stack dump above, which is causing terminate to be called while throwing an exception and unwinding the stack. Is this an ABI issue between tflite 2.11 and whatever you built against?

I started to look at building this all from source so I could debug this and build against tflite 2.11, but I haven't invested the time to make your build tool work with the Android toolchain. Presumably you have this internally. It would be wonderful if you could share it publicly!

jlamperez commented 1 year ago

I have compiled armnn with tensorflow 2.11 on the main branch, and this is what I get on an embedded platform when I use your model:

LD_LIBRARY_PATH=/home/root/aarch64_build ./linux_aarch64_benchmark_model \
> --graph=/home/root/model_float32.tflite \
> --external_delegate_path="/home/root/aarch64_build/delegate/libarmnnDelegate.so" \
> --external_delegate_options="backends:CpuAcc;logging-severity:info"
STARTING!
Log parameter values verbosely: [0]
Graph: [/home/root/model_float32.tflite]
External delegate path: [/home/root/aarch64_build/delegate/libarmnnDelegate.so]
External delegate options: [backends:CpuAcc;logging-severity:info]
Loaded model /home/root/model_float32.tflite
Info: ArmNN v32.0.0
Info: Initialization time: 0.19 ms.
INFO: TfLiteArmnnDelegate: Created TfLite ArmNN delegate.
EXTERNAL delegate created.
Error: ArmNN Failed to visit node with error: Invalid dimension index: 2 (number of dimensions is 2) at function CheckDimensionIndex [/home/arm-user/source/armnn/src/armnn/Tensor.cpp:279]
Failed to check layer support at function operator() [/home/arm-user/source/armnn/delegate/src/UnidirectionalSequenceLstm.hpp:272]
Error: ArmNN Failed to visit node with error: Invalid dimension index: 2 (number of dimensions is 2) at function CheckDimensionIndex [/home/arm-user/source/armnn/src/armnn/Tensor.cpp:279]
Failed to check layer support at function operator() [/home/arm-user/source/armnn/delegate/src/UnidirectionalSequenceLstm.hpp:272]
Error: ArmNN Failed to visit node with error: Invalid dimension index: 2 (number of dimensions is 2) at function CheckDimensionIndex [/home/arm-user/source/armnn/src/armnn/Tensor.cpp:279]
Failed to check layer support at function operator() [/home/arm-user/source/armnn/delegate/src/UnidirectionalSequenceLstm.hpp:272]
Error: ArmNN Failed to visit node with error: Invalid dimension index: 2 (number of dimensions is 2) at function CheckDimensionIndex [/home/arm-user/source/armnn/src/armnn/Tensor.cpp:279]
Failed to check layer support at function operator() [/home/arm-user/source/armnn/delegate/src/UnidirectionalSequenceLstm.hpp:272]
Error: ArmNN Failed to visit node with error: Invalid dimension index: 2 (number of dimensions is 2) at function CheckDimensionIndex [/home/arm-user/source/armnn/src/armnn/Tensor.cpp:279]
Failed to check layer support at function operator() [/home/arm-user/source/armnn/delegate/src/UnidirectionalSequenceLstm.hpp:272]
ERROR: Operator UNIDIRECTIONAL_SEQUENCE_LSTM [44] is not supported by armnn_delegate.
Info: No operators in this model are supported by the Arm NN TfLite delegate. The model will be executed entirely by TfLite runtime.
Though EXTERNAL delegate is explicitly applied, the model graph will not be executed by the delegate.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
The input model file size (MB): 10.5242
Initialized session in 27.295ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=6 first=93080 curr=90963 min=90917 max=93080 avg=91309.3 std=792

Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=50 first=91069 curr=90922 min=90804 max=92418 avg=90978.2 std=218

Inference timings in us: Init: 27295, First inference: 93080, Warmup (avg): 91309.3, Inference (avg): 90978.2
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=14.8008 overall=20.7031
Info: Shutdown time: 0.02 ms.

In case it helps: it seems that operator UNIDIRECTIONAL_SEQUENCE_LSTM [44] is not supported by armnn_delegate.

Regards!

matthewsloyanARM commented 1 year ago

Hi @brtal,

Thank you for your detailed description. So it looks like the Arm NN TfLite delegate is correctly falling back to the XNNPACK delegate as it returned unsupported in your CpuRef example. I am interested to see why it is terminating at the end though in your other examples, even when just running ExecuteNetwork --help. I will try the pre-built packages like you have to see if I can reproduce it. I will also look into the UNIDIRECTIONAL_SEQUENCE_LSTM issue to see if support can be added for your model and get back to you.

We are not sure whether there are any issues with TensorFlow 2.11; there could be, but since it works for @jlamperez, that might not be the issue. We currently test against 2.10.

Regarding using the build-tool with Android: it is on our backlog, but there is no estimated completion date. However, we would much appreciate any efforts you have made towards it, or plan to make. See our contributing guide for more information.

@jlamperez thank you for testing this out; this also shows that the Arm NN TfLite delegate correctly falls back to the XNNPACK delegate when the operator is reported unsupported due to the invalid dimension index. As mentioned above, I will look into this to be sure the check is correct and see if support can be added.

Kind regards,

Matthew

jlamperez commented 1 year ago

Hi @matthewsloyanARM ,

If tflite is compiled like this:

  cmake -DTFLITE_ENABLE_XNNPACK=OFF \
        "$target_arch_cmd" \
        "$TFLITE_SRC"
  cmake --build . -j "$NUM_THREADS"

Why, when the Arm NN TfLite delegate fails, does it fall back to the XNNPACK delegate if TfLite was built with -DTFLITE_ENABLE_XNNPACK=OFF? I am not understanding this part.

matthewsloyanARM commented 1 year ago

Hi @jlamperez,

I have replied to your message on #716, just to keep this issue clean. Hopefully that is okay.

Kind regards,

Matthew

brtal commented 1 year ago

Thanks for your response @matthewsloyanARM. I spent a bunch of time this week digging into this more.

The immediate cause of the problem was libc++ issues. While this area is not new to me, the Android toolchain is, and I was not aware that it links libc++ statically by default. The crashes I was seeing are clearly a result of multiple binaries statically linking libc++.

The ArmNN-android-27-arm64-v8a package I downloaded from your repo has libarmnn.so and libarmnnDelegate.so that each link it statically. In the network I attached, I'm pretty certain the exception that is thrown is passed across the library boundary.

% readelf -d libarmnn.so 

Dynamic section at offset 0x1b2e540 contains 29 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [liblog.so]
 0x0000000000000001 (NEEDED)             Shared library: [libz.so]
 0x0000000000000001 (NEEDED)             Shared library: [libdl.so]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so]
 0x000000000000000e (SONAME)             Library soname: [libarmnn.so]
[cut]

% readelf -d libarmnnDelegate.so 

Dynamic section at offset 0x5d0e8 contains 30 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [liblog.so]
 0x0000000000000001 (NEEDED)             Shared library: [libz.so]
 0x0000000000000001 (NEEDED)             Shared library: [libarmnn.so]
 0x0000000000000001 (NEEDED)             Shared library: [libdl.so]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so]
 0x000000000000000e (SONAME)             Library soname: [libarmnnDelegate.so]
[cut]

I was able to get past the segfaults I noted above once I got this all building and changed how libc++ was linked.

For our purposes, we will probably continue to use local builds to keep ourselves unblocked. But others might want these libraries built against a shared libc++ if they are going to integrate them.
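For anyone hitting the same crashes with a local NDK build, the fix amounts to selecting the shared STL when configuring. A sketch follows, in the same style as the cmake invocation shown later in this thread; the paths and API level are placeholders, not the exact commands used here:

```shell
# Hypothetical NDK cmake configuration: ANDROID_STL=c++_shared makes every
# library in the process share one libc++_shared.so, so exceptions and RTTI
# can safely cross the libarmnn.so / libarmnnDelegate.so boundary.
cmake -DCMAKE_TOOLCHAIN_FILE="$ANDROID_NDK/build/cmake/android.toolchain.cmake" \
      -DANDROID_ABI=arm64-v8a \
      -DANDROID_PLATFORM=android-29 \
      -DANDROID_STL=c++_shared \
      ..
cmake --build . -j "$NUM_THREADS"
# Note: libc++_shared.so must then be packaged alongside libarmnn.so and
# libarmnnDelegate.so in the APK or on LD_LIBRARY_PATH.
```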

I will admit I needed to hack both the scons build (to make the compute framework cross-compile on macOS and disable the shared libraries we don't need) and the ArmNN cmake (to make it link against our local tflite build). Perhaps a separate conversation around integrating those changes back into your world is possible if you folks are interested. I do have a minor bugfix to the delegate that I'd like to send back your way though.

Regarding the LSTM index issue: do you have a sense of whether this is a bug or the network configuration is not supported?

Thanks a lot!

matthewsloyanARM commented 1 year ago

Hi @brtal,

Thank you for investigating this; it's much appreciated, and I wasn't aware of that either. We would really appreciate any contribution you can make to help with this. If you need any help submitting a patch, let me know.

Also, you mentioned you made changes to the Arm Compute Library; if you like, feel free to create an issue there too, and someone will help you implement those specific changes.

Yes, I have been able to reproduce the LSTM issue and am looking into a fix at the moment. I will get back to you when I have an update.

Kind regards,

Matthew

John-ARM commented 1 year ago

Hello, the LSTM issue should be fixed in our latest release. Please feel free to reopen this ticket or create a new one if you still have any questions. Thank you, John