ARM-software / armnn

Arm NN ML Software. The code here is a read-only mirror of https://review.mlplatform.org/admin/repos/ml/armnn
https://developer.arm.com/products/processors/machine-learning/arm-nn
MIT License

GpuAcc: only the last created TFLite interpreter/delegate works fine #703

Closed. kalesony closed this issue 1 year ago.

kalesony commented 1 year ago

We're trying to integrate ARM NN into our Android C++ project. We want to start with the TFLite delegate path.

The key issue is that if you create multiple TFLite interpreters with the ARM NN GpuAcc delegate, store them first, and try to execute them later, only the last created interpreter works correctly. We were able to reproduce this even when creating two interpreters from the same model file. CpuAcc delegates work fine.

Here is a diff with modifications that need to be applied to armnn/tests/ExecuteNetwork/ExecuteNetwork.cpp (base armnn revision: d625f5e). This tiny change makes ExecuteNetwork create and run two Executors instead of one.
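For illustration only (this is not the actual ExecuteNetwork diff above), here is a minimal standalone sketch of the failing scenario written against the public Arm NN delegate and TFLite C++ APIs. The model filename, the hard-coded GpuAcc backend, and the omission of input-tensor setup are assumptions made for brevity:

```cpp
// Illustrative sketch of the reported scenario (not the actual ExecuteNetwork diff):
// build two TFLite interpreters from the same model, give each its own Arm NN
// GpuAcc delegate (and therefore its own armnn::IRuntime), keep both alive, and
// only invoke them afterwards. With the bug present, only the last created
// interpreter invokes successfully; the earlier one fails with
// "Node number <n> (TfLiteArmNnDelegate) failed to invoke."
#include <armnn_delegate.hpp>

#include <tensorflow/lite/interpreter.h>
#include <tensorflow/lite/kernels/register.h>
#include <tensorflow/lite/model.h>

#include <iostream>
#include <memory>
#include <vector>

using DelegatePtr =
    std::unique_ptr<TfLiteDelegate, decltype(&armnnDelegate::TfLiteArmnnDelegateDelete)>;

int main()
{
    // Assumed model path; any model that runs on GpuAcc should reproduce the issue.
    auto model = tflite::FlatBufferModel::BuildFromFile("ssdlite_object_detection.tflite");
    tflite::ops::builtin::BuiltinOpResolver resolver;

    std::vector<DelegatePtr> delegates;
    std::vector<std::unique_ptr<tflite::Interpreter>> interpreters;

    // Step 1: create and store both interpreters up front.
    for (int i = 0; i < 2; ++i)
    {
        std::unique_ptr<tflite::Interpreter> interpreter;
        tflite::InterpreterBuilder(*model, resolver)(&interpreter);

        // One armnn_delegate per interpreter, each backed by its own IRuntime.
        std::vector<armnn::BackendId> backends = { armnn::Compute::GpuAcc };
        armnnDelegate::DelegateOptions delegateOptions(backends);
        delegates.emplace_back(armnnDelegate::TfLiteArmnnDelegateCreate(delegateOptions),
                               armnnDelegate::TfLiteArmnnDelegateDelete);

        interpreter->ModifyGraphWithDelegate(delegates.back().get());
        interpreter->AllocateTensors();
        interpreters.push_back(std::move(interpreter));
    }

    // Step 2: execute them later (input-tensor setup omitted for brevity).
    for (size_t i = 0; i < interpreters.size(); ++i)
    {
        std::cout << "interpreter " << i << " Invoke: "
                  << (interpreters[i]->Invoke() == kTfLiteOk ? "ok" : "failed") << std::endl;
    }
    return 0;
}
```

With the bug present, the second (last created) interpreter invokes successfully while the first fails, matching the behaviour of the modified ExecuteNetwork described below.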

We were able to achieve the same result using an SSD detector from Mediapipe: https://storage.googleapis.com/mediapipe-assets/ssdlite_object_detection.tflite.

The built, modified ExecuteNetwork executable is deployed along with its standard dependencies and the TFLite model file, and run on an Android device using ./ExecuteNetwork -m ssdlite_object_detection.tflite -c GpuAcc -T delegate. For the first Executor it yields ERROR: Node number <number> (TfLiteArmNnDelegate) failed to invoke. and a model invocation error.

MikeJKelly commented 1 year ago

Hi @kalesony

Thank you for reporting this issue. I'm trying to recreate it now and hope to have some information soon.

Best regards, Mike

MikeJKelly commented 1 year ago

Hi @kalesony

I was able to recreate the issue. The root of the problem is that the two Executors use two different armnn_delegates, which in turn use two different IRuntimes, and somehow that leads to something in the OpenCL context being cleared for the earlier Executors, so their CL workloads return the following (the model's execution does not bomb out when using the delegate, but it does abort with an error when using our TFLite parser):

Error: An error occurred attempting to execute a workload: CL error: clEnqueueNDRangeKernel. Error code: -34 at function Execute [/home/mickel01/devr/devenv/armnn/src/backends/cl/workloads/ClConvolution2dWorkload.cpp:158 

Error code -34 is CL_INVALID_CONTEXT. I'm currently looking at the lifecycle of the IRuntimes being used and trying to figure out the best way to prevent this from happening.
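For reference, the mapping of the raw error code to its symbolic name can be checked directly against the OpenCL headers. This tiny snippet (assuming a system CL/cl.h is available) just confirms that -34 is CL_INVALID_CONTEXT, i.e. the cl_context the kernel is being enqueued on is no longer valid:

```cpp
// Sanity check that the raw CL error -34 reported by ClConvolution2dWorkload
// corresponds to CL_INVALID_CONTEXT (the context used for clEnqueueNDRangeKernel
// has been released or is otherwise invalid).
#include <CL/cl.h>
#include <cstdio>

int main()
{
    static_assert(CL_INVALID_CONTEXT == -34, "OpenCL defines CL_INVALID_CONTEXT as -34");
    std::printf("CL error %d == CL_INVALID_CONTEXT\n", CL_INVALID_CONTEXT);
    return 0;
}
```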

Best regards, Mike

MikeJKelly commented 1 year ago

Hi @kalesony

I've created a patch that should resolve this issue for you: https://review.mlplatform.org/c/ml/armnn/+/8673

It has been merged to main and applies cleanly to the revision you're using. Can you try it out and let us know if it works? Thanks!

Best regards, Mike

kalesony commented 1 year ago

Hey @MikeJKelly

Yes, it works now, thanks a lot :).