ARM-software / armnn

Arm NN ML Software. The code here is a read-only mirror of https://review.mlplatform.org/admin/repos/ml/armnn
https://developer.arm.com/products/processors/machine-learning/arm-nn
MIT License

PyArmNN: Error while EnqueueWorkload with tflite #585

Closed WesleyCh3n closed 2 years ago

WesleyCh3n commented 2 years ago

Hello,

I'm testing an object detection tflite model with PyArmNN, but runtime.EnqueueWorkload throws the error below:

Traceback (most recent call last):
  File "/home/ubuntu/tflite/armnn_tflite.py", line 48, in <module>
    runtime.EnqueueWorkload(0, input_tensors, output_tensors)
  File "/home/ubuntu/venv/lib/python3.9/site-packages/pyarmnn/_generated/pyarmnn.py", line 4122, in EnqueueWorkload
    return _pyarmnn.IRuntime_EnqueueWorkload(self, networkId, inputTensors, outputTensors)
RuntimeError: MemCopyQueueDescriptor: input & output must have the same number of elements.

I am wondering what "the same number of elements" means here?


The following is the code I used:

parser = ann.ITfLiteParser()
network = parser.CreateNetworkFromBinaryFile('detect.tflite')

Get the input binding information by using the name of the input layer

graph_id = 0
input_names = parser.GetSubgraphInputTensorNames(graph_id)
input_binding_info = parser.GetNetworkInputBindingInfo(graph_id, input_names[0])

options = ann.CreationOptions()
runtime = ann.IRuntime(options)
preferredBackends = [ann.BackendId('CpuAcc'), ann.BackendId('CpuRef')]
opt_network, messages = ann.Optimize(network, preferredBackends, runtime.GetDeviceSpec(), ann.OptimizerOptions())
netid, = runtime.LoadNetwork(opt_network)

Get output binding information for an output layer by using the layer name.

output_names = parser.GetSubgraphOutputTensorNames(graph_id)
output_list = []
output_list.append(parser.GetNetworkOutputBindingInfo(graph_id, output_names[0]))
output_list.append(parser.GetNetworkOutputBindingInfo(graph_id, output_names[1]))
output_list.append(parser.GetNetworkOutputBindingInfo(graph_id, output_names[2]))
output_list.append(parser.GetNetworkOutputBindingInfo(graph_id, output_names[3]))
output_tensors = ann.make_output_tensors(output_list)

Load an image and create an inputTensor for inference.

image = cv2.imread('./test.jpg')
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image_resized = cv2.resize(image_rgb, (300, 300))
input_data = np.expand_dims(np.asarray(image_resized, dtype=np.uint8), axis=0)

input_tensors = ann.make_input_tensors([input_binding_info], [input_data])
print("tensor: ", input_tensors)
print("tensor: ", output_tensors)

runtime.EnqueueWorkload(0, input_tensors, output_tensors)

results = ann.workload_tensors_to_ndarray(output_tensors)
print(results)
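
In case it helps with debugging, here is a rough sanity check of the element counts that could run right before EnqueueWorkload (just a sketch; it assumes pyarmnn's TensorInfo exposes GetNumElements() like the C++ API does):

# Compare what Arm NN expects for the input binding with what is actually fed in.
expected_elements = input_binding_info[1].GetNumElements()  # TensorInfo of the input layer
actual_elements = input_data.size                           # numpy element count
print("input shape:", input_data.shape, "expected:", expected_elements, "actual:", actual_elements)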


Thanks!

Best Regards,
Wesley
Colm-in-Arm commented 2 years ago

Hi Wesley,

This specific error is coming from ValidateTensorNumElementsMatch here. In your specific case a MemCopy operator has been inserted and the input and output tensor sizes don't match for some reason.

Am I right in saying you're running this using the CpuAcc backend? It's possible this is a symptom of an underlying incompatibility in the model. Can you attempt to run it using the CpuRef backend, please, just to ensure all the operators are supported?
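
For example, reusing the variables from your script above, restricting the preferred backends to the reference backend would look roughly like this:

# Reference backend only, to rule out CpuAcc-specific problems.
preferredBackends = [ann.BackendId('CpuRef')]
opt_network, messages = ann.Optimize(network, preferredBackends,
                                     runtime.GetDeviceSpec(), ann.OptimizerOptions())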

Colm.

WesleyCh3n commented 2 years ago

Hello @Colm-in-Arm

Am I right in saying you're running this using the CpuAcc backend? It's possible this is a symptom of an underlying incompatibility in the model. Can you attempt to run it using the CpuRef backend please just to ensure all the operators are supported.

Thanks for your reply. I tested the CpuRef backend, but with no luck. The same error happened...

Then I tested ssd_mobilenet_v1 with the same code, using the link in python/pyarmnn/examples/common/tests/conftest.py, and it worked perfectly. So I compared the two models (mine & ssd_mobilenet_v1) and found that, besides the main structures being different, there are two additional layers called Quantize and Dequantize in my model after the input and before the outputs. Is it possible these cause the issue?
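
For reference, this is roughly how the tensor types can be checked (a sketch using the TensorFlow Lite interpreter; it assumes the tensorflow package is installed, and the model path is mine):

import tensorflow as tf

# Print the input/output tensor types to see where Quantize/Dequantize sit.
interpreter = tf.lite.Interpreter(model_path='detect.tflite')
interpreter.allocate_tensors()
for detail in interpreter.get_input_details() + interpreter.get_output_details():
    print(detail['name'], detail['shape'], detail['dtype'])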

(Screenshots of the two model graphs attached.)

Thanks! Wesley

james-conroy-arm commented 2 years ago

Hi @WesleyCh3n ,

Could you please point us to the exact source of the model you are using, so that we can try to reproduce the issue and look into it for you? I can't find a 300x300 model using the link you provided.

Thanks, James

steven9046 commented 2 years ago

Have you solved this problem? I am hitting it too, and I'm using the Arm NN C++ API v21.05.

terminate called after throwing an instance of 'armnn::InvalidArgumentException'
  what():  MemCopyQueueDescriptor: input & output must have the same number of elements.
Aborted

I printed the net and opt_net, and it shows:

Info: Concatenation:0:93:Concat:GpuAcc has 6 input slots and 1 output slots.
Info: The input slot has shape [ 1,1083,2, ]
Info: The input slot has shape [ 1,600,2, ]
Info: The input slot has shape [ 1,150,2, ]
Info: The input slot has shape [ 1,54,2, ]
Info: The input slot has shape [ 1,24,2, ]
Info: The input slot has shape [ 1,6,2, ]
Info: The output slot has shape [ 1,1917,2, ]
Info: 

Info: Reshape:0:90:Reshape:GpuAcc has 1 input slots and 1 output slots.
Info: The input slot has shape [ 1,1917,1,4, ]
Info: The output slot has shape [ 1,1917,4, ]
Info: 

Info: [ Reshape:0:90 (0) -> DetectionPostProcess:0:95 (0) ]:MemCopy:CpuRef has 1 input slots and 1 output slots.
Info: The input slot has shape [ 1,1917,4, ]
Info: The output slot has shape [ 1,1917,4, ]
Info: 

Info: Activation:SIGMOID:0:94:Activation:GpuAcc has 1 input slots and 1 output slots.
Info: The input slot has shape [ 1,1917,2, ]
Info: The output slot has shape [ 1,1917,2, ]
Info: 

Info: [ Activation:SIGMOID:0:94 (0) -> DetectionPostProcess:0:95 (1) ]:MemCopy:CpuRef has 1 input slots and 1 output slots.
Info: The input slot has shape [ 1,1917,2, ]
Info: The output slot has shape [ 1,1917,2, ]
Info: 

Info: DetectionPostProcess:0:95:DetectionPostProcess:CpuRef has 2 input slots and 4 output slots.
Info: The input slot has shape [ 1,1917,4, ]
Info: The input slot has shape [ 1,1917,2, ]
Info: The output slot has shape [ 1,10,4, ]
Info: The output slot has shape [ 1,10, ]
Info: The output slot has shape [ 1,10, ]
Info: The output slot has shape [ 1, ]
Info: 

Info: StatefulPartitionedCall:1:Output:GpuAcc has 1 input slots and 0 output slots.
Info: The input slot has shape [ 1,10, ]
Info: 

Info: StatefulPartitionedCall:3:Output:GpuAcc has 1 input slots and 0 output slots.
Info: The input slot has shape [ 1,10,4, ]
Info: 

Info: StatefulPartitionedCall:0:Output:GpuAcc has 1 input slots and 0 output slots.
Info: The input slot has shape [ 1, ]
Info: 

Info: StatefulPartitionedCall:2:Output:GpuAcc has 1 input slots and 0 output slots.
Info: The input slot has shape [ 1,10, ]

The input and output shapes of the MemCopy operation are the same, so why does it tell me the number of elements is not the same?

Info: [ Activation:SIGMOID:0:94 (0) -> DetectionPostProcess:0:95 (1) ]:MemCopy:CpuRef has 1 input slots and 1 output slots.
Info: The input slot has shape [ 1,1917,2, ]
Info: The output slot has shape [ 1,1917,2, ]
steven9046 commented 2 years ago

I found that this error may happen here: https://github.com/ARM-software/armnn/blob/5e9965cae1cc6162649910f423ebd86001fc1931/src/armnn/LoadedNetwork.cpp#L835

The output layers of my model are: StatefulPartitionedCall:3, StatefulPartitionedCall:2, StatefulPartitionedCall:1, StatefulPartitionedCall:0 (screenshot of the model graph attached). Is this 'TFLite Detection PostProcess' not supported? But this SSD model doesn't come up with this error (screenshot attached), and it also has the 'TFLite Detection PostProcess' operation.

steven9046 commented 2 years ago

I trained the SSD network with the newest TensorFlow Object Detection API. The custom operator TFLite_Detection_PostProcess has changed: the output order is no longer "0: detection_boxes, 1: detection_classes, 2: detection_scores, 3: num_detections". Maybe this is the reason.
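
A quick way to confirm the actual output order is to list the outputs with the TensorFlow Lite interpreter (a sketch; the model path is illustrative):

import tensorflow as tf

# List the model outputs in the order the TFLite runtime reports them.
interpreter = tf.lite.Interpreter(model_path='model_1000.tflite')
interpreter.allocate_tensors()
for i, detail in enumerate(interpreter.get_output_details()):
    print(i, detail['name'], detail['shape'])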

MikeJKelly commented 2 years ago

Hi @steven9046

TFLite Detection PostProcess is supported by armnn. How are you running this model? If you're using ExecuteNetwork then the issue may be the output order does not match up with the model's outputs. Can you put your ExecuteNetwork parameters here?

Best regards, Mike

steven9046 commented 2 years ago

Thanks for the reply @MikeJKelly. I tried ExecuteNetwork and the params were:

./ExecuteNetwork -c CpuRef -f tflite-binary -m /home/ss/ssd_debug/build/resource/model/model_1000.tflite -i serving_default_input:0 -o StatefulPartitionedCall:0,StatefulPartitionedCall:1,StatefulPartitionedCall:2,StatefulPartitionedCall:3

The output names of the 2nd model I mentioned above are:

TFLite_Detection_PostProcess     (the LOCATIONS of the detected boxes)
TFLite_Detection_PostProcess:1   (the scores of the detected boxes)
TFLite_Detection_PostProcess:2   (the categories of the detected boxes)
TFLite_Detection_PostProcess:3   (the number of the detected boxes)

The output names of the 1st model, which I trained with the newest TensorFlow Object Detection API, are:

StatefulPartitionedCall:0  (the number of the detected boxes)
StatefulPartitionedCall:1  (the scores of the detected boxes)
StatefulPartitionedCall:2  (the categories of the detected boxes)
StatefulPartitionedCall:3  (the LOCATIONS of the detected boxes)

The orders are different. I printed the input and output element counts (10 matches a [1,10] tensor and 40 matches [1,10,4], which fits the outputs being bound in the wrong order):

Fatal: Armnn Error: MemCopyQueueDescriptor: input->10 & output->40 must have the same number of elements.
steven9046 commented 2 years ago

I have found out what is wrong. When Arm NN parses the TFLite_Detection_PostProcess operator, it gets the outputs of the operator:
https://github.com/ARM-software/armnn/blob/5e9965cae1cc6162649910f423ebd86001fc1931/src/armnnTfLiteParser/TfLiteParser.cpp#L2856

Then it overrides the shapes of those outputs:
https://github.com/ARM-software/armnn/blob/5e9965cae1cc6162649910f423ebd86001fc1931/src/armnnTfLiteParser/TfLiteParser.cpp#L2909

But when Arm NN creates a runtime, it gets the output info from the subgraph:
https://github.com/ARM-software/armnn/blob/5e9965cae1cc6162649910f423ebd86001fc1931/src/armnnTfLiteParser/TfLiteParser.cpp#L4230
https://github.com/ARM-software/armnn/blob/5e9965cae1cc6162649910f423ebd86001fc1931/src/armnnTfLiteParser/TfLiteParser.cpp#L3935

Unfortunately, in my model the output order of the operator is different from the output order of the subgraph (maybe the TensorFlow Object Detection API has changed?), so this order is wrong for my model:
https://github.com/ARM-software/armnn/blob/5e9965cae1cc6162649910f423ebd86001fc1931/src/armnnTfLiteParser/TfLiteParser.cpp#L4242

I just added another vector to save the right shape infos for my model, which solved my problem.
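
For anyone hitting the same thing, a rough way to spot the mismatch from Python before loading the network is to compare the element counts per output name between the Arm NN parser and the TFLite interpreter (a sketch; it assumes pyarmnn's TensorInfo exposes GetNumElements() like the C++ API, and the model path is illustrative):

import numpy as np
import pyarmnn as ann
import tensorflow as tf

model_path = 'model_1000.tflite'

# Element counts per output name according to the TFLite interpreter.
interpreter = tf.lite.Interpreter(model_path=model_path)
interpreter.allocate_tensors()
tflite_elems = {d['name']: int(np.prod(d['shape'])) for d in interpreter.get_output_details()}

# Element counts per output name according to the Arm NN TfLite parser.
parser = ann.ITfLiteParser()
parser.CreateNetworkFromBinaryFile(model_path)
graph_id = 0
for name in parser.GetSubgraphOutputTensorNames(graph_id):
    _, info = parser.GetNetworkOutputBindingInfo(graph_id, name)
    # A mismatch here means the binding was registered against the wrong output.
    print(name, "armnn:", info.GetNumElements(), "tflite:", tflite_elems.get(name))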

TeresaARM commented 2 years ago

Hi Steven,

Please have a look at this commit: https://github.com/ARM-software/armnn/commit/005642326a59dc934353695e92fba7cc476db491

It is currently in master and will be part of the 22.02 release. You may want to cherry-pick this commit and check whether it fixes the problem you are seeing.

Regards

MikeJKelly commented 2 years ago

Hi @steven9046

Have you had a chance to try the commit @TeresaARM linked? We believe this will fix the issue you're seeing.

Best regards, Mike

steven9046 commented 2 years ago

@MikeJKelly I have found what caused this problem and solved it using this: #602 :v:

MikeJKelly commented 2 years ago

Hi @steven9046

That commit seems to be doing the same thing as the commit @TeresaARM linked; we're trying to confirm whether Teresa's code (which is already checked into our master branch) solves your problem too.

TeresaARM commented 2 years ago

Hi @steven9046

The output bindings were fixed in 22.02. Could you confirm that 22.02 is working for you?

Kind Regards

catcor01 commented 2 years ago

Hi @steven9046,

I am going to close this issue, as it seems to have been resolved and it has been some time since the latest activity. If you are still experiencing problems, please do not hesitate to reopen this ticket or create a new issue.

Kind Regards, Cathal.