Closed WesleyCh3n closed 2 years ago
Hi Wesley,
This specific error is coming from ValidateTensorNumElementsMatch. In your specific case a MemCopy operator has been inserted, and the input and output tensor sizes aren't matching for some reason.
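The check Colm describes can be sketched roughly as follows. This is a minimal Python illustration of the element-count comparison a MemCopy workload performs, not ArmNN's actual C++ code; the function name and message format are paraphrased from the exception text in this thread:

```python
from math import prod

def validate_num_elements_match(input_shape, output_shape, descriptor_name):
    """Mirror of the element-count check a MemCopy workload performs:
    raise if the two shapes describe different numbers of elements."""
    in_elems = prod(input_shape)
    out_elems = prod(output_shape)
    if in_elems != out_elems:
        raise ValueError(f"{descriptor_name}: input & output must have the "
                         f"same number of elements ({in_elems} vs {out_elems})")

# Equal element counts pass silently; [1,10] vs [1,10,4] would raise (10 vs 40).
validate_num_elements_match([1, 1917, 2], [1, 1917, 2], "MemCopyQueueDescriptor")
```

Note that the check compares total element counts, not shapes, so two tensors with different ranks can still pass it as long as their element counts agree.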
Am I right in saying you're running this using the CpuAcc backend? It's possible this is a symptom of an underlying incompatibility in the model. Can you attempt to run it using the CpuRef backend, please, just to ensure all the operators are supported?
Colm.
Hello @Colm-in-Arm
> Am I right in saying you're running this using the CpuAcc backend? It's possible this is a symptom of an underlying incompatibility in the model. Can you attempt to run it using the CpuRef backend please just to ensure all the operators are supported.
Thanks for your reply. I tested the CpuRef backend, but with no luck; the same error happened.
Then I tested ssd_mobilenet_v1 with the same code, using the model linked in python/pyarmnn/examples/common/tests/conftest.py, and it worked perfectly. So I compared the two models (mine and ssd_mobilenet_v1) and found that, besides the main structures being different, my model has two additional layers, Quantize and Dequantize, after the input and before the outputs. Is it possible that these cause the issue?
Thanks! Wesley
Hi @WesleyCh3n ,
Could you please point us to the exact source of the model you are using, so that we can try to reproduce the issue and look into it for you? I can't find a 300x300 model using the link you provided.
Thanks, James
Have you solved this problem? I'm hitting it too, using the Arm NN C++ API v21.05.
```
terminate called after throwing an instance of 'armnn::InvalidArgumentException'
  what(): MemCopyQueueDescriptor: input & output must have the same number of elements.
Aborted
```
I printed the net and opt_net, and it shows:
```
Info: Concatenation:0:93:Concat:GpuAcc has 6 input slots and 1 output slots.
Info: The input slot has shape [ 1,1083,2, ]
Info: The input slot has shape [ 1,600,2, ]
Info: The input slot has shape [ 1,150,2, ]
Info: The input slot has shape [ 1,54,2, ]
Info: The input slot has shape [ 1,24,2, ]
Info: The input slot has shape [ 1,6,2, ]
Info: The output slot has shape [ 1,1917,2, ]
Info:
Info: Reshape:0:90:Reshape:GpuAcc has 1 input slots and 1 output slots.
Info: The input slot has shape [ 1,1917,1,4, ]
Info: The output slot has shape [ 1,1917,4, ]
Info:
Info: [ Reshape:0:90 (0) -> DetectionPostProcess:0:95 (0) ]:MemCopy:CpuRef has 1 input slots and 1 output slots.
Info: The input slot has shape [ 1,1917,4, ]
Info: The output slot has shape [ 1,1917,4, ]
Info:
Info: Activation:SIGMOID:0:94:Activation:GpuAcc has 1 input slots and 1 output slots.
Info: The input slot has shape [ 1,1917,2, ]
Info: The output slot has shape [ 1,1917,2, ]
Info:
Info: [ Activation:SIGMOID:0:94 (0) -> DetectionPostProcess:0:95 (1) ]:MemCopy:CpuRef has 1 input slots and 1 output slots.
Info: The input slot has shape [ 1,1917,2, ]
Info: The output slot has shape [ 1,1917,2, ]
Info:
Info: DetectionPostProcess:0:95:DetectionPostProcess:CpuRef has 2 input slots and 4 output slots.
Info: The input slot has shape [ 1,1917,4, ]
Info: The input slot has shape [ 1,1917,2, ]
Info: The output slot has shape [ 1,10,4, ]
Info: The output slot has shape [ 1,10, ]
Info: The output slot has shape [ 1,10, ]
Info: The output slot has shape [ 1, ]
Info:
Info: StatefulPartitionedCall:1:Output:GpuAcc has 1 input slots and 0 output slots.
Info: The input slot has shape [ 1,10, ]
Info:
Info: StatefulPartitionedCall:3:Output:GpuAcc has 1 input slots and 0 output slots.
Info: The input slot has shape [ 1,10,4, ]
Info:
Info: StatefulPartitionedCall:0:Output:GpuAcc has 1 input slots and 0 output slots.
Info: The input slot has shape [ 1, ]
Info:
Info: StatefulPartitionedCall:2:Output:GpuAcc has 1 input slots and 0 output slots.
Info: The input slot has shape [ 1,10, ]
```
The input and output shapes of the MemCopy operation are the same, so why does it tell me the number of elements is not the same?
```
Info: [ Activation:SIGMOID:0:94 (0) -> DetectionPostProcess:0:95 (1) ]:MemCopy:CpuRef has 1 input slots and 1 output slots.
Info: The input slot has shape [ 1,1917,2, ]
Info: The output slot has shape [ 1,1917,2, ]
```
I found that this error may happen here: https://github.com/ARM-software/armnn/blob/5e9965cae1cc6162649910f423ebd86001fc1931/src/armnn/LoadedNetwork.cpp#L835

The output layers of my model are: StatefulPartitionedCall:3, StatefulPartitionedCall:2, StatefulPartitionedCall:1, StatefulPartitionedCall:0. Is this 'TFLite Detection PostProcess' not supported? But this SSD model doesn't come up with this error, and it also has the 'TFLite Detection PostProcess' operation.
I trained the SSD network with the newest TensorFlow detection API. The custom operator TFLite_Detection_PostProcess has changed: the output order is no longer "0: detection_boxes, 1: detection_classes, 2: detection_scores, 3: num_detections". Maybe this is the reason.
Hi @steven9046,
TFLite Detection PostProcess is supported by Arm NN. How are you running this model? If you're using ExecuteNetwork, the issue may be that the output order does not match up with the model's outputs. Can you put your ExecuteNetwork parameters here?
Best regards, Mike
Thanks for the reply @MikeJKelly. I tried ExecuteNetwork with the following parameters:

```
./ExecuteNetwork -c CpuRef -f tflite-binary -m /home/ss/ssd_debug/build/resource/model/model_1000.tflite -i serving_default_input:0 -o StatefulPartitionedCall:0,StatefulPartitionedCall:1,StatefulPartitionedCall:2,StatefulPartitionedCall:3
```
The output names of the 2nd model I mentioned above are:
TFLite_Detection_PostProcess (the LOCATIONS of the detected boxes)
TFLite_Detection_PostProcess:1 (the scores of the detected boxes)
TFLite_Detection_PostProcess:2 (the categories of the detected boxes)
TFLite_Detection_PostProcess:3 (the number of the detected boxes)
The outputs of the 1st model, which I trained with the newest TensorFlow detection API, are:
StatefulPartitionedCall:0 (the number of the detected boxes)
StatefulPartitionedCall:1 (the scores of the detected boxes)
StatefulPartitionedCall:2 (the categories of the detected boxes)
StatefulPartitionedCall:3 (the LOCATIONS of the detected boxes)
The orders are different. I printed the input and output element counts:
```
Fatal: Armnn Error: MemCopyQueueDescriptor: input->10 & output->40 must have the same number of elements.
```
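The "input->10 & output->40" in that message is consistent with a swapped output order: a [1,10] tensor (scores, classes, or num_detections padded to [1,10]) being copied into the binding resolved for the [1,10,4] boxes output. A quick sanity check, with shapes taken from the DetectionPostProcess outputs listed in the log above:

```python
from math import prod

# Shapes from the log above: with the new export order, the workload can end
# up copying the tensor produced at one output index into the binding that
# was resolved for a different index.
src_shape = (1, 10)     # e.g. the scores output actually produced here
dst_shape = (1, 10, 4)  # e.g. the boxes binding the copy was paired with
print(prod(src_shape), prod(dst_shape))  # prints: 10 40
```

That reproduces exactly the 10-vs-40 mismatch in the error, even though each individual shape is valid on its own.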
I have found out where it goes wrong. When Arm NN parses the TFLite_Detection_PostProcess operator, it gets the outputs of the operator:
https://github.com/ARM-software/armnn/blob/5e9965cae1cc6162649910f423ebd86001fc1931/src/armnnTfLiteParser/TfLiteParser.cpp#L2856
Then it overrides the shapes of those outputs:
https://github.com/ARM-software/armnn/blob/5e9965cae1cc6162649910f423ebd86001fc1931/src/armnnTfLiteParser/TfLiteParser.cpp#L2909
But when Arm NN creates a runtime, it gets the output info from the subgraph:
https://github.com/ARM-software/armnn/blob/5e9965cae1cc6162649910f423ebd86001fc1931/src/armnnTfLiteParser/TfLiteParser.cpp#L4230
https://github.com/ARM-software/armnn/blob/5e9965cae1cc6162649910f423ebd86001fc1931/src/armnnTfLiteParser/TfLiteParser.cpp#L3935
Unfortunately, in my model the output order of the operator is different from the output order of the subgraph. Maybe the TensorFlow detection API has changed? So this order is wrong for my model:
https://github.com/ARM-software/armnn/blob/5e9965cae1cc6162649910f423ebd86001fc1931/src/armnnTfLiteParser/TfLiteParser.cpp#L4242
I just added another vector to save the correct shape info for my model, and that solved my problem.
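On the application side, one way to stay robust against a changed export order (independent of the parser fix) is to look up each output binding by tensor name rather than by list position. This is a hedged sketch, not Arm NN's actual fix: the `bindings_by_name` helper is hypothetical, and the name-to-meaning pairs are the ones steven9046 printed above for the newer TF Object Detection API export.

```python
# Hypothetical helper: key each output binding by tensor name so a changed
# export order cannot mis-pair bindings. The name/meaning pairs below are
# the ones listed above for the newer TF Object Detection API export.
SEMANTIC_OUTPUTS = {
    'num_detections': 'StatefulPartitionedCall:0',
    'scores':         'StatefulPartitionedCall:1',
    'classes':        'StatefulPartitionedCall:2',
    'boxes':          'StatefulPartitionedCall:3',
}

def bindings_by_name(parser, graph_id, name_map=SEMANTIC_OUTPUTS):
    """Look up each output binding by tensor name instead of list position."""
    return {key: parser.GetNetworkOutputBindingInfo(graph_id, tensor_name)
            for key, tensor_name in name_map.items()}
```

With a pyarmnn `ITfLiteParser` instance this would replace the positional `output_names[0]` … `output_names[3]` lookups, so each binding is tied to a semantic output regardless of subgraph order.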
Hi Steven,
Please have a look at this commit: https://github.com/ARM-software/armnn/commit/005642326a59dc934353695e92fba7cc476db491
It is currently in master and it will be part of 22.02 release. You may want to cherry pick this commit and try if that fixes the problem you are seeing.
Regards
Hi @steven9046
Have you had a chance to try the commit @TeresaARM linked? We believe this will fix the issue you're seeing.
Best regards, Mike
@MikeJKelly I found what caused this problem and solved it using this: #602 :v:
Hi @steven9046
That commit seems to be doing the same thing as the commit @TeresaARM linked; we're trying to confirm whether Teresa's code (which is already checked into our master branch) solves your problem too.
Hi @steven9046
The output bindings were fixed in 22.02; could you confirm that 22.02 works for you?
Kind Regards
Hi @steven9046,
I am going to close this issue, as it seems to have been resolved and some time has passed since the latest activity. If you are still experiencing problems, please do not hesitate to reopen this ticket or create a new issue.
Kind Regards, Cathal.
Hello,
I'm testing an object detection tflite model with PyArmNN, but `runtime.EnqueueWorkload` throws the error below. I am wondering what "same number of elements" means. The following is the code I used:
```python
import cv2
import numpy as np
import pyarmnn as ann

parser = ann.ITfLiteParser()
network = parser.CreateNetworkFromBinaryFile('detect.tflite')

# Get the input binding information by using the name of the input layer.
graph_id = 0
input_names = parser.GetSubgraphInputTensorNames(graph_id)
input_binding_info = parser.GetNetworkInputBindingInfo(graph_id, input_names[0])

options = ann.CreationOptions()
runtime = ann.IRuntime(options)
preferredBackends = [ann.BackendId('CpuAcc'), ann.BackendId('CpuRef')]
opt_network, messages = ann.Optimize(network, preferredBackends,
                                     runtime.GetDeviceSpec(), ann.OptimizerOptions())
netid, _ = runtime.LoadNetwork(opt_network)

# Get output binding information for an output layer by using the layer name.
output_names = parser.GetSubgraphOutputTensorNames(graph_id)
output_list = []
output_list.append(parser.GetNetworkOutputBindingInfo(graph_id, output_names[0]))
output_list.append(parser.GetNetworkOutputBindingInfo(graph_id, output_names[1]))
output_list.append(parser.GetNetworkOutputBindingInfo(graph_id, output_names[2]))
output_list.append(parser.GetNetworkOutputBindingInfo(graph_id, output_names[3]))
output_tensors = ann.make_output_tensors(output_list)

# Load an image and create an inputTensor for inference.
image = cv2.imread('./test.jpg')
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image_resized = cv2.resize(image_rgb, (300, 300))
input_data = np.expand_dims(np.asarray(image_resized, dtype=np.uint8), axis=0)

input_tensors = ann.make_input_tensors([input_binding_info], [input_data])
print("tensor: ", input_tensors)
print("tensor: ", output_tensors)

runtime.EnqueueWorkload(0, input_tensors, output_tensors)

results = ann.workload_tensors_to_ndarray(output_tensors)
print(results)
```