ARM-software / armnn

Arm NN ML Software. The code here is a read-only mirror of https://review.mlplatform.org/admin/repos/ml/armnn
https://developer.arm.com/products/processors/machine-learning/arm-nn
MIT License

ArmNN-linux-x86_64 produces incorrect results for an int8 QAT tflite model #658

Closed: liamsun2019 closed this issue 1 month ago

liamsun2019 commented 2 years ago

Hi author, I did some tests based on ArmNN-linux-x86_64, running inference on an int8 QAT tflite model. I wrote a sample program and ran the executable under Ubuntu 18.04. It looked like the results were incorrect compared to the results from running in Python.

  1. ArmNN-linux-x86_64.tar.gz comes directly from the prebuilt binaries.
  2. The tflite model is downloaded from the following website: https://tfhub.dev/google/lite-model/movenet/singlepose/thunder/tflite/int8/4
  3. The output is an array of 17x3 elements with layout y,x,score,y,x,score..., where y and x are the coordinates of one keypoint and score is its confidence. y and x are both relative to the 256x256 input size, and the real value should be scaled by the actual input image size; for example, with y=0.2 and an input image of 1920x1080, the real y pixel coordinate is 0.2 x 1920 = 384 (see the decoding sketch after this list). The keypoint order follows the COCO human pose order: nose, left eye, right eye, left ear ..., left ankle, right ankle.
  4. Please refer to the attached.
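
For reference, a minimal sketch of how a 17x3 output like this can be decoded and scaled (the function name and the height/width assignment are illustrative assumptions, not from the model's documentation):

import numpy as np

def decode_keypoints(raw_output, img_h, img_w):
    """Scale a flat 17x3 (y, x, score) model output to pixel coordinates."""
    kps = np.asarray(raw_output).reshape(17, 3)  # rows of (y, x, score)
    for i, (y, x, score) in enumerate(kps):
        # y and x are normalized; assume y scales with height, x with width,
        # treating 1920 as the height per the example above
        y_px, x_px = y * img_h, x * img_w
        print(f"keypoint {i}: x={x_px:.1f}, y={y_px:.1f}, score={score:.3f}")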

Any suggestions are appreciated. Thanks.

liamsun2019 commented 2 years ago

test.zip

MrSherish commented 2 years ago

Are you using TFLite 2.5? @MikeJKelly recently pointed out in my issue (https://github.com/ARM-software/armnn/issues/656) that 2.7 introduces some errors in ArmNN (x86_64 for sure) which need to be resolved.

liamsun2019 commented 2 years ago

Big thanks for your quick reply. I tried with TensorFlow 2.3 and 2.5; both cases seemed abnormal. One thing I'm curious about: I only used the prebuilt libraries and the pure C++ API to conduct the tests. Is that related to the tflite version? You can reproduce the issue with the archives I uploaded. It looks like the produced y and x values are both smaller than expected. Not sure if there are some incorrect computations in the library.

liamsun2019 commented 2 years ago

I also conducted similar tests using the TfLite delegate approach. The inference results were correct. My test code is below:

import numpy as np
import tflite_runtime.interpreter as tflite
import cv2 as cv

# Load the Arm NN delegate with the requested backends
armnn_delegate = tflite.load_delegate(library="libarmnnDelegate.so",
                                      options={"backends": "CpuAcc,GpuAcc,CpuRef",
                                               "logging-severity": "info"})

interpreter = tflite.Interpreter(model_path="lite-model_movenet_singlepose_thunder_tflite_int8_4.tflite",
                                 experimental_delegates=[armnn_delegate])
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()

# Preprocess: BGR -> RGB, resize to the model's 256x256 input, add batch dim
image = cv.imread("test.jpg")
input_image = cv.cvtColor(image, cv.COLOR_BGR2RGB)
input_image = cv.resize(input_image, dsize=(256, 256))
input_image = input_image.reshape(-1, 256, 256, 3)

interpreter.set_tensor(input_details[0]['index'], input_image.astype(np.uint8))
interpreter.invoke()

output_details = interpreter.get_output_details()
keypoints_with_scores = interpreter.get_tensor(output_details[0]['index'])
keypoints_with_scores = np.squeeze(keypoints_with_scores)
print(keypoints_with_scores)

The results are shown below (one row per keypoint: y, x, score):

0.1766229 0.6021235 0.92727023
0.16056627 0.6021235 0.95135516
0.16056627 0.5820527 0.88311446
0.16056627 0.5579678 0.81888795
0.16056627 0.5218404 0.81888795
0.2609202 0.57402444 0.92727023
0.24887772 0.46162802 0.92727023
0.3532458 0.5619819 0.6262084
0.36127412 0.33718917 0.97142595
0.47367048 0.57402444 0.81888795
0.47768465 0.4174723 0.92727023
0.5178262 0.57402444 0.95135516
0.5258545 0.4816988 0.92727023
0.61416596 0.7506473 0.95135516
0.71853405 0.40141568 0.92727023
0.8349446 0.7265624 0.95135516
0.7466332 0.20070784 0.92727023

Instead, the results are incorrect when I use the C++ API to do the same inference with the same input image (one row per keypoint: y, x, score):

0.132467 0.561982 0.734591
0.116411 0.549939 0.883114
0.120425 0.541911 0.927270
0.112396 0.501770 0.883114
0.108382 0.473670 0.626208
0.216764 0.537897 0.818888
0.204722 0.409444 0.927270
0.305076 0.473670 0.734591
0.329161 0.301062 0.971426
0.413458 0.545925 0.818888
0.449586 0.393387 0.883114
0.453600 0.521840 0.927270
0.461628 0.433529 0.927270
0.586067 0.722548 0.927270
0.690435 0.361274 0.927270
0.802831 0.694449 0.734591
0.710506 0.148524 0.983468
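
To quantify the mismatch, a quick sketch (assuming the two outputs above are collected into 17x3 NumPy arrays; the names delegate_out and cpp_out are placeholders):

import numpy as np

def compare_outputs(delegate_out, cpp_out):
    """Per-column max/mean absolute difference between two (17, 3) result arrays."""
    diff = np.abs(np.asarray(delegate_out) - np.asarray(cpp_out))
    print("max abs diff (y, x, score):", diff.max(axis=0))
    print("mean abs diff (y, x, score):", diff.mean(axis=0))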

MrSherish commented 2 years ago

Do you mean Arm NN's C++ API? AFAIK the delegate approach is the one currently supported; the 'standalone' one is being deprecated. You can still use the delegate mechanism within C++.

liamsun2019 commented 2 years ago

@MrSherish Yes, I wrote test.cpp with Arm NN's C++ API and got inaccurate results. What do you mean by 'standalone'? Also, a naive question: what does AFAIK stand for? I am a beginner and not familiar with armnn yet. Thanks for your time.

james-conroy-arm commented 2 years ago

Hi @liamsun2019 ,

It's nice to hear that you have achieved correct output through the use of our Arm NN TF Lite Delegate. This is currently the preferred way to use Arm NN, whether through the Python or C++ APIs (you can use the TF Lite Delegate in C++ as well). When using the TF Lite Delegate with the CpuAcc or GpuAcc backends, I'd recommend not using CpuRef as a third-choice backend, as falling back to Google's reference TF Lite runtime is likely to be faster in the event of an operator not being supported on CpuAcc/GpuAcc.
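
Following that recommendation, the delegate load from your Python snippet would become something like this (same option strings as your earlier code, just without CpuRef):

armnn_delegate = tflite.load_delegate(
    library="libarmnnDelegate.so",
    options={"backends": "CpuAcc,GpuAcc",  # no CpuRef: let unsupported ops fall back to TF Lite
             "logging-severity": "info"})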

Your zipped code uses our TF Lite Parser, which, unlike the TF Lite Delegate, does not fall back to the TF Lite runtime when an operator is unsupported in Arm NN. This means that you may want to use CpuRef as a fallback when using the TF Lite Parser. With respect to the issue you are having here, we would have to investigate.
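
For the parser path, a rough sketch in pyarmnn (Arm NN's Python bindings) of what listing CpuRef as a fallback looks like; treat the details as an outline rather than a drop-in solution:

import pyarmnn as ann

parser = ann.ITfLiteParser()
network = parser.CreateNetworkFromBinaryFile("model.tflite")

runtime = ann.IRuntime(ann.CreationOptions())
# List CpuRef last so it is only chosen when CpuAcc cannot handle an operator
backends = [ann.BackendId("CpuAcc"), ann.BackendId("CpuRef")]
opt_network, messages = ann.Optimize(network, backends,
                                     runtime.GetDeviceSpec(), ann.OptimizerOptions())
net_id, _ = runtime.LoadNetwork(opt_network)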

Let us know more about how we can help.

Cheers, James

MrSherish commented 2 years ago

@MrSherish Also, a naive question: what does AFAIK stand for?

AFAIK is short for 'as far as I know' :)

liamsun2019 commented 2 years ago

Hi @james-conroy-arm , My understanding is that, whether using the tflite parser or the delegate, both approaches are supposed to give similar inference results. According to your comment, in the tflite parser case, what happens if some ops are not supported at inference time? Is the op just ignored, or is an exception thrown, or something else? I got no error message when using the tflite parser approach.

liamsun2019 commented 2 years ago

Hi @MrSherish, Yes, I got it. ^_^

james-conroy-arm commented 2 years ago

@liamsun2019 You're correct, both the TF Lite Delegate and TF Lite Parser should have identical results. This suggests that there may be a bug in the TF Lite Parser or you may not be using it appropriately. We will need to look into this for you...

If an operator is not supported when using the TF Lite Parser API, an error is shown to the user and inference will not execute. With the TF Lite Delegate, unsupported ops fall back (i.e. are delegated) to Google's TF Lite runtime implementation. Based on what you said (inference executed successfully), all ops in your model are supported in Arm NN through the TF Lite Parser.
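
In pyarmnn terms, that failure mode surfaces as an exception at parse time rather than a silently wrong answer (the exact exception type below is an assumption):

try:
    network = parser.CreateNetworkFromBinaryFile("model.tflite")
except RuntimeError as err:  # pyarmnn raises on unsupported operators; exact type assumed
    print("model contains an operator Arm NN cannot handle:", err)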

For more info on the TF Lite Delegate in general: https://www.tensorflow.org/lite/performance/implementing_delegate

James

liamsun2019 commented 2 years ago

Hi James, Thanks, I've learned a lot. Until this issue is resolved, I will keep trying the tflite delegate in C++. As I posted in https://github.com/ARM-software/armnn/issues/659, I'm being tortured by building the delegate library with the Android NDK. ^_^

liamsun2019 commented 2 years ago

Hi @james-conroy-arm, @catcor01, I have conducted some tests over the past few days against several models. In particular, with a few local modifications, I adapted the sample code for pose estimation. My tests showed that the inference results were incorrect in both C++ delegate mode and C++ parser mode. Here's my summary:

  1. C++ parser mode
     a. x86_64: As I already pointed out in this issue, the x and y outputs were both smaller than expected.
     b. cortex-a55: The x looked normal, while y and score were incorrect.

  2. C++ delegate mode
     a. x86_64: The x looked normal, while y and score were incorrect.
     b. cortex-a55: The x looked normal, while y and score were incorrect.

  3. Python delegate mode: The inference results are correct.

Please refer to the attachment for the model and inference results. My guess is that there may be something wrong in the computation. Thanks for your time.

pose_test.zip