PINTO0309 / tflite2tensorflow

Generate saved_model, tfjs, tf-trt, EdgeTPU, CoreML, quantized tflite, ONNX, OpenVINO, Myriad Inference Engine blob and .pb from .tflite. Support for building environments with Docker. It is possible to directly access the host PC GUI and the camera to verify the operation. NVIDIA GPU (dGPU) support. Intel iHD GPU (iGPU) support. Supports inverse quantization of INT8 quantization model.
https://qiita.com/PINTO
MIT License

The same generated IR gives different results on CPU and MYRIAD #9

Closed: geaxgx closed this issue 3 years ago

geaxgx commented 3 years ago

1. OS Ubuntu 18.04

2. OS Architecture x86_64

3. Version of OpenVINO 2021.2.185

4. Version of tflite2tensorflow 1.8.0

11. URL or source code for simple inference testing code: https://github.com/geaxgx/openvino_blazepose

12. Issue Details

Hi Katsuya, I would like your opinion on the following problem: I have used tflite2tensorflow 1.8.0 to convert the new version of the mediapipe Blazepose models (mediapipe version 0.8.4). In this new version, there are now 3 pose landmark models (full, lite and heavy) and one common pose detection model. I then tested my code BlazeposeOpenvino.py with each model, first on CPU, then on MYRIAD (OAK-D). Everything seems to work great except when I run the heavy model on the MYRIAD. For instance, below is the output of the heavy model running on the CPU (which looks accurate): heavy_on_cpu And below is the output of the same heavy model running on the MYRIAD X: heavy_on_myriad We can see the skeleton is distorted. Actually, in many cases, with other images, we don't even get a skeleton drawn at all because the score given by the model is too low.

Do you have an idea of what the problem is? The heavy model is much bigger (27 MB for the .bin file) than the other models. It takes about 30 s to load on the MyriadX, but I guess the size is still acceptable, otherwise I would get error messages.

In case you want to reproduce the problem:

# Clone my repo
git clone https://github.com/geaxgx/openvino_blazepose.git
cd openvino_blazepose
# Start your tflite2tensorflow docker
./docker_tflite2tensorflow.sh
cd workdir/models
# In my repo, there are only the FP32 IR version of the models, so we need to download the original tflite file 
# and then convert it using tflite2tensorflow. All is done with the following command:
./get_and_convert_heavy.sh
# Exit docker container
exit

# To test with the heavy model running on the CPU :
python BlazeposeOpenvino.py --lm_xml models/pose_landmark_heavy_FP16.xml -i img/yoga.jpg
# To test on the MYRIAD
python BlazeposeOpenvino.py --lm_xml models/pose_landmark_heavy_FP16.xml -i img/yoga.jpg --lm_device MYRIAD 

You can also test with the image img/yoga2.jpg (no skeleton detected).

Thank you.

PINTO0309 commented 3 years ago

Ahh... It seems that the OpenVINO specification has changed so that the model is now converted using Interpolate-4. :cry:

<layer id="462" name="model_1/model/up_sampling2d/resize/ResizeBilinear_lambda/ResizeBilinear" type="Interpolate" version="opset4">
    <data antialias="false" coordinate_transformation_mode="half_pixel" cube_coeff="-0.75" mode="linear_onnx" nearest_mode="round_prefer_floor" pads_begin="0,0,0,0" pads_end="0,0,0,0" shape_calculation_mode="sizes"/>

Maybe the following fix will work for now. Change

coordinate_transformation_mode="half_pixel"

to

coordinate_transformation_mode="align_corners"

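Since the IR .xml is a plain text file kept separate from the .bin weights, this edit can also be scripted instead of done by hand. A minimal sketch (the function name and file path are illustrative, not part of tflite2tensorflow):

```python
from pathlib import Path

def patch_interpolate_mode(xml_path: str) -> int:
    """Switch every Interpolate layer in an IR .xml from half_pixel to
    align_corners. Returns the number of occurrences patched."""
    text = Path(xml_path).read_text()
    count = text.count('coordinate_transformation_mode="half_pixel"')
    patched = text.replace(
        'coordinate_transformation_mode="half_pixel"',
        'coordinate_transformation_mode="align_corners"',
    )
    Path(xml_path).write_text(patched)
    return count

# Example (hypothetical path):
# patch_interpolate_mode("openvino/FP16/pose_landmark_heavy_FP16.xml")
```

The .bin file is untouched, so the patched .xml keeps working with the original weights.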
geaxgx commented 3 years ago

ResizeBilinear again, like in the previous issue https://github.com/PINTO0309/tflite2tensorflow/issues/4? Unfortunately, with coordinate_transformation_mode="align_corners" I still have the problem. I will upgrade my version of OpenVINO (currently 2021.2.185) to be consistent with the version used in tflite2tensorflow. The 2021.3 release notes say that some support for Interpolate was added for MYRIAD.

geaxgx commented 3 years ago

No, I get the same behavior with 2021.3.394.

And if I use an older version of tflite2tensorflow that relies on an older version of OpenVINO, it would convert to Interpolate-1, right? I still have an old docker image of tflite2tensorflow on my disk. It is six weeks old, so I guess it relies on 2021.2.185. I can try with that version.

PINTO0309 commented 3 years ago

The inference results are very poor even when converting through the tensorflow-onnx -> OpenVINO flow. It's very strange.

PINTO0309 commented 3 years ago

I'll take some time to look into it after tomorrow.

geaxgx commented 3 years ago

I agree it is strange, because the other models (lite and full) also use Interpolate-4 and they work well on MYRIAD.

geaxgx commented 3 years ago

FYI, I have used that older docker image of tflite2tensorflow (1.6.4) to regenerate the IR files. Now Interpolate-1 is used:

<layer id="450" name="model_1/model/up_sampling2d/resize/ResizeBilinear_lambda/ResizeBilinear" type="Interpolate" version="opset1">
                        <data align_corners="1" antialias="0" axes="2,3" mode="linear" pads_begin="0" pads_end="0"/>

But I still get incorrect values on Myriad.

PINTO0309 commented 3 years ago

Thank you very much. This is very helpful in narrowing down the patterns that need to be verified.

PINTO0309 commented 3 years ago

I lost track of the nature of the problem. :crying_cat_face:

Convert

tflite2tensorflow \
--model_path pose_landmark_heavy.tflite \
--flatc_path ../flatc \
--schema_path ../schema.fbs \
--output_pb \
--optimizing_for_openvino_and_myriad

tflite2tensorflow \
--model_path pose_landmark_heavy.tflite \
--flatc_path ../flatc \
--schema_path ../schema.fbs \
--output_no_quant_float32_tflite \
--output_openvino_and_myriad

BlazeposeOpenvino.py

            elif len(self.regions) == 1:
                r = self.regions[0]
                frame_nn = mpu.warp_rect_img(r.rect_points, video_frame, self.lm_w, self.lm_h)
                # Transpose hxwx3 -> 1x3xhxw
                # frame_nn = np.transpose(frame_nn, (2,0,1))[None,]
                frame_nn = np.transpose(frame_nn, (2,0,1))[None,] / 255.0

Run inference

# FP16
python3 BlazeposeOpenvino.py --lm_xml openvino/FP16/pose_landmark_heavy_FP16.xml -i img/yoga.jpg

# FP32
python3 BlazeposeOpenvino.py --lm_xml openvino/FP32/pose_landmark_heavy_FP32.xml -i img/yoga.jpg

Converted IR model https://drive.google.com/file/d/1OtsAYgz2Wse9isJR_1aZm-pr0GpRXZKM/view?usp=sharing

geaxgx commented 3 years ago

Yes, it works on CPU. But the problem is on MyriadX (I am using the MyriadX of an OAK-D). If you have a NCS2 or an OAK-D, you can try: python3 BlazeposeOpenvino.py --lm_xml openvino/FP16/pose_landmark_heavy_FP16.xml -i img/yoga.jpg --lm_device MYRIAD

Maybe it is purely an OpenVINO problem. I wanted to have your opinion before opening an issue with Intel support.

PINTO0309 commented 3 years ago

Okay, I see. I'll take a look at NCS2.

PINTO0309 commented 3 years ago

Hmmm... NCS2's processing has always been buggy and sometimes doesn't work properly. I encountered a similar problem when I was pushing YoloV3-tiny to its fastest. Screenshot 2021-05-17 20:28:36

PINTO0309 commented 3 years ago

Huh? It seems to be very different from your result. It's strange. How can there be such a big difference between normalizing inside the model and normalizing in the outer Python logic? My guess is that there is a problem with the division.

geaxgx commented 3 years ago

You managed to get a better-looking skeleton than mine. I was using tflite2tensorflow 1.8.0. I see there is a version 1.8.1 now. Could it explain the difference?

geaxgx commented 3 years ago

Can you also try with img/yoga2.jpg, please? For me, I don't get any skeleton (score too low).

PINTO0309 commented 3 years ago

The 1.8.1 changes were:

  1. Fixed FULLY_CONNECTED (MATMUL) bug.
  2. Support for quantization of multiple data input models.

I don't think these are relevant fixes for this model. Rather, it seems that passing or not passing std=[255.0, 255.0, 255.0] to the optimizer makes a bigger difference.

PINTO0309 commented 3 years ago

Screenshot 2021-05-17 20:43:45

geaxgx commented 3 years ago

Definitely better than no skeleton! Strange how the face keypoints are shifted into the hair.

PINTO0309 commented 3 years ago

Maybe there is a misalignment in the scale conversion process during the processing.

PINTO0309 commented 3 years ago

I forgot to use your .sh and ended up running the optimizer conversion command myself. As a result, the following parameter was left unspecified, while the division by 255 stayed in the Python code:

${arg_scale_values}
                # frame_nn = np.transpose(frame_nn, (2,0,1))[None,]
                frame_nn = np.transpose(frame_nn, (2,0,1))[None,] / 255.0
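In other words (a sketch with illustrative names, not the exact project code), the normalization can live either in the Python preprocessing or be folded into the model via the optimizer's scale parameter; both placements feed numerically equivalent data to the first convolution, but doing it in both places would scale the input twice:

```python
import numpy as np

# Illustrative 256x256 BGR frame, as OpenCV would deliver it (HxWxC, uint8).
frame = np.random.default_rng(0).integers(0, 256, (256, 256, 3), dtype=np.uint8)

# (a) Scaling in Python, model converted WITHOUT --scale_values:
nn_input_a = np.transpose(frame, (2, 0, 1))[None].astype(np.float32) / 255.0

# (b) Raw input, model converted WITH --scale_values [255,255,255]:
# the optimizer folds the division into the first convolution's weights,
# so the Python side only transposes and casts.
nn_input_b = np.transpose(frame, (2, 0, 1))[None].astype(np.float32)

# The model with folded weights sees (b) exactly as the plain model sees (a).
assert np.allclose(nn_input_a * 255.0, nn_input_b)
```
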
geaxgx commented 3 years ago

Ah ah, I see :-)) It is a good "misremembering", since it allows us to pinpoint the difference in behavior.

Yet I don't understand why the other models (lite and full) work well on MYRIAD. Below: python BlazeposeOpenvino.py --lm_xml models/pose_landmark_full_FP16.xml -i img/yoga2.jpg --lm_device MYRIAD image

PINTO0309 commented 3 years ago

The model on the left is the one you converted, and the one on the right is the one I converted. There is a big difference in the value of the first Convolution weight. It is probably divided by std=255.0. Screenshot 2021-05-17 21:13:49

0.03704833984375 / 255 = 0.000145288

But it seems that the first weight in your converted model is not simply the number divided by 255:

0.00012266870180610567

What I am trying to tell you is that there seems to be a bug in the optimizer's division. I have no idea why the lite model works so well... Also notice that the first weight of the lite model you converted is an order of magnitude larger than that of the heavy model, so it should be less affected by calculation errors, overflow, or dropped digits:

-0.0012475107796490192

Screenshot 2021-05-17 21:21:32 However, all of this is just my speculation.
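A quick sanity check of the arithmetic above (the weight values are the ones quoted from the screenshots in this thread):

```python
# First-conv weight from the model converted WITHOUT std (from the screenshot):
w_mine = 0.03704833984375
# First-conv weight at the same position in the model converted WITH std=255:
w_yours = 0.00012266870180610567

# Dividing by 255 reproduces the quoted 0.000145288...
assert abs(w_mine / 255.0 - 0.000145288) < 1e-8

# ...but it does NOT reproduce the value actually found in the other model,
# which is why the division looked buggy at this point in the discussion.
assert abs(w_mine / 255.0 - w_yours) > 1e-5
```

(As it turns out later in the thread, the mismatch is explained by --reverse_input_channels rather than by a division bug.)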

PINTO0309 commented 3 years ago

This is just a guess on my part, but I think the accuracy would be higher if the model optimizer were not given mean and std, and the normalization were instead done in the Python logic.

geaxgx commented 3 years ago

Thanks for your advice. I am currently busy but later on I will try your suggestion.

geaxgx commented 3 years ago

So I have tried your suggestion of doing the scaling (and also the BGR -> RGB switch) in the Python code. I confirm the skeleton looks better this way. Unfortunately, it is not good enough to be used reliably. The difference between what I get on CPU and what I get on MYRIAD can vary a lot. For instance, when I point a webcam at my face, the face landmarks are way off the correct position. Also, in addition to the landmarks, the model can output a kind of segmentation mask around the body (it can be visualized by pressing the "s" key).

Below is what I get when run on the CPU: heavy_cpu_python_scaling

And now on the Myriad: heavy_myriad_python_scaling

We can see that the segmentation mask in the latter case is kind of garbage.

PINTO0309 commented 3 years ago

I got some advice from an Intel engineer. He said that when an FP16 model is inferred on the CPU, it is automatically mapped to FP32 internally. I've been thinking about it since then, and I believe the reason the results are so bad only when FP16 runs on MYRIAD is that the model handles numbers that are too small or too large for the arithmetic precision of float16.
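This float16 explanation is easy to illustrate with numpy (the weight value below is the one quoted earlier in the thread; the other numbers are just IEEE 754 half-precision facts): float16 keeps only ~11 significand bits, values below about 6.1e-5 become subnormal, and values below about 6.0e-8 flush to zero.

```python
import numpy as np

# A small first-conv weight quoted earlier in this thread.
w32 = np.float32(0.00012266870180610567)
w16 = np.float16(w32)          # still representable, but with few bits left
rel_err = abs(float(w16) - float(w32)) / float(w32)
assert rel_err < 2**-10        # normal-range rounding error bound for float16

# Sufficiently tiny intermediate values simply vanish in float16.
tiny = np.float32(1e-8)
assert np.float16(tiny) == 0.0
```

On CPU (and apparently on the iGPU path discussed below) the FP16 model is executed in FP32, which hides these effects; Myriad executes natively in FP16, so they show up.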

geaxgx commented 3 years ago

Ah, that could be a good explanation. In that case, there is not much hope, I guess.

geaxgx commented 3 years ago

Just tried on internal GPU: image

Garbage too! It confirms your explanation.

PINTO0309 commented 3 years ago

I'm almost certain of it. :disappointed:

PINTO0309 commented 3 years ago

Although the tflite file from which the conversion was made is a float16 model, the figure shows that it infers well because float32 is simulated internally. Screenshot 2021-05-18 01:37:11

geaxgx commented 3 years ago

The picture above was with the FP16 model. And now with the FP32 model on the internal GPU: image

PINTO0309 commented 3 years ago

Hmm. I wonder what the internal implementation is.

geaxgx commented 3 years ago

Ha ha, I was about to close the issue. But do you still have hope of finding a solution? :-)

PINTO0309 commented 3 years ago

No. Unfortunately, the situation is hopeless. :crying_cat_face: I'll give up gracefully this time.

geaxgx commented 3 years ago

This time it was not a 100% success, but it was still insightful for me. Thanks again for your great help, Katsuya! I appreciate it a lot!

geaxgx commented 3 years ago

Hi @PINTO0309 ! I was doing the tflite-to-OpenVINO-IR conversion once again when I realized what was going on in https://github.com/PINTO0309/tflite2tensorflow/issues/9#issuecomment-842280712 : the mismatch between 0.03704833984375 / 255 = 0.000145288 and 0.00012266870180610567 is not a bug in the optimizer's division. The problem is just that I was using the --reverse_input_channels flag on the optimizer command line, so the weights corresponding to the R and B channels are also swapped for the convolution operation. We can actually find the 0.000145288 value a few lines later in your screen copy :-)
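The channel-swap effect can be sketched with numpy (shapes, names and values are illustrative, not the actual model): reversing the input channels while also reversing the first convolution's weights along the input-channel axis leaves the output unchanged, which is why the optimizer implements --reverse_input_channels by flipping those weights.

```python
import numpy as np

rng = np.random.default_rng(0)
x_rgb = rng.random((3, 8, 8)).astype(np.float32)   # CxHxW input in RGB order
w = rng.random((4, 3, 1, 1)).astype(np.float32)    # out,in,kH,kW conv weights

def conv1x1(x, w):
    # 1x1 convolution as an einsum over the input-channel axis.
    return np.einsum('oikl,ihw->ohw', w, x)

x_bgr = x_rgb[::-1]      # same frame with channels reversed (BGR)
w_rev = w[:, ::-1]       # weights reversed along the input-channel axis

# Feeding BGR to the channel-reversed model matches RGB into the original,
# which also explains why individual first-layer weights no longer line up
# position-for-position between the two converted models.
assert np.allclose(conv1x1(x_rgb, w), conv1x1(x_bgr, w_rev))
```
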

BTW, I have a question: in the models you share in your model zoo https://github.com/PINTO0309/PINTO_model_zoo/tree/main/053_BlazePose/10_lite_full_heavy_version_May6_2021 , coordinate_transformation_mode="half_pixel" has already been replaced by coordinate_transformation_mode="align_corners". Is this fix a modification you are doing manually?

PINTO0309 commented 3 years ago

@geaxgx I see! It now makes sense to me why there was a discrepancy in the weights. I've always been one not to specify --reverse_input_channels.

Is this fix a modification you are doing manually ?

Yes. I revised and re-committed it while discussing with you. Since the .xml is separate from the weights (.bin), it can be reused as long as the structure of the model is not changed significantly, so I preemptively addressed the issue before you raised it.