Output of converted model is different from original model's output

MekhailS commented 3 years ago

1. OS you are using e.g. Ubuntu 20.04, Windows 10, etc

The conversion is done on Windows 10

2. OS Architecture e.g. x86_64, armv7l, aarch64, etc

x64

3. Version of OpenVINO e.g. 2021.2.185, etc

OpenVINO 2021.4.582

4. Version of TensorFlow e.g. v2.4.1, tf-nightly==2.5.0.dev20210128, etc

The version of TensorFlow on the machine used for conversion is 2.4.2

8. Version of ONNX e.g. v1.8.0, etc

The version of ONNX on the machine used for conversion is 1.10.1

13. Issue Details

First of all, great project! Using your script I was at least able to compile my model for edgetpu. But the outputs of the converted model differ too much from the outputs of the original model (.onnx) (the model has 3 outputs). You can inspect this difference visually in the script on gcolab 12. Since my model has hardswish activations, I am also using the --optimizing_hardswish_for_edgetpu argument of the openvino2tensorflow script.

Also, I tried to force a Transpose to be added before the Reshape (since that solved problem #58) using the --weight_replacement_config option, but it didn't help.

Contents of the .json config file:

```
{
  "format_version": 2,
  "layers": [
    {
      "layer_id": 503,
      "type": "Transpose",
      "replace_mode": "insert_before",
      "values": [0, 2, 3, 1]
    }
  ]
}
```
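As background on why a Transpose before a Reshape can matter at all: the converter moves tensors from NCHW to NHWC, and flattening the two layouts gives a different element order. The following is a minimal NumPy sketch with toy shapes (not taken from the model in this issue). Which permutation is actually needed at layer 503 depends on the layout the tensor has at that point, so treat this only as an illustration of the general effect.

```python
import numpy as np

# Toy NCHW tensor: batch 1, 2 channels, 2x2 spatial (illustrative shapes only).
nchw = np.arange(8, dtype=np.float32).reshape(1, 2, 2, 2)

# The same data in NHWC layout, as a converted TensorFlow graph would hold it.
nhwc = nchw.transpose(0, 2, 3, 1)

# Flattening each layout yields a different element order...
flat_from_nchw = nchw.reshape(1, -1)
flat_from_nhwc = nhwc.reshape(1, -1)

# ...so the NHWC tensor has to be moved back to NCHW before the Reshape
# to reproduce the order the original model sees.
restored = nhwc.transpose(0, 3, 1, 2).reshape(1, -1)

print(flat_from_nchw)
print(flat_from_nhwc)
print(np.array_equal(flat_from_nchw, restored))
```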

PINTO0309 commented 3 years ago

Thank you for posting the issue with detailed information. It will save me more than a few hours of research time.

First, when I convert your ONNX model to tflite without modifying it, I get a conversion error in Broadcast operations.

```
xhost +local: && \
docker run --gpus all -it --rm \
-v `pwd`:/home/user/workdir \
-v /tmp/.X11-unix/:/tmp/.X11-unix:rw \
--device /dev/video0:/dev/video0:mwr \
--net=host \
-e XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR \
-e DISPLAY=$DISPLAY \
--privileged \
pinto0309/openvino2tensorflow:latest

cd workdir

python3 -m onnxsim proxy_mobilenetv3.onnx proxy_mobilenetv3_256x256.onnx --input-shape 1,3,256,256

MODEL=proxy_mobilenetv3
H=256
W=256
$INTEL_OPENVINO_DIR/deployment_tools/model_optimizer/mo.py \
--input_model ${MODEL}_${H}x${W}.onnx \
--data_type FP32 \
--output_dir openvino/${H}x${W}/FP32

openvino2tensorflow \
--model_path openvino/${H}x${W}/FP32/${MODEL}_${H}x${W}.xml \
--output_saved_model \
--output_pb \
--output_no_quant_float32_tflite
```

```
TensorFlow/Keras model building process starts ======================================
ERROR: index 2 is out of bounds for axis 0 with size 2
ERROR: model_path  : openvino/256x256/FP32/proxy_mobilenetv3_256x256.xml
ERROR: weights_path: openvino/256x256/FP32/proxy_mobilenetv3_256x256.bin
ERROR: layer_id    : 536
ERROR: input_layer0 layer_id=534: KerasTensor(type_spec=TensorSpec(shape=(1, 1), dtype=tf.float32, name=None), name='tf.math.maximum/Maximum:0', description="created by layer 'tf.math.maximum'")
ERROR: input_layer1 layer_id=535: Const(ndarray).shape  (2,)
array([   1, 1024])
ERROR: The trace log is below.
Traceback (most recent call last):
  File "openvino2tensorflow.py", line 3172, in convert
    target_shape[0], target_shape[2], target_shape[3], target_shape[1]
IndexError: index 2 is out of bounds for axis 0 with size 2
```

This is not a problem with the model. The current Broadcast conversion in openvino2tensorflow only supports 4-dimensional target shapes, while the target shape at this layer is 2-dimensional ([1, 1024]). I need to improve the tool's conversion process first, so please wait a bit.
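To illustrate the failure mode in the traceback: indexing `target_shape[2]` and `target_shape[3]` assumes four entries, so a 2-element shape raises IndexError. Below is a minimal sketch of a rank-aware alternative. The helper name `to_nhwc_target_shape` is hypothetical and this is not the fix that shipped in the tool; it only shows the idea of permuting NCHW to NHWC when the shape is 4-D.

```python
import numpy as np

def to_nhwc_target_shape(target_shape):
    """Hypothetical helper: permute an NCHW target shape to NHWC only if it is 4-D."""
    dims = [int(d) for d in np.asarray(target_shape).reshape(-1)]
    if len(dims) == 4:
        n, c, h, w = dims
        return [n, h, w, c]
    # Assumption of this sketch: every other rank is passed through unchanged.
    # A real converter may need dedicated handling for 3-D or 5-D shapes.
    return dims

print(to_nhwc_target_shape(np.array([1, 1024])))
print(to_nhwc_target_shape([1, 3, 256, 256]))
```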

PINTO0309 commented 3 years ago

Broadcast bug fixes. https://github.com/PINTO0309/openvino2tensorflow/releases/tag/v1.17.3

I will now begin the structural analysis of the model. :smile_cat:

MekhailS commented 3 years ago

Thanks for replying! I appreciate your work

PINTO0309 commented 3 years ago

```
import numpy as np
import cv2

#==============================================================
from tensorflow.lite.python.interpreter import Interpreter
interpreter = Interpreter(model_path='saved_model/model_float32.tflite', num_threads=4)
interpreter.allocate_tensors()
input_blob = interpreter.get_input_details()
output_blob = interpreter.get_output_details()
img = cv2.imread("test.jpg")
img = cv2.resize(img, (256, 256))
img = img.astype(np.float32)
img = img[np.newaxis, :, :, :]
interpreter.set_tensor(input_blob[0]['index'], img)
interpreter.invoke()
out0 = interpreter.get_tensor(output_blob[0]['index'])
out1 = interpreter.get_tensor(output_blob[1]['index'])
out2 = interpreter.get_tensor(output_blob[2]['index'])
print('@@@@@@@@@@@@@@@@@@@@@@@@@@@ tflite')
print(f'1.shape logits: {out1.shape}')
print(f'0.shape colors: {out0.shape}')
print(f'2.shape embeddings: {out2.shape}')
print(f'1 logits: {np.sum(out1)}')
print(f'0 colors: {np.sum(out0)}')
print(f'2 embeddings: {np.sum(out2)}')

#==============================================================
from openvino.inference_engine import IECore
ie = IECore()
model='proxy_mobilenetv3_256x256'
net = ie.read_network(f'{model}.xml', f'{model}.bin')
input_blob = next(iter(net.input_info))
out_blob   = next(iter(net.outputs))
exec_net = ie.load_network(net, 'CPU')
res = exec_net.infer(inputs={input_blob: img.transpose(0,3,1,2)})
print('@@@@@@@@@@@@@@@@@@@@@@@@@@@ OpenVINO')
print(f'2.shape logits: {res["logits"].shape}')
print(f'0.shape colors: {res["colors"].shape}')
print(f'1.shape embeddings: {res["embeddings"].shape}')
print(f'2 logits: {np.sum(res["logits"])}')
print(f'0 colors: {np.sum(res["colors"])}')
print(f'1 embeddings: {np.sum(res["embeddings"])}')

#==============================================================
import onnxruntime
onnx_session = onnxruntime.InferenceSession('proxy_mobilenetv3_256x256.onnx')
onnx_input = {onnx_session.get_inputs()[0].name: img.transpose(0,3,1,2)}
onnx_output = onnx_session.run(None, onnx_input)
print('@@@@@@@@@@@@@@@@@@@@@@@@@@@ ONNX')
print(f'0.shape logits: {onnx_output[0].shape}')
print(f'1.shape colors: {onnx_output[1].shape}')
print(f'2.shape embeddings: {onnx_output[2].shape}')
print(f'0 logits: {np.sum(onnx_output[0])}')
print(f'1 colors: {np.sum(onnx_output[1])}')
print(f'2 embeddings: {np.sum(onnx_output[2])}')
```

```
@@@@@@@@@@@@@@@@@@@@@@@@@@@ tflite
1.shape logits: (1, 92)
0.shape colors: (1, 84)
2.shape embeddings: (1, 1024)
1 logits: 599226.5625
0 colors: -35951.8125
2 embeddings: -65292.734375
@@@@@@@@@@@@@@@@@@@@@@@@@@@ OpenVINO
2.shape logits: (1, 92)
0.shape colors: (1, 84)
1.shape embeddings: (1, 1024)
2 logits: 19.064287185668945
0 colors: -35790.28515625
1 embeddings: -2.0427639484405518
@@@@@@@@@@@@@@@@@@@@@@@@@@@ ONNX
0.shape logits: (1, 92)
1.shape colors: (1, 84)
2.shape embeddings: (1, 1024)
0 logits: 19.064273834228516
1 colors: -35790.359375
2 embeddings: -2.0427613258361816
```

The ONNX and OpenVINO outputs differ slightly but agree in general, so the large difference appears only in the tflite output. Therefore, the next step is to edit the XML file to make the model a little smaller and identify which layer is causing the difference. I won't be able to work on it until the kids go to bed, so my next reply will be in three to four hours.
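The comparison above uses `np.sum` of each output as a quick fingerprint. As a complementary check, element-wise error statistics can catch differences that a sum hides. This is a minimal sketch: the function name `compare_outputs` and the tolerances are assumptions for illustration, and tensors in different layouts, such as (1, 1, 1, 960) versus (1, 960, 1, 1), would need to be transposed to a common layout first.

```python
import numpy as np

def compare_outputs(reference, candidate, rtol=1e-3, atol=1e-4):
    """Return element-wise error statistics between two same-shaped outputs."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    abs_err = np.abs(reference - candidate)
    return {
        "max_abs_err": float(abs_err.max()),
        "mean_abs_err": float(abs_err.mean()),
        "allclose": bool(np.allclose(candidate, reference, rtol=rtol, atol=atol)),
    }

# Toy data: two vectors with the same sum but different elements.
a = np.array([1.0, 2.0, 3.0])
b = np.array([3.0, 2.0, 1.0])
stats = compare_outputs(a, b)
print(np.sum(a) == np.sum(b))
print(stats)
```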

PINTO0309 commented 3 years ago

First of all, there seems to be a problem with the HardSigmoid conversion. https://drive.google.com/drive/folders/1hiw8DuS_LN88-AGooU_tq8MDk51fOqfC?usp=sharing

```
import numpy as np
import cv2

#==============================================================
from tensorflow.lite.python.interpreter import Interpreter
interpreter = Interpreter(model_path='saved_model/model_float32.tflite', num_threads=4)
interpreter.allocate_tensors()
input_blob = interpreter.get_input_details()
output_blob = interpreter.get_output_details()
img = cv2.imread("test.jpg")
img = cv2.resize(img, (256, 256))
img = img.astype(np.float32)
img = img[np.newaxis, :, :, :]
interpreter.set_tensor(input_blob[0]['index'], img)
interpreter.invoke()
out0 = interpreter.get_tensor(output_blob[0]['index'])
print('@@@@@@@@@@@@@@@@@@@@@@@@@@@ tflite')
print(f'0.shape: {out0.shape}')
print(f'0: {np.sum(out0)}')

#==============================================================
from openvino.inference_engine import IECore
ie = IECore()
model='proxy_mobilenetv3_256x256'
net = ie.read_network(f'{model}.xml', f'{model}.bin')
input_blob = next(iter(net.input_info))
out_blob   = next(iter(net.outputs))
exec_net = ie.load_network(net, 'CPU')
res = exec_net.infer(inputs={input_blob: img.transpose(0,3,1,2)})
print('@@@@@@@@@@@@@@@@@@@@@@@@@@@ OpenVINO')
print(f'0.shape: {res["GlobalAveragePool_275/reduce"].shape}')
print(f'0: {np.sum(res["GlobalAveragePool_275/reduce"])}')
```

```
@@@@@@@@@@@@@@@@@@@@@@@@@@@ tflite
0.shape: (1, 1, 1, 960)
0: 27701.671875
@@@@@@@@@@@@@@@@@@@@@@@@@@@ OpenVINO
0.shape: (1, 960, 1, 1)
0: 27588.775390625
```
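For reference when checking a HardSigmoid conversion, the operator computes `clip(alpha * x + beta, 0, 1)`. Here is a minimal NumPy sketch of that definition. The defaults shown follow the ONNX operator (alpha=0.2, beta=0.5); which alpha and beta this particular model uses is not stated in the thread, so the second call, with the MobileNetV3-style slope of 1/6, is an assumption for illustration.

```python
import numpy as np

def hard_sigmoid(x, alpha=0.2, beta=0.5):
    """HardSigmoid reference: clip(alpha * x + beta, 0, 1)."""
    x = np.asarray(x, dtype=np.float32)
    return np.clip(alpha * x + beta, 0.0, 1.0)

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0], dtype=np.float32)

# ONNX default parameters.
print(hard_sigmoid(x))

# With alpha = 1/6 the result equals relu6(x + 3) / 6.
print(hard_sigmoid(x, alpha=1.0 / 6.0))
```

Running the same input through a converted layer and through a reference like this one is a quick way to confirm that both parameters survived the conversion.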

PINTO0309 commented 3 years ago

v1.17.4 Fixed a bug in HardSigmoid. Continuing to verify subsequent layers.

GlobalAveragePool_275/reduce

```
@@@@@@@@@@@@@@@@@@@@@@@@@@@ tflite
0.shape: (1, 1, 1, 960)
0: 27588.80078125
@@@@@@@@@@@@@@@@@@@@@@@@@@@ OpenVINO
0.shape: (1, 960, 1, 1)
0: 27588.775390625
```

Full

```
@@@@@@@@@@@@@@@@@@@@@@@@@@@ tflite
1.shape logits: (1, 92)
0.shape colors: (1, 84)
2.shape embeddings: (1, 1024)
1 logits: 600571.4375
0 colors: -35790.3203125
2 embeddings: -64352.0546875
@@@@@@@@@@@@@@@@@@@@@@@@@@@ OpenVINO
2.shape logits: (1, 92)
0.shape colors: (1, 84)
1.shape embeddings: (1, 1024)
2 logits: 19.064287185668945
0 colors: -35790.28515625
1 embeddings: -2.0427639484405518
@@@@@@@@@@@@@@@@@@@@@@@@@@@ ONNX
0.shape logits: (1, 92)
1.shape colors: (1, 84)
2.shape embeddings: (1, 1024)
0 logits: 19.064273834228516
1 colors: -35790.359375
2 embeddings: -2.0427613258361816
```

MekhailS commented 3 years ago

Since the problem is now located somewhere in the last part of the network (the part that outputs the embeddings), I think an explanation of the architecture could help: (image: embeddings_layer) The part highlighted in green is basically a Linear layer with L2 normalization of its output (Clip_293 is added to prevent the denominator from being 0).

PINTO0309 commented 3 years ago

Thanks! I've already identified the problem area. There was a bug in the ReduceL2 conversion. I'm fixing it now.
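For reference, ReduceL2 is the square root of the sum of squares over the reduced axes, and the normalization described above divides by that norm after clipping it away from zero. The sketch below expresses this in NumPy. The function names and the `eps` value are assumptions for illustration, not values read from the model or from the converter.

```python
import numpy as np

def reduce_l2(x, axes, keepdims=True):
    """ReduceL2 reference: sqrt of the sum of squares over the given axes."""
    return np.sqrt(np.sum(np.square(x), axis=axes, keepdims=keepdims))

def l2_normalize(x, axis=-1, eps=1e-12):
    """Divide by the L2 norm, clipping the norm from below so it is never 0."""
    norm = np.clip(reduce_l2(x, axes=axis), eps, None)
    return x / norm

emb = np.array([[3.0, 4.0]], dtype=np.float32)
print(reduce_l2(emb, axes=-1))
print(l2_normalize(emb))
```

After conversion to NHWC, the axis a reduction runs over may need remapping, so comparing a converted layer against a reference like this one on a small input can help localize such a bug.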

PINTO0309 commented 3 years ago

Yes! The outputs now agree. However, because the frameworks differ in their internal computation, a slight error remains.

```
@@@@@@@@@@@@@@@@@@@@@@@@@@@ tflite
1.shape logits: (1, 92)
0.shape colors: (1, 84)
2.shape embeddings: (1, 1024)
1 logits: 19.103252410888672
0 colors: -35790.3203125
2 embeddings: -2.046940326690674
@@@@@@@@@@@@@@@@@@@@@@@@@@@ OpenVINO
2.shape logits: (1, 92)
0.shape colors: (1, 84)
1.shape embeddings: (1, 1024)
2 logits: 19.064287185668945
0 colors: -35790.28515625
1 embeddings: -2.0427639484405518
@@@@@@@@@@@@@@@@@@@@@@@@@@@ ONNX
0.shape logits: (1, 92)
1.shape colors: (1, 84)
2.shape embeddings: (1, 1024)
0 logits: 19.064273834228516
1 colors: -35790.359375
2 embeddings: -2.0427613258361816
```

PINTO0309 commented 3 years ago

v1.17.5 Fixed a bug in ReduceL2. https://github.com/PINTO0309/openvino2tensorflow/releases/tag/v1.17.5
