PINTO0309 / PINTO_model_zoo

A repository for storing models that have been inter-converted between various frameworks. Supported frameworks are TensorFlow, PyTorch, ONNX, OpenVINO, TFJS, TFTRT, TensorFlowLite (Float32/16/INT8), EdgeTPU, CoreML.
https://qiita.com/PINTO
MIT License

Converting a SavedModel Tensorflow Format to Luxonis Blob format #323

Closed dhruvmsheth closed 1 year ago

dhruvmsheth commented 1 year ago

Issue Type

Support

OS

Windows, Other

OS architecture

armv7, armv6

Programming Language

Python

Framework

TensorFlow

Model name and Weights/Checkpoints URL

This is a custom model trained on edgeimpulse.com, which provides both a TensorFlow SavedModel and a TensorFlow Lite model at the end of training. The issue is converting the TensorFlow SavedModel to a Luxonis blob model by first freezing the SavedModel and then using https://blobconverter.luxonis.com/.

Description

These are my files: SavedModel file: SavedModel.zip

Tflite file: Tflite_file.zip

Sorry for opening this issue, since this is beyond the scope of your work, but I saw your posts on the Luxonis Discord helping people convert TensorFlow SavedModels to the Luxonis blob format, so I wanted to reach out as a last resort. I've tried many different alternatives:

1) Freezing the TensorFlow SavedModel. The method below successfully freezes the SavedModel; however, converting the frozen graph to a blob with https://blobconverter.luxonis.com/ fails.

Code to freeze the SavedModel:

import tensorflow as tf

# Load the saved model
loaded = tf.saved_model.load("/content/saved_model/")

# Extract the graph
graph = tf.function(loaded.signatures[tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY]).get_concrete_function(tf.TensorSpec(shape=[1, 320, 320, 3], dtype=tf.float32))
frozen_graph = graph.graph.as_graph_def()

# Save the frozen graph
with tf.io.gfile.GFile("/content/frozen_model.pb", "wb") as f:
    f.write(frozen_graph.SerializeToString())

Error I get with blobconverter:

[ ERROR ]  Cannot infer shapes or values for node "StatefulPartitionedCall".
[ ERROR ]  Expected DataType for argument 'dtype' not None.
[ ERROR ]  
[ ERROR ]  It can happen due to bug in custom shape infer function <function tf_native_tf_node_infer at 0x7f99c67e1af0>.
[ ERROR ]  Or because the node inputs have incorrect values/shapes.
[ ERROR ]  Or because input shapes are incorrect (embedded to the model or passed via --input_shape).
[ ERROR ]  Run Model Optimizer with --log_level=DEBUG for more information.
[ ERROR ]  Exception occurred during running replacer "REPLACEMENT_ID" (<class 'openvino.tools.mo.middle.PartialInfer.PartialInfer'>): Stopped shape/value propagation at "StatefulPartitionedCall" node. 
 For more information please refer to Model Optimizer FAQ, question #38. (https://docs.openvino.ai/latest/openvino_docs_MO_DG_prepare_model_Model_Optimizer_FAQ.html?question=38#question-38)

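For reference, a variant of the freezing script that explicitly inlines the variables as constants via convert_variables_to_constants_v2 (a sketch using the same placeholder paths; I have not verified it against blobconverter):

import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

# Load the SavedModel and grab the serving signature (paths are placeholders)
loaded = tf.saved_model.load("/content/saved_model/")
concrete_func = loaded.signatures[tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY]

# Inline the variables as constants so the exported GraphDef is self-contained
frozen_func = convert_variables_to_constants_v2(concrete_func)

# Serialize the frozen GraphDef
with tf.io.gfile.GFile("/content/frozen_model.pb", "wb") as f:
    f.write(frozen_func.graph.as_graph_def().SerializeToString())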

2) Using a SavedModel -> IR representation (OpenVINO) -> blob conversion. I used the standard OpenVINO instructions on the site to convert the SavedModel into a .xml and a .bin file.

The conversion reported success, however the .xml file was only 2 KB and the .bin file was 0 KB. Output during conversion:

[ INFO ] The model was converted to IR v11, the latest model format that corresponds to the source DL framework input/output format. While IR v11 is backwards compatible with OpenVINO Inference Engine API v1.0, please use API v2.0 (as of 2022.1) to take advantage of the latest improvements in IR v11.
Find more information about API v2.0 and IR v11 at https://docs.openvino.ai/latest/openvino_2_0_transition_guide.html
[ SUCCESS ] Generated IR version 11 model.
[ SUCCESS ] XML file: /content/saved_model.xml
[ SUCCESS ] BIN file: /content/saved_model.bin

Nevertheless, I tried converting it to a blob file. The result was a 1 KB blob, and when I ran it, the NN input size was reported as 3x320, which is absurd since the model was trained on 320x320 images in [1,320,320,3] format.

I reran the conversion using:

!mo --input_shape [1,320,320,3] --saved_model_dir /content/saved_model/ --layout "ncwh->nhwc"

However, this time when I converted the result to a blob and ran it, it gave me an error stating that the bounding boxes contain x=1, y=0, w=0, h=0.

3) Next, I tried a .tflite to .onnx approach using https://github.com/zhenhuaw-me/tflite2onnx, which failed at the ONNX conversion stage itself. (A .tflite model is also provided on the EdgeImpulse dashboard, so I thought I would try it out.)

Code:

import tflite2onnx

tflite_path = '/content/trained.tflite'
onnx_path = '/content/model.onnx'

tflite2onnx.convert(tflite_path, onnx_path)

Error:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
[<ipython-input-37-c193f27d3323>](https://localhost:8080/#) in <module>
      4 onnx_path = '/content/model.onnx'
      5 
----> 6 tflite2onnx.convert(tflite_path, onnx_path)

4 frames
[/usr/local/lib/python3.8/dist-packages/tflite2onnx/op/common.py](https://localhost:8080/#) in create(self, index)
    152             if opcode in tflite.BUILTIN_OPCODE2NAME:
    153                 name = tflite.opcode2name(opcode)
--> 154                 raise NotImplementedError("Unsupported TFLite OP: {} {}!".format(opcode, name))
    155             else:
    156                 raise ValueError("Opcode {} is not a TFLite builtin operator!".format(opcode))

NotImplementedError: Unsupported TFLite OP: 83 PACK!

Next, I tried using tf2onnx (https://github.com/onnx/tensorflow-onnx) with the SavedModel file. Input:

python -m tf2onnx.convert --saved-model /content/saved_model/ --output /content/model.onnx

Output:

2023-01-19 17:18:58,166 - WARNING - '--tag' not specified for saved_model. Using --tag serve
2023-01-19 17:19:12,548 - INFO - Signatures found in model: [serving_default].
2023-01-19 17:19:12,548 - WARNING - '--signature_def' not specified, using first signature: serving_default
2023-01-19 17:19:12,550 - INFO - Output names: ['output_0', 'output_1', 'output_2', 'output_3']
2023-01-19 17:19:15,690 - INFO - Using tensorflow=2.9.2, onnx=1.13.0, tf2onnx=1.13.0/2c1db5
2023-01-19 17:19:15,690 - INFO - Using opset <onnx, 13>
2023-01-19 17:19:15,694 - INFO - Computed 0 values for constant folding
2023-01-19 17:19:15,700 - INFO - Optimizing ONNX model
2023-01-19 17:19:15,715 - INFO - After optimization: Const -3 (4->1), Identity -1 (4->3)
2023-01-19 17:19:15,716 - INFO - 
2023-01-19 17:19:15,716 - INFO - Successfully converted TensorFlow model /content/saved_model/ to ONNX
2023-01-19 17:19:15,716 - INFO - Model inputs: ['input']
2023-01-19 17:19:15,716 - INFO - Model outputs: ['output_0', 'output_1', 'output_2', 'output_3']
2023-01-19 17:19:15,716 - INFO - ONNX model is saved at /content/model.onnx

But after that, while converting it to a blob using blobconverter, this is the issue:

[ ERROR ]  Numbers of inputs and mean/scale values do not match. 
 For more information please refer to Model Optimizer FAQ, question #61. (https://docs.openvino.ai/latest/openvino_docs_MO_DG_prepare_model_Model_Optimizer_FAQ.html?question=61#question-61)
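
From FAQ #61 I gather that the mean/scale values blobconverter passes by default do not line up with the model's single input. I assume binding them explicitly to the input name from the tf2onnx log above would look roughly like this (a sketch only; the 127.5/255 normalization values are an example and would need to match the training preprocessing):

mo --input_model /content/model.onnx \
   --data_type FP16 \
   --mean_values input[127.5,127.5,127.5] \
   --scale_values input[255,255,255]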

I'm completely stuck; I have tried everything I could and am hoping there is some workaround. Thanks a lot for your open-source contributions! Any workaround for converting this to a blob would be really helpful, since I'm completing the project under a narrow time constraint.

Thanks! Dhruv Sheth

Relevant Log Output

No response

URL or source code for simple inference testing code

SavedModel file: SavedModel.zip

Tflite file: Tflite_file.zip

PINTO0309 commented 1 year ago

First, these are the tools currently available for converting models between mainstream frameworks:

  1. https://github.com/onnx/tensorflow-onnx (ONNX official tool, Insufficient optimization)
  2. https://github.com/PINTO0309/onnx2tf
  3. https://github.com/PINTO0309/tflite2tensorflow
  4. https://github.com/PINTO0309/openvino2tensorflow

Since each tool has its own characteristics, it is difficult to say that any one tool is the best, but tensorflow-onnx or tflite2tensorflow is probably the most suitable for your application.

Your model contains a special OP, TFLite_Detection_PostProcess, which can only be interpreted by the TensorFlow Lite runtime. I expect that your model was generated using the TensorFlow Object Detection API or something similar.

Therefore, it is necessary to replace this special OP with general OPs and then perform a conversion that optimizes the model for OpenVINO and Myriad. tflite2tensorflow and openvino2tensorflow have that capability. Because both tools have been continually enhanced beyond what their names suggest, a single tool can convert models between multiple frameworks.
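
If you want to confirm which OPs are inside the .tflite file yourself, recent TensorFlow versions can dump the op-level structure (a quick check, assuming TF 2.8 or later; the path is a placeholder):

import tensorflow as tf

# Print the op-level structure of the TFLite flatbuffer;
# TFLite_Detection_PostProcess shows up as a custom OP in this listing.
tf.lite.experimental.Analyzer.analyze(model_path="trained.tflite")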

First, I rarely use saved_model; I use the tflite model instead. This is because its structure is already fixed in a nicely optimized state. tensorflow-onnx does not optimize the model well enough.

https://github.com/PINTO0309/tflite2tensorflow

xhost +local: && \
docker run --gpus all -it --rm \
-v `pwd`:/home/user/workdir \
-v /tmp/.X11-unix/:/tmp/.X11-unix:rw \
--device /dev/video0:/dev/video0:mwr \
--net=host \
-e XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR \
-e DISPLAY=$DISPLAY \
--privileged \
ghcr.io/pinto0309/tflite2tensorflow:latest

tflite2tensorflow \
--model_path trained.tflite \
--flatc_path ../flatc \
--schema_path ../schema.fbs \
--output_pb \
--optimizing_for_openvino_and_myriad

tflite2tensorflow \
--model_path trained.tflite \
--flatc_path ../flatc \
--schema_path ../schema.fbs \
--output_no_quant_float32_tflite \
--output_onnx \
--onnx_opset 11 \
--output_openvino_and_myriad

(Screenshots of the conversion results omitted.)

or

http://blobconverter.luxonis.com/

We can see that the level of optimization of the model's structure is different from that of other tools.
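
Once you have the blob, loading it on the OAK-D follows the usual DepthAI flow; a minimal sketch (blob path is a placeholder, camera/XLink wiring omitted):

import depthai as dai

# Build a pipeline with a generic NeuralNetwork node that runs the converted blob
pipeline = dai.Pipeline()
nn = pipeline.create(dai.node.NeuralNetwork)
nn.setBlobPath("saved_model/model.blob")
# Link a camera or XLinkIn node to nn.input and an XLinkOut node to nn.out,
# then open the device with: with dai.Device(pipeline) as device: ...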

dhruvmsheth commented 1 year ago

Thanks a lot @PINTO0309 for the explanation! I'll try out tflite2tensorflow and openvino2tensorflow today! In the meantime, if you happen to have the converted blob, could you attach it to the discussion here? It would be really helpful to try it out on the OAK-D before going through the conversion process myself. Sorry for the trouble and thanks for the prompt response! Really appreciate all your help!

Best, Dhruv

PINTO0309 commented 1 year ago

https://s3.ap-northeast-2.wasabisys.com/temp-models/PINTO_model_zoo_323/saved_model.zip

dhruvmsheth commented 1 year ago

@PINTO0309 sorry for bothering you again. There seems to be some sort of loss during conversion: the detections are erroneous after converting to blob format. It's not that the accuracy is low; the detections are scattered randomly across the frame and the model is not able to detect the object. Any clue as to what's happening here?

For clarification, I'm converting the generated .onnx model to .blob using the blob converter, since the .blob generated directly through the script doesn't work with the OAK-D.
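
For reference, the same conversion can also be scripted with the blobconverter Python package instead of the web UI; a minimal sketch with default FP16 settings and placeholder paths:

import blobconverter

# Convert the ONNX model to a MyriadX blob via Luxonis' hosted converter
# (the same backend as https://blobconverter.luxonis.com/)
blob_path = blobconverter.from_onnx(
    model="/content/model.onnx",
    data_type="FP16",
    shaves=6,
)
print(blob_path)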

dhruvmsheth commented 1 year ago

Additionally, I tried to follow the code below:

import argparse
import tensorflow as tf
import sys, subprocess
#import tf2onnx

parser = argparse.ArgumentParser(description='TFLite to ONNX')
parser.add_argument('--tflite-file', type=str, required=True)
parser.add_argument('--out-file', type=str, required=True)

args = parser.parse_args()

### TF Lite
# Load TFLite model and get input/output details
interpreter = tf.lite.Interpreter(model_path=args.tflite_file)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Get input and output names
input_name = input_details[0]['name']
output_name = output_details[0]['name']

# convert to ONNX
print ('Creating ONNX model...')
CMD = "python3 -u -m tf2onnx.convert --verbose --tflite {0} --opset {1} --inputs-as-nchw {2} --inputs {2} --outputs {3} --output {4}".format(args.tflite_file, 12, input_name, output_name, args.out_file)
subprocess.run(CMD, shell=True, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
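
The script would be invoked along these lines (file names are placeholders):

python3 tflite_to_onnx.py --tflite-file trained.tflite --out-file model.onnx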

This also gives me an ONNX model, which I convert to .blob through https://blobconverter.luxonis.com/, but this model generates too many false positives as well.

I set the model optimizer params to none and the MyriadX params to none as well, because the default params give me a model that produces this error:

[18443010F188940F00] [1.10.3.4] [5.907] [NeuralNetwork(0)] [error] Input tensor 'mean' (1) exceeds available data range. Data size (270000B), tensor offset (270016), size (1B) - skipping inference

@PINTO0309 Any idea if the params are wrong or if I should be using different params? The input size is 300x300. Thanks, and sorry for the trouble!