custom model with TF2 (2.11.0) for M.2 Edge TPU

euDominic commented 1 year ago

Hello.

I am new to this topic. After a lot of reading and following various guides, I managed to get TensorFlow 2.11.0 running on Ubuntu 20.04 under WSL. I was also able to train a model and use TensorBoard to watch the progress.

But the TF2 models don't seem to work with the Coral TPU.

Or I am doing something fundamentally wrong.

Maybe someone has an idea what could be wrong.

My setup....

From https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md I got SSD MobileNet V2 FPNLite 320x320.

I git-cloned https://github.com/tensorflow/models.

With generate_tfrecord.py I created train.record and test.record.

With model_main_tf2.py --model_dir=/home/dom/tensorflow/workspace/training_demo/models/ssd_mobilenet_v2_fpnlite --pipeline_config_path=/home/dom/tensorflow/workspace/training_demo/models/ssd_mobilenet_v2_fpnlite/pipeline.config --num_train_steps=2000

I created the model.

With model_main_tf2.py --model_dir=/home/dom/tensorflow/workspace/training_demo/models/ssd_mobilenet_v2_fpnlite --pipeline_config_path=/home/dom/tensorflow/workspace/training_demo/models/ssd_mobilenet_v2_fpnlite/pipeline.config --checkpoint_dir=/home/dom/tensorflow/workspace/training_demo/models/ssd_mobilenet_v2_fpnlite

I started an eval.

With exporter_main_v2.py --input_type=image_tensor --pipeline_config_path=/home/dom/tensorflow/workspace/training_demo/models/ssd_mobilenet_v2_fpnlite/pipeline.config --trained_checkpoint_dir=/home/dom/tensorflow/workspace/training_demo/models/ssd_mobilenet_v2_fpnlite --output_directory=/home/dom/tensorflow/workspace/training_demo/exported-models/

and

export_tflite_graph_tf2.py --pipeline_config_path=/home/dom/tensorflow/workspace/training_demo/models/ssd_mobilenet_v2_fpnlite/pipeline.config --trained_checkpoint_dir=/home/dom/tensorflow/workspace/training_demo/models/ssd_mobilenet_v2_fpnlite --output_directory=/home/dom/tensorflow/workspace/training_demo/exported-models/tfliteexport/

I ran one regular (frozen) model export and one export for TFLite.

Finally, I converted the model with tflite_convert --saved_model_dir=/home/dom/tensorflow/workspace/training_demo/exported-models/tfliteexport/saved_model --output_file=/home/dom/tensorflow/workspace/training_demo/exported-models/tfliteexport/saved_model/detect.tflite --input_shapes=1,300,300,3 --input_arrays=normalized_input_image_tensor --output_arrays='TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3' --inference_type=FLOAT --allow_custom_ops --mean_values=128 --std_dev_values=128 --change_concat_input_ranges=false --allow_nudging_weights_to_use_fast_gemm_kernel=true

As the last step, I ran edgetpu_compiler -s detect.tflite to create the Coral file.

But I ran into this problem:

Input model: detect.tflite
Input size: 10.97MiB
Output model: detect_edgetpu.tflite
Output size: 10.97MiB
On-chip memory used for caching model parameters: 0.00B
On-chip memory remaining for caching model parameters: 0.00B
Off-chip memory used for streaming uncached model parameters: 0.00B
Number of Edge TPU subgraphs: 0
Total number of operations: 157
Operation log: detect_edgetpu.log

Model successfully compiled but not all operations are supported by the Edge TPU. A percentage of the model will instead run on the CPU, which is slower. If possible, consider updating your model to use only operations supported by the Edge TPU. For details, visit g.co/coral/model-reqs.
Number of operations that will run on Edge TPU: 0
Number of operations that will run on CPU: 157
See the operation log file for individual operation details.
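
For anyone scripting this step, here is a minimal sketch of how the summary above could be checked automatically. It only uses the field labels that appear in the compiler output quoted in this thread; the function name `parse_compile_summary` is made up for illustration and is not part of any Coral tool.

```python
import re


def parse_compile_summary(text):
    """Pull the operation counts out of an edgetpu_compiler summary.

    The labels below are copied from the output quoted in this issue.
    Matching is case-insensitive and tolerant of line breaks.
    """
    patterns = {
        "subgraphs": r"Number of Edge TPU subgraphs:\s*(\d+)",
        "total_ops": r"Total number of operations:\s*(\d+)",
        "tpu_ops": r"Number of operations that will run on Edge TPU:\s*(\d+)",
        "cpu_ops": r"Number of operations that will run on CPU:\s*(\d+)",
    }
    result = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        # None means the label was not found in the text.
        result[key] = int(match.group(1)) if match else None
    return result


# Values taken from the summary posted above.
summary = parse_compile_summary(
    "Number of Edge TPU subgraphs: 0\n"
    "Total number of operations: 157\n"
    "Number of operations that will run on Edge TPU: 0\n"
    "Number of operations that will run on CPU: 157\n"
)
print(summary)
```

A result with `tpu_ops` equal to 0, as here, means the compiled file is effectively the unchanged CPU model.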

pipeline.config

model {
  ssd {
    num_classes: 5
    image_resizer {
      fixed_shape_resizer {
        height: 320
        width: 320
      }
    }
    feature_extractor {
      type: "ssd_mobilenet_v2_fpn_keras"
      depth_multiplier: 1.0
      min_depth: 16
      conv_hyperparams {
        regularizer {
          l2_regularizer {
            weight: 3.9999998989515007e-05
          }
        }
        initializer {
          random_normal_initializer {
            mean: 0.0
            stddev: 0.009999999776482582
          }
        }
        activation: RELU_6
        batch_norm {
          decay: 0.996999979019165
          scale: true
          epsilon: 0.0010000000474974513
        }
      }
      use_depthwise: true
      override_base_feature_extractor_hyperparams: true
      fpn {
        min_level: 3
        max_level: 7
        additional_layer_depth: 128
      }
    }
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    box_predictor {
      weight_shared_convolutional_box_predictor {
        conv_hyperparams {
          regularizer {
            l2_regularizer {
              weight: 3.9999998989515007e-05
            }
          }
          initializer {
            random_normal_initializer {
              mean: 0.0
              stddev: 0.009999999776482582
            }
          }
          activation: RELU_6
          batch_norm {
            decay: 0.996999979019165
            scale: true
            epsilon: 0.0010000000474974513
          }
        }
        depth: 128
        num_layers_before_predictor: 4
        kernel_size: 3
        class_prediction_bias_init: -4.599999904632568
        share_prediction_tower: true
        use_depthwise: true
      }
    }
    anchor_generator {
      multiscale_anchor_generator {
        min_level: 3
        max_level: 7
        anchor_scale: 4.0
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        scales_per_octave: 2
      }
    }
    post_processing {
      batch_non_max_suppression {
        score_threshold: 9.99999993922529e-09
        iou_threshold: 0.6000000238418579
        max_detections_per_class: 100
        max_total_detections: 100
        use_static_shapes: false
      }
      score_converter: SIGMOID
    }
    normalize_loss_by_num_matches: true
    loss {
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_loss {
        weighted_sigmoid_focal {
          gamma: 2.0
          alpha: 0.25
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    encode_background_as_zeros: true
    normalize_loc_loss_by_codesize: true
    inplace_batchnorm_update: true
    freeze_batchnorm: false
  }
}
train_config {
  batch_size: 6
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_crop_image {
      min_object_covered: 0.0
      min_aspect_ratio: 0.75
      max_aspect_ratio: 3.0
      min_area: 0.75
      max_area: 1.0
      overlap_thresh: 0.0
    }
  }
  sync_replicas: true
  optimizer {
    momentum_optimizer {
      learning_rate {
        cosine_decay_learning_rate {
          learning_rate_base: 0.07999999821186066
          total_steps: 50000
          warmup_learning_rate: 0.026666000485420227
          warmup_steps: 1000
        }
      }
      momentum_optimizer_value: 0.8999999761581421
    }
    use_moving_average: false
  }
  fine_tune_checkpoint: "/home/dom/tensorflow/workspace/training_demo/pre-trained-models/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/checkpoint/ckpt-0"
  num_steps: 50000
  startup_delay_steps: 0.0
  replicas_to_aggregate: 8
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
  fine_tune_checkpoint_type: "detection"
  fine_tune_checkpoint_version: V2
}
train_input_reader {
  label_map_path: "/home/dom/tensorflow/workspace/training_demo/annotations/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "/home/dom/tensorflow/workspace/training_demo/annotations/train.record"
  }
}
eval_config {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
}
eval_input_reader {
  label_map_path: "/home/dom/tensorflow/workspace/training_demo/annotations/label_map.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "/home/dom/tensorflow/workspace/training_demo/annotations/test.record"
  }
}

graph_rewriter {
  quantization {
    delay: 48000
    weight_bits: 8
    activation_bits: 8
  }
}
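
One detail that can be read directly from the posts: the config sets fixed_shape_resizer to 320x320, while the tflite_convert command passes --input_shapes=1,300,300,3. Whether this mismatch contributes to the compile result is not established in this thread, but it is easy to check. The sketch below is only an illustration; the helper names are hypothetical, and the regex assumes that `height` appears before `width` as it does in this config.

```python
import re


def resizer_size(pipeline_text):
    """Return (height, width) of the fixed_shape_resizer, or None if absent.

    Assumes 'height' is listed before 'width', as in the config above.
    """
    match = re.search(
        r"fixed_shape_resizer\s*\{\s*height:\s*(\d+)\s*width:\s*(\d+)",
        pipeline_text,
    )
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_input_shapes(flag_value):
    """Turn an --input_shapes value such as '1,300,300,3' into a tuple of ints."""
    return tuple(int(part) for part in flag_value.split(","))


# Both values are copied from the posts above.
config_snippet = "image_resizer { fixed_shape_resizer { height: 320 width: 320 } }"
height, width = resizer_size(config_snippet)
batch, in_height, in_width, channels = parse_input_shapes("1,300,300,3")

shapes_agree = (height, width) == (in_height, in_width)
print(shapes_agree)  # False: the config says 320x320, the flag says 300x300
```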

pip list

Package Version
absl-py 1.4.0
anyio 3.6.2
apache-beam 2.43.0
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
arrow 1.2.3
astor 0.8.1
asttokens 2.2.1
astunparse 1.6.3
attrs 22.2.0
avro-python3 1.10.2
backcall 0.2.0
beautifulsoup4 4.11.1
bleach 5.0.1
cachetools 5.2.1
certifi 2022.12.7
cffi 1.15.1
chardet 5.1.0
charset-normalizer 2.1.1
click 8.1.3
cloudpickle 2.2.0
cmake 3.25.0
colorama 0.3.3
comm 0.1.2
contextlib2 21.6.0
contourpy 1.0.6
crcmod 1.7
cycler 0.11.0
cython 0.29.33
debugpy 1.6.5
decorator 5.1.1
defusedxml 0.7.1
dill 0.3.1.1
dm-tree 0.1.8
docopt 0.6.2
entrypoints 0.4
etils 1.0.0
executing 1.2.0
fastavro 1.7.0
fasteners 0.18
fastjsonschema 2.16.2
flatbuffers 23.1.4
fonttools 4.38.0
fqdn 1.5.1
gast 0.4.0
gin-config 0.5.0
git-clone 1.0.6
google-api-core 2.11.0
google-api-python-client 2.72.0
google-auth 2.16.0
google-auth-httplib2 0.1.0
google-auth-oauthlib 0.4.6
google-pasta 0.2.0
googleapis-common-protos 1.58.0
grpcio 1.51.1
h5py 3.7.0
hdfs 2.7.0
html5lib 1.1
httplib2 0.20.4
idna 3.4
immutabledict 2.2.3
importlib-metadata 6.0.0
importlib-resources 5.10.2
ipykernel 6.20.1
ipython 8.8.0
ipython-genutils 0.2.0
ipywidgets 8.0.4
isoduration 20.11.0
jedi 0.18.2
Jinja2 3.1.2
joblib 1.2.0
jsonpointer 2.3
jsonschema 4.17.3
jupyter 1.0.0
jupyter-client 7.4.8
jupyter-console 6.4.4
jupyter-core 5.1.3
jupyter-events 0.6.2
jupyter-server 2.0.6
jupyter-server-terminals 0.4.4
jupyterlab-pygments 0.2.2
jupyterlab-widgets 3.0.5
kaggle 1.5.12
keras 2.11.0
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.2
kiwisolver 1.4.4
libclang 15.0.6.1
lvis 0.5.3
lxml 4.9.2
Markdown 3.4.1
MarkupSafe 2.1.1
matplotlib 3.6.3
matplotlib-inline 0.1.6
mistune 2.0.4
nbclassic 0.4.8
nbclient 0.7.2
nbconvert 7.2.7
nbformat 5.7.3
nest-asyncio 1.5.6
notebook 6.5.2
notebook-shim 0.2.2
numpy 1.22.4
oauth2client 4.1.3
oauthlib 3.2.2
object-detection 0.1
objsize 0.5.2
opencv-python 4.7.0.68
opencv-python-headless 4.7.0.68
opt-einsum 3.3.0
orjson 3.8.5
packaging 23.0
pandas 1.5.2
pandocfilters 1.5.0
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
pillow 9.4.0
pip 20.2.4
platformdirs 2.6.2
portalocker 2.6.0
prometheus-client 0.15.0
promise 2.3
prompt-toolkit 3.0.36
proto-plus 1.22.2
protobuf 3.19.6
psutil 5.9.4
ptyprocess 0.7.0
pure-eval 0.2.2
py-cpuinfo 9.0.0
pyarrow 9.0.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycocotools 2.0.6
pycparser 2.21
pydot 1.4.2
Pygments 2.14.0
pymongo 3.13.0
pyparsing 2.4.7
pyrsistent 0.19.3
python-dateutil 2.8.2
python-json-logger 2.0.4
python-slugify 7.0.0
pytz 2022.7
PyYAML 5.4.1
pyzmq 25.0.0
qtconsole 5.4.0
QtPy 2.3.0
regex 2022.10.31
requests 2.28.1
requests-oauthlib 1.3.1
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rsa 4.9
sacrebleu 2.2.0
scikit-learn 1.2.0
scipy 1.10.0
Send2Trash 1.8.0
sentencepiece 0.1.97
seqeval 1.2.2
setuptools 65.6.3
six 1.16.0
sniffio 1.3.0
soupsieve 2.3.2.post1
stack-data 0.6.2
tabulate 0.9.0
tensorboard 2.11.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorflow 2.11.0
tensorflow-addons 0.19.0
tensorflow-datasets 4.8.1
tensorflow-estimator 2.11.0
tensorflow-hub 0.12.0
tensorflow-io 0.29.0
tensorflow-io-gcs-filesystem 0.29.0
tensorflow-metadata 1.12.0
tensorflow-model-optimization 0.7.3
tensorflow-text 2.11.0
termcolor 1.1.0
terminado 0.17.1
text-unidecode 1.3
tf-models-official 2.11.2
tf-slim 1.1.0
threadpoolctl 3.1.0
tinycss2 1.2.1
toml 0.10.2
tornado 6.2
tqdm 4.31.1
traitlets 5.8.1
typeguard 2.13.3
typing-extensions 4.4.0
unzip 1.0.0
uri-template 1.2.0
uritemplate 4.1.1
urllib3 1.26.14
wcwidth 0.2.5
webcolors 1.12
webencodings 0.5.1
websocket-client 1.4.2
Werkzeug 2.2.2
wheel 0.37.1
widgetsnbextension 4.0.5
wrapt 1.14.1
zipp 3.11.0
zstandard 0.19.0

hjonnala commented 1 year ago

tflite_convert --saved_model_dir=/home/dom/tensorflow/workspace/training_demo/exported-models/tfliteexport/saved_model --output_file=/home/dom/tensorflow/workspace/training_demo/exported-models/tfliteexport/saved_model/detect.tflite --input_shapes=1,300,300,3 --input_arrays=normalized_input_image_tensor --output_arrays='TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3' --inference_type=FLOAT --allow_custom_ops --mean_values=128 --std_dev_values=128 --change_concat_input_ranges=false --allow_nudging_weights_to_use_fast_gemm_kernel=true

The ops have to be either int8 or uint8. Please try with inference_type=QUANTIZED_UINT8 or QUANTIZED_INT8.
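
To see whether a converted .tflite file actually contains 8-bit tensors before handing it to edgetpu_compiler, the tensor dtypes can be counted. This is a minimal sketch, not an official Coral check: `dtype_histogram` and `inspect_model` are hypothetical helper names, and it assumes the `tf.lite.Interpreter` from the installed TensorFlow can open the file.

```python
from collections import Counter


def dtype_histogram(tensor_details):
    """Count tensors per dtype name.

    `tensor_details` is a list of dicts with a 'dtype' entry, in the shape
    returned by tf.lite.Interpreter.get_tensor_details().
    """
    return Counter(
        getattr(detail["dtype"], "__name__", str(detail["dtype"]))
        for detail in tensor_details
    )


def inspect_model(model_path):
    """Load a .tflite file and return its dtype histogram."""
    # Imported here so dtype_histogram stays usable without TensorFlow.
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path=model_path)
    return dtype_histogram(interpreter.get_tensor_details())


# Example call (the path is a placeholder):
# print(inspect_model("detect.tflite"))
```

If the histogram is dominated by float32, the model was most likely not quantized, whatever flags were passed to the converter.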

euDominic commented 1 year ago

Thanks for your input, I'll give it a try. And is the output size of 10.97MiB OK, or is 8 MB the maximum for the TPU?

euDominic commented 1 year ago

Unfortunately, that was not the solution either.

Neither QUANTIZED_INT8 nor QUANTIZED_UINT8 did any good.

I will probably have to go back to TF1.

Edge TPU Compiler version 16.0.384591198
Started a compilation timeout timer of 180 seconds.

Model compiled successfully in 17 ms.

Input model: detect2.tflite
Input size: 10.97MiB
Output model: detect2_edgetpu.tflite
Output size: 10.97MiB
On-chip memory used for caching model parameters: 0.00B
On-chip memory remaining for caching model parameters: 0.00B
Off-chip memory used for streaming uncached model parameters: 0.00B
Number of Edge TPU subgraphs: 0
Total number of operations: 157
Operation log: detect2_edgetpu.log

Model successfully compiled but not all operations are supported by the Edge TPU. A percentage of the model will instead run on the CPU, which is slower. If possible, consider updating your model to use only operations supported by the Edge TPU. For details, visit g.co/coral/model-reqs.
Number of operations that will run on Edge TPU: 0
Number of operations that will run on CPU: 157

Operator             Count   Status

RESHAPE              10      Operation is working on an unsupported data type
RESHAPE              4       Tensor has unsupported rank (up to 3 innermost dimensions mapped)
CONCATENATION        2       Operation is working on an unsupported data type
LOGISTIC             1       Operation is working on an unsupported data type
PACK                 4       Tensor has unsupported rank (up to 3 innermost dimensions mapped)
ADD                  12      Operation is working on an unsupported data type
CUSTOM               1       Operation is working on an unsupported data type
CONV_2D              72      Operation is working on an unsupported data type
DEPTHWISE_CONV_2D    51      Operation is working on an unsupported data type

Compilation child process completed within timeout period.
Compilation succeeded!
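
Grouping the operator table above by status makes the pattern easier to see. The sketch below is just an illustration with a hypothetical helper name (`ops_by_status`); the rows are copied from the log in this comment.

```python
import re
from collections import defaultdict

# Rows copied from the operation log above.
OPERATION_LOG = """\
RESHAPE 10 Operation is working on an unsupported data type
RESHAPE 4 Tensor has unsupported rank (up to 3 innermost dimensions mapped)
CONCATENATION 2 Operation is working on an unsupported data type
LOGISTIC 1 Operation is working on an unsupported data type
PACK 4 Tensor has unsupported rank (up to 3 innermost dimensions mapped)
ADD 12 Operation is working on an unsupported data type
CUSTOM 1 Operation is working on an unsupported data type
CONV_2D 72 Operation is working on an unsupported data type
DEPTHWISE_CONV_2D 51 Operation is working on an unsupported data type
"""


def ops_by_status(log_text):
    """Sum the operator counts per status message."""
    totals = defaultdict(int)
    for line in log_text.splitlines():
        # Each row is: OPERATOR_NAME  COUNT  free-text status
        match = re.match(r"\s*([A-Z0-9_]+)\s+(\d+)\s+(.+?)\s*$", line)
        if match:
            totals[match.group(3)] += int(match.group(2))
    return dict(totals)


totals = ops_by_status(OPERATION_LOG)
for status, count in totals.items():
    print(count, status)
```

Here 149 of the 157 operations are rejected for an unsupported data type and 8 for tensor rank, so the data type is the main thing to look at.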

hjonnala commented 1 year ago

And is the output size of 10.97MiB OK, or is 8 MB the maximum for the TPU?

It should be OK. There is no size limit for a TPU model. However, it is best to keep the model as small as possible.

unfortunately, that was not the solution either.

Neither QUANTIZED_INT8 nor QUANTIZED_UINT8 did any good.

Please try converting from the saved_model with the code snippet located at: https://colab.sandbox.google.com/github/google-coral/tutorials/blob/master/retrain_classification_ptq_tf2.ipynb#scrollTo=w9ydAmHGHUZl

converter = tf.lite.TFLiteConverter.from_saved_model('path_to_saved_model')
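
For readers who cannot open the notebook, a fuller sketch of post-training integer quantization starting from that line might look like the following. It is not the notebook's code. The helper names, the image folder, the 320x320 size (taken from the pipeline.config above) and the [-1, 1] input scaling are all assumptions that need to be adapted to the actual model, and it has not been verified in this thread that this resolves the issue.

```python
import os


def list_calibration_images(image_dir, limit=100):
    """Return up to `limit` image paths (sorted by name) for calibration."""
    extensions = (".jpg", ".jpeg", ".png")
    names = sorted(
        name for name in os.listdir(image_dir)
        if name.lower().endswith(extensions)
    )
    return [os.path.join(image_dir, name) for name in names[:limit]]


def convert_with_int8_calibration(saved_model_dir, image_dir, output_path, size=320):
    """Convert a SavedModel to TFLite using a representative dataset."""
    # Imported here so the helper above can be used without TensorFlow.
    import tensorflow as tf

    def representative_data_gen():
        for path in list_calibration_images(image_dir):
            image = tf.io.decode_image(
                tf.io.read_file(path), channels=3, expand_animations=False
            )
            image = tf.image.resize(image, (size, size))
            # Assumed preprocessing: scale pixel values to [-1, 1].
            image = (tf.cast(image, tf.float32) - 127.5) / 127.5
            yield [tf.expand_dims(image, 0).numpy()]

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_data_gen
    # INT8 builtins for the quantized ops; regular builtins are also listed
    # here on the assumption that the detection post-processing needs them.
    converter.target_spec.supported_ops = [
        tf.lite.OpsSet.TFLITE_BUILTINS_INT8,
        tf.lite.OpsSet.TFLITE_BUILTINS,
    ]
    # Assumption: uint8 input; the output type is left at its default.
    converter.inference_input_type = tf.uint8

    tflite_model = converter.convert()
    with open(output_path, "wb") as handle:
        handle.write(tflite_model)


# Example call (all three paths are placeholders):
# convert_with_int8_calibration("path_to_saved_model", "calibration_images", "detect.tflite")
```

The calibration images should be a sample of the training or test pictures, since the converter uses them to estimate the value ranges of the activations.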

alex-bronia commented 3 months ago

Was there any resolution found? I am facing similar issues.

google-coral / edgetpu

custom model with TF2 (2.11.0) for M.2 Edge TPU #708