PINTO0309 / tflite2tensorflow

Generate saved_model, tfjs, tf-trt, EdgeTPU, CoreML, quantized tflite, ONNX, OpenVINO, Myriad Inference Engine blob and .pb from .tflite. Supports building environments with Docker, with direct access to the host PC's GUI and camera to verify operation. NVIDIA GPU (dGPU) support. Intel iHD GPU (iGPU) support. Supports inverse quantization of INT8 quantized models.
https://qiita.com/PINTO
MIT License

Quantized model slower than non-quantized model on Windows x64 #38

Closed: imjking closed this 11 months ago

imjking commented 11 months ago

Issue Type

Performance

OS

Windows

OS architecture

Other

Programming Language

Python

Framework

TensorFlow, TensorFlowLite

Download URL for tflite file

https://storage.googleapis.com/mediapipe-assets/face_detection_short_range.tflite
https://storage.googleapis.com/mediapipe-assets/face_detection_full_range.tflite

Convert Script

tflite2tensorflow --model_path face_detection_short_range.tflite --flatc_path ../flatc --schema_path ../schema.fbs --output_pb

tflite2tensorflow --model_path face_detection_short_range.tflite --flatc_path ../flatc --schema_path ../schema.fbs --output_no_quant_float32_tflite --output_dynamic_range_quant_tflite --output_weight_quant_tflite --output_float16_quant_tflite --output_integer_quant_tflite
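
Before timing anything, it is worth confirming that a quantized variant still produces outputs close to the float32 model. A minimal sketch using the TensorFlow Lite Python interpreter; the file names are hypothetical and should be adjusted to whatever the converter actually wrote (it also only compares the first output tensor):

```python
# Compare float32 vs. weight-quantized outputs on one random input.
# File names below are hypothetical; point them at the files the
# converter actually produced.
import numpy as np
import tensorflow as tf

def run_once(model_path, x):
    interp = tf.lite.Interpreter(model_path=model_path)
    interp.allocate_tensors()
    inp = interp.get_input_details()[0]
    interp.set_tensor(inp["index"], x.astype(inp["dtype"]))
    interp.invoke()
    # Only the first output tensor is compared in this sketch.
    return interp.get_tensor(interp.get_output_details()[0]["index"])

# Build one input matching the float32 model's input shape.
probe = tf.lite.Interpreter(model_path="model_float32.tflite")
probe.allocate_tensors()
shape = probe.get_input_details()[0]["shape"]
x = np.random.rand(*shape).astype(np.float32)

y_f32 = run_once("model_float32.tflite", x)
y_wq = run_once("model_weight_quant.tflite", x)
print("max abs diff:", float(np.max(np.abs(y_f32 - y_wq))))
```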

Description

Hello, as the title says: I converted the model successfully, but the quantized model is slower than the original float32 model, and the integer-quantized model is slower than the weight-quantized model on Windows x64. Do you know the reason? Looking forward to your reply.

Relevant Log Output

inference time:
original float32: 15 ms
weight quant:    150 ms
integer quant:   200 ms

Source code for simple inference testing code

No response
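
For anyone reproducing the numbers above, a minimal latency harness, assuming TensorFlow 2.x and the same hypothetical file names as the sketch under Convert Script. Note that tf.lite.Interpreter runs single-threaded unless num_threads is set, which can skew float-vs-quantized comparisons on desktop CPUs:

```python
# Rough per-inference latency for each converted .tflite variant.
# Assumptions: TF 2.x, hypothetical file names, input dtype taken
# from the model (full-integer models expect int8/uint8 input).
import time
import numpy as np
import tensorflow as tf

def bench(model_path, runs=100, num_threads=4):
    interp = tf.lite.Interpreter(model_path=model_path,
                                 num_threads=num_threads)
    interp.allocate_tensors()
    inp = interp.get_input_details()[0]
    # Random input cast to whatever dtype this variant expects.
    x = np.random.rand(*inp["shape"]).astype(inp["dtype"])
    interp.set_tensor(inp["index"], x)
    interp.invoke()  # warm-up run, excluded from timing
    t0 = time.perf_counter()
    for _ in range(runs):
        interp.invoke()
    return (time.perf_counter() - t0) / runs * 1000.0  # ms per inference

for name in ("model_float32.tflite",
             "model_weight_quant.tflite",
             "model_integer_quant.tflite"):
    print(f"{name}: {bench(name):.2f} ms/inference")
```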

PINTO0309 commented 11 months ago

It's too much trouble to answer in detail, so I'll just tell you the conclusion.

It is normal for them to slow down. Explore the Internet and GitHub issues to find the answer. I do not want to give the same answer over and over again.