google-coral / pycoral

Python API for ML inferencing and transfer-learning on Coral devices
https://coral.ai
Apache License 2.0

High inference time after full integer post-training quantization compared to normal unquantized tflite model #37

Closed devdastl closed 2 years ago

devdastl commented 3 years ago

Hello everyone,

I am trying to perform full-integer quantization on a pretrained model (ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/saved_model). I followed this issue and was able to generate a Coral-supported tflite model. Below is the Python script I used to perform the quantization:

import glob

import numpy as np
import tensorflow as tf
from PIL import Image

def representative_dataset_gen():
    folder = "images_test"
    image_size = 320
    raw_test_data = []

    files = glob.glob(folder + '/*.jpg')
    for file in files:
        image = Image.open(file)
        image = image.convert("RGB")
        image = image.resize((image_size, image_size))
        # Normalize the image to the range [-1, 1].
        image = (2.0 / 255.0) * np.float32(image) - 1.0
        #image = np.asarray(image).astype(np.float32)
        image = image[np.newaxis, :, :, :]
        raw_test_data.append(image)

    for data in raw_test_data:
        yield [data]

converter = tf.lite.TFLiteConverter.from_saved_model('/home/tensorflow/models/research/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
#converter.experimental_new_converter = True
converter.inference_input_type = tf.uint8
#converter.inference_output_type = tf.uint8
#converter.allow_custom_ops = True

converter.experimental_new_converter = True
converter.experimental_new_quantizer = True

converter.representative_dataset = representative_dataset_gen
tflite_model = converter.convert()

with open('/home/tensorflow/models/research/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/edge/model_v4.tflite', "wb") as w:
    w.write(tflite_model)
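
As a sanity check after conversion (a rough sketch, not part of my original script above), the input tensor type of the generated model can be inspected to confirm that it really expects uint8:

# Sketch: inspect the converted model's input tensor type.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
print("input dtype:", interpreter.get_input_details()[0]['dtype'])  # expect uint8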

But I am getting inference times in the range of 3500-3600 ms, which seems like a lot. For verification, I also converted the saved_model to tflite with the tflite_convert CLI, without quantization:

tflite_convert --saved_model_dir=ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/saved_model --output_file=ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/model.tflite

That unquantized tflite model gives me an inference time of around 300 ms.

Can anyone help me figure out how to improve the inference time of the quantized model? It would also be very helpful if someone could suggest which models to use for post-training quantization and which for quantization-aware training.

I am using tensorflow==2.5.0 and pycoral for inference: https://github.com/google-coral/pycoral
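
For timing, I am doing roughly the following with the pycoral API (a minimal sketch with placeholder file names, not my exact benchmark script):

import time

from PIL import Image
from pycoral.adapters import common
from pycoral.utils.edgetpu import make_interpreter

# Placeholder file names, for illustration only.
interpreter = make_interpreter('model_v4_edgetpu.tflite')
interpreter.allocate_tensors()

image = Image.open('test.jpg').convert('RGB').resize(common.input_size(interpreter))
common.set_input(interpreter, image)

# The first invoke() includes loading the model onto the Edge TPU, so time a later run.
interpreter.invoke()
start = time.perf_counter()
interpreter.invoke()
print('Inference time: %.1f ms' % ((time.perf_counter() - start) * 1000))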

manoj7410 commented 3 years ago

@devdastl Did you also compile the model with the Edge TPU compiler? And on which device are you trying to run this model?

devdastl commented 3 years ago

Hello @manoj7410,

Thanks for your reply. I compiled my model with edgetpu_compiler and I am running it on the Coral USB Accelerator: https://coral.ai/products/accelerator/

My workflow is: pretrained model -> saved_model via export_tflite_tf2 -> quantization with the Python script above -> compilation for the Edge TPU with edgetpu_compiler -> inference. Let me know if any other information is required from my side.
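
The compile step is roughly the following (the exact file name may differ); with -s the compiler also prints how many ops were mapped to the Edge TPU and how many are left on the CPU:

edgetpu_compiler -s model_v4.tflite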

Thanks

hjonnala commented 3 years ago

Hi, can you please share the ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/saved_model folder that you are using?

devdastl commented 3 years ago

Hello @hjonnala, please find the model tar file attached. Let me know if anything else is required from my side. ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz

Thanks

hjonnala commented 3 years ago

@devdastl I am unable to generate a tflite model from the tar file. Could you please share the uncompiled tflite model? 37_pycoral.ipynb.tar.gz

hjonnala commented 2 years ago

Feel free to reopen if the issue still exists, and share the uncompiled tflite model.