Open · Mary-prh opened 5 months ago
@HamadYA Your help is much appreciated
The problem is solved for quantizing to int8 with the following change: `cloned_model = tf.keras.models.clone_model(ghost_model, clone_function=lambda layer: layer.__class__.from_config({**layer.get_config(), "dtype": "float32"}))`. However, the problem persists with `tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8`, as TFLite does not support PReLU in that mode.
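For reference, a runnable version of that change might look like the sketch below (assuming `ghost_model` is the trained Keras model). Note that `clone_model` builds new layers with freshly initialized weights, so the trained weights generally need to be copied back with `set_weights`; this may be related to the weight change reported below.

```python
import tensorflow as tf

# Sketch of the workaround above: clone the model while overriding every
# layer's dtype to float32 before passing it to the TFLite converter.
# `ghost_model` is the original trained Keras model from this thread.
cloned_model = tf.keras.models.clone_model(
    ghost_model,
    clone_function=lambda layer: layer.__class__.from_config(
        {**layer.get_config(), "dtype": "float32"}
    ),
)

# clone_model creates new layers with freshly initialized weights, so the
# trained weights must be copied back explicitly; skipping this step is one
# possible reason the cloned model's predictions no longer match the original.
cloned_model.set_weights(ghost_model.get_weights())
```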
The previous solution I mentioned changes the weights significantly, and the predictions are no longer valid.
I am sorry, I cannot work on big issues right now. I am busy with my PhD research dissertation. Sorry again.
Hey @Mary-prh, did you find a suitable approach? I was also trying to quantize the model, mainly because the per-image inference time is too high; I want to deploy it in my software, where I run inference on multiple images, so the total time adds up a lot.
My aim is also to go from 32-bit float to int8. Please share a solution if you've found one.
Environment
TensorFlow 2.9, Python 3.9.18
Issue Description
I am trying to deploy a TensorFlow Keras model with TensorFlow Lite on an edge device and am quantizing the model to `int8` to optimize performance. However, after the quantization process, the data types of the input and output tensors remain `float32` instead of `int8` as expected. Despite setting `inference_input_type` and `inference_output_type` to `tf.int8`, the converted model still reports `float32` for both the input and output tensors.
Expected Behavior
I expected the input and output tensor data types to be `int8` after quantization.

Actual Behavior
The input and output tensor data types remain `float32` after quantization.

Steps to Reproduce
1. Set up a TensorFlow 2.9 environment with Python 3.9.18.
2. Execute the provided code snippet with a suitable model and dataset.
3. Observe the data types of the input and output tensors after quantization.

Has anyone encountered a similar issue, or does anyone have suggestions on how to ensure the tensors are correctly quantized to `int8`? Any insights or experiences with this problem would be greatly appreciated.
I have followed the standard process for converting and quantizing the model using TensorFlow Lite's `TFLiteConverter` API. Below is a simplified version of the code I used for the conversion and quantization process:
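(A minimal sketch of this full-integer quantization flow, assuming a trained Keras model named `model`, a placeholder 224×224×3 input shape, and random calibration data for illustration only.)

```python
import numpy as np
import tensorflow as tf

# Representative dataset used to calibrate activation ranges during
# post-training quantization. The 224x224x3 shape is a placeholder;
# use real samples matching the model's input.
def representative_dataset():
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Restrict the converter to int8 ops and request int8 input/output tensors.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()

# Inspect the converted model's tensor types; this is where float32 is still
# being reported instead of int8.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
print(interpreter.get_input_details()[0]["dtype"])
print(interpreter.get_output_details()[0]["dtype"])
```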