google-coral / edgetpu

Coral issue tracker (and legacy Edge TPU API source)
https://coral.ai
Apache License 2.0

Why are the input and output of a Quantize op inside a tflite model not equal? #607

Closed fjzhangcr closed 2 years ago

fjzhangcr commented 2 years ago

Description

Hi, I am inspecting the inside of a tflite model that gives me faulty results. The first weird thing I discovered was the behavior of the Quantize op. I made a toy model and confirmed that the Quantize op does not change the values, only the quantization parameters. But in my big model, when I use interpreter.get_tensor() to read the Quantize op's input and output, I found they are not equal. My Keras model yolov4_rand.h5 (h5 format) is attached here: yolov4_rand.h5. The model is loaded into memory as a Keras model and the batch dimension is forced to 1:

import tensorflow as tf
import numpy as np
from datetime import datetime

model_filename='./yolov4_rand.h5'
model= tf.keras.models.load_model(model_filename)
# Force a static batch dimension of 1 so the converter sees a fixed input shape.
model.input.set_shape((1,) + model.input.shape[1:])

The representative_dataset ranges from 0 to 1; the converter.convert() code is here:

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
def representative_dataset(): # between 0 and +1
    for i in range(200):
        data = np.random.rand(1, 512, 512, 3)
        print('representative_dataset:',data.min().round(4),data.max().round(4))
        yield [data.astype(np.float32)]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_types = [tf.int8]
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS_INT8,
    # tf.lite.OpsSet.TFLITE_BUILTINS,
    # tf.lite.OpsSet.SELECT_TF_OPS,
    ]
converter.experimental_new_converter = True
converter.inference_input_type = tf.uint8 #output remains float32
print('start  converter.convert() !')
convert_t1=datetime.now()
print('converter.convert() starts:',str(convert_t1))
tflite_model = converter.convert()
convert_t2=datetime.now()
print('converter.convert() ends:',str(convert_t2))
print('converter.convert() cost:',(convert_t2-convert_t1).seconds)

tflite_model_filename='./yolov4_rand_wired_0P1_QuanOps.tflite'
with open(tflite_model_filename, 'wb') as f:
    f.write(tflite_model)

The tflite file took about 1000 s to generate. It accepts inputs of 0 to 255 in UINT8 format; the generated file yolov4_rand_wired_0P1_QuanOps.tflite is attached here: yolov4_rand_wired_0P1_QuanOps.tflite

The Netron view of the generated tflite file is here: (screenshot attached)

Check the input and output details:

interpreter = tf.lite.Interpreter(
    model_path=tflite_model_filename,num_threads=8)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
print(input_details)
print(output_details)
input_shape = input_details[0]['shape']
input_dtype = input_details[0]['dtype']
print("input_shape is ",input_shape,input_dtype)
for i, output_detail in enumerate(output_details):
    output_shape = output_detail['shape']
    output_dtype = output_detail['dtype']
    print("No {} output_shape is {}, type is {}.".format(
        i,output_shape,output_dtype))

Printed output:

[{'name': 'serving_default_input_1:0', 'index': 0, 'shape': array([  1, 512, 512,   3]), 'shape_signature': array([  1, 512, 512,   3]), 'dtype': <class 'numpy.uint8'>, 'quantization': (0.003921568859368563, 0), 'quantization_parameters': {'scales': array([0.00392157], dtype=float32), 'zero_points': array([0]), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]
[{'name': 'StatefulPartitionedCall:1', 'index': 832, 'shape': array([    1, 16128,    80]), 'shape_signature': array([    1, 16128,    80]), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}, {'name': 'StatefulPartitionedCall:0', 'index': 826, 'shape': array([    1, 16128,     4]), 'shape_signature': array([    1, 16128,     4]), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]
input_shape is  [  1 512 512   3] <class 'numpy.uint8'>
No 0 output_shape is [    1 16128    80], type is <class 'numpy.float32'>.
No 1 output_shape is [    1 16128     4], type is <class 'numpy.float32'>.

I feed the tflite model with data from tf.random.uniform([1,512,512,3],0,255,tf.int32) and then cast it to uint8, since a uint8 input in [0,255] corresponds to a float input in [0,1]; that means 0.5 in float32 corresponds to 128 in UINT8. I record the float input as check_a0:

image_batch_uint8=tf.cast(
    tf.random.uniform([1,512,512,3],0,255,tf.int32),
    tf.uint8)
image_batch_float=tf.cast(image_batch_uint8,tf.float32)/255.
check_a0=image_batch_float[0].numpy()
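
As a quick numeric check of that correspondence (a sketch added here for reference, reusing the arrays above): with the input quantization parameters printed earlier (scale ≈ 1/255, zero_point 0), a uint8 value q dequantizes to q/255, so 128 maps back to 128/255 ≈ 0.502.

# Sketch: dequantizing the uint8 batch with the model's input quantization
# parameters reproduces the float copy recorded as check_a0.
scale, zero_point = input_details[0]['quantization']               # (~1/255, 0)
dequantized = scale * (image_batch_uint8.numpy().astype(np.float32) - zero_point)
print(np.allclose(dequantized[0], check_a0))                       # True
print(scale * (128 - zero_point))                                  # ~0.50196 (= 128/255)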

Then wait about 100 s until interpreter.invoke() is done:

interpreter.set_tensor(input_details[0]['index'], image_batch_uint8)
print("feeding tflite with input_shape is ",image_batch_uint8.shape,
      ', input dtype is',image_batch_uint8.dtype)

print('start  interpreter.invoke() !')
invoke_t1=datetime.now()
print('interpreter.invoke() starts:',str(invoke_t1))
interpreter.invoke()
invoke_t2=datetime.now()
print('interpreter.invoke() ends:',str(invoke_t2))
print('interpreter.invoke() cost:',(invoke_t2-invoke_t1).seconds)

Then record the input of the Quantize op as check_a1:


tensor_name='serving_default_input_1:0'     # tensor index 0, the uint8 model input
tensor_INT=interpreter.get_tensor(0)
tensor_float=tf.cast(tensor_INT,tf.float32)
scale=0.003921568859368563;zero_point=0     # quantization parameters of the input tensor
tensor_tflite_FLOAT=scale*(tensor_float-zero_point)
check_a1=tensor_tflite_FLOAT.numpy()[0]

I get the Quantize op's output as check_a2:

tensor_name='tfl.quantize'                  # tensor index 276, the Quantize op output
tensor_INT=interpreter.get_tensor(276)
tensor_float=tf.cast(tensor_INT,tf.float32)
scale=0.003921568859368563;zero_point=-128  # quantization parameters of the int8 output
tensor_tflite_FLOAT=scale*(tensor_float-zero_point)
check_a2=tensor_tflite_FLOAT.numpy()[0]
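
For reference, the same dequantization can be written without hard-coding the parameters, by looking them up from the interpreter; a minimal sketch (assuming the same tensor indices 0 and 276 used above):

# Sketch: look up scale/zero_point from the model instead of hard-coding them.
def dequantize_tensor(interpreter, tensor_index):
    detail = next(d for d in interpreter.get_tensor_details()
                  if d['index'] == tensor_index)
    scale, zero_point = detail['quantization']
    q = interpreter.get_tensor(tensor_index).astype(np.float32)
    return scale * (q - zero_point)

check_a1_alt = dequantize_tensor(interpreter, 0)[0]    # should match check_a1
check_a2_alt = dequantize_tensor(interpreter, 276)[0]  # should match check_a2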

The input of the Quantize op (check_a1) EQUALS the real float input (check_a0), but the output of the Quantize op (check_a2) does not equal them: (screenshot attached)

Is this a bug, or is it normal? Did I (1) misunderstand the behavior of Quantize, (2) do the converter.convert() wrong, or (3) use get_tensor() in a wrong way?

thanks~~

Issue Type: Bug
Operating System: Windows 10
Coral Device: Dev Board
Other Devices: No response
Programming Language: Python 3.7
Relevant Log Output: No response
fjzhangcr commented 2 years ago

In my understanding, the Quantize op is just a data-type transformer. Since this Quantize op is just a bias shift that re-ranges [0,255] to [-128,127], the difference between the input and output tensors returned by interpreter.get_tensor() should be a constant 128. But this Quantize op acts abnormally:

a0=image_batch_uint8[0].numpy()        # raw uint8 input fed to the interpreter
a1=interpreter.get_tensor(0)[0]        # input tensor of the Quantize op
a2=interpreter.get_tensor(276)[0]      # output tensor of the Quantize op

(screenshot comparing a0, a1 and a2 attached)
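
For reference, a worked version of that arithmetic (added here for illustration, not part of the original comment): the Quantize op re-quantizes from (scale ≈ 1/255, zero_point 0) to (scale ≈ 1/255, zero_point -128), so q_out = round((q_in/255) / (1/255)) + (-128) = q_in - 128, and a1 - a2 should be a constant 128 elementwise.

# Sketch: check the expected constant offset of 128 between a1 (uint8) and a2 (int8).
expected_a2 = a1.astype(np.int16) - 128           # int16 avoids overflow
print(np.array_equal(expected_a2, a2.astype(np.int16)))
# In the setup above this prints False, matching the abnormal values in the
# screenshot; the next comment explains why.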

hjonnala commented 2 years ago

This is because, by default, TFLite doesn't preserve intermediate tensors: it optimizes memory usage and reuses a tensor's allocated memory based on the data-flow dependencies. You can use the newly added debugging feature to preserve all tensors.

interpreter = tflite.Interpreter(model_path=model_path, experimental_preserve_all_tensors=True)

See https://stackoverflow.com/questions/53109990/tflite-get-tensor-on-non-output-tensors-gives-random-values
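
For completeness, a minimal sketch of re-running the check above with that flag, reusing tflite_model_filename and image_batch_uint8 from the report (the keyword argument is available in recent TensorFlow / tflite_runtime releases):

# Re-create the interpreter so intermediate tensors survive invoke().
interpreter = tf.lite.Interpreter(
    model_path=tflite_model_filename,
    num_threads=8,
    experimental_preserve_all_tensors=True)
interpreter.allocate_tensors()
interpreter.set_tensor(interpreter.get_input_details()[0]['index'], image_batch_uint8)
interpreter.invoke()

a1 = interpreter.get_tensor(0)[0]      # input of the Quantize op (uint8)
a2 = interpreter.get_tensor(276)[0]    # output of the Quantize op (int8)
print(np.array_equal(a1.astype(np.int16) - 128, a2.astype(np.int16)))
# Expected to print True once all tensors are preserved.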

hjonnala commented 2 years ago

Feel free to reach out to the tflite team for any additional questions on this one: https://github.com/tensorflow/tflite-support

google-coral-bot[bot] commented 2 years ago

Are you satisfied with the resolution of your issue? Yes No

fjzhangcr commented 2 years ago

thank you

google-coral-bot[bot] commented 2 years ago

Are you satisfied with the resolution of your issue? Yes No