alibaba / TinyNeuralNetwork

TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.
MIT License

A PTQ tflite model fails to pass benchmark test #95

Open liamsun2019 opened 2 years ago

liamsun2019 commented 2 years ago

My use case: apply post-training quantization to a pth model and convert it to tflite. The generated tflite model fails the benchmark test with the following error message:

    STARTING!
    Log parameter values verbosely: [0]
    Graph: [out/ptq_model.tflite]
    Loaded model out/ptq_model.tflite
    ERROR: tensorflow/lite/kernels/concatenation.cc:179 t->params.scale != output->params.scale (3 != -657359264)
    ERROR: Node number 154 (CONCATENATION) failed to prepare.
    Failed to allocate tensors!
    Benchmarking failed.

Please refer to the attachment (test.zip). Thanks.

liamsun2019 commented 2 years ago

My quantization strategy:

    quantizer = PostQuantizer(model, dummy_input, work_dir='out',
                              config={'force_overwrite': True, 'rewrite_graph': True,
                                      'is_input_quantized': None,
                                      'asymmetric': False, 'per_tensor': False})
    ...
    converter = TFLiteConverter(ptq_model, dummy_input, tflite_path='out/ptq_model.tflite',
                                strict_symmetric_check=False, quantize_target_type='int8')

liamsun2019 commented 2 years ago

The following strategy works:

    quantizer = PostQuantizer(model, dummy_input, work_dir='out',
                              config={'force_overwrite': True, 'rewrite_graph': True,
                                      'is_input_quantized': None,
                                      'asymmetric': True, 'per_tensor': True})
    ...
    converter = TFLiteConverter(ptq_model, dummy_input, tflite_path='out/ptq_model.tflite',
                                strict_symmetric_check=False, quantize_target_type='uint8')

liamsun2019 commented 2 years ago

It looks like int8 per-channel quantization may cause errors.

peterjc123 commented 2 years ago

The following pattern in your model is the root cause of the problem.

A = sigmoid(X)
B = cat(A, Y)

The output tensor of the sigmoid op has fixed (constant) quantization parameters, so they cannot simply be unified with those of the other cat operands as we usually do. There are several ways to fix this.

  1. Set the quantization parameters of (Y, B) to those of A, and disable the observers on those tensors.
  2. Insert a requantization op after A, so that we have
    A = sigmoid(X)
    A_ = requantize(A)
    B = cat(A_, Y)

    Then we can unify the quantization parameters of (A_, Y, B) as usual.
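To illustrate option 2, here is a minimal sketch in plain Python of what the requantize step does numerically. The helper names (`quantize`, `dequantize`, `requantize`, `same_params`) and the unified scale of 0.05 are hypothetical and not part of TinyNeuralNetwork. The fixed int8 sigmoid output parameters (scale 1/256, zero point -128) are an assumption based on the TFLite int8 quantization spec. The `same_params` check only mirrors the scale equality that the error log above reports for CONCATENATION.

```python
import math

INT8_MIN, INT8_MAX = -128, 127

# Assumed fixed int8 output parameters of sigmoid: (scale, zero_point).
FIXED = (1.0 / 256.0, -128)
# Hypothetical unified parameters for (A_, Y, B), e.g. from calibration.
UNIFIED = (0.05, 0)


def quantize(x, scale, zero_point):
    """Map a real value to int8 using the given parameters (with saturation)."""
    q = round(x / scale) + zero_point
    return max(INT8_MIN, min(INT8_MAX, q))


def dequantize(q, scale, zero_point):
    """Map an int8 value back to the real value it represents."""
    return (q - zero_point) * scale


def requantize(q, src, dst):
    """Re-express q, quantized with src parameters, using dst parameters."""
    return quantize(dequantize(q, *src), *dst)


def same_params(input_params, output_params):
    """True if every input shares the output's (scale, zero_point)."""
    return all(p == output_params for p in input_params)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


a = quantize(sigmoid(0.0), *FIXED)    # A = sigmoid(X), fixed parameters
a_ = requantize(a, FIXED, UNIFIED)    # A_ = requantize(A)
y = quantize(2.0, *UNIFIED)           # Y uses the unified parameters
b = [a_, y]                           # B = cat(A_, Y)

# Without requantization the cat inputs disagree with the output parameters.
mismatch_before = not same_params([FIXED, UNIFIED], UNIFIED)
match_after = same_params([UNIFIED, UNIFIED], UNIFIED)
```

The extra op costs one rescale per element of A, but it leaves Y and B free to use whatever range the observers find.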

peterjc123 commented 2 years ago

Or you may just skip the quantization for this kind of pattern, which seems to be the simplest solution.

peterjc123 commented 1 year ago

  1. Set the quantization parameters of (Y, B) to those of A, and disable the observers on those tensors.

This is simpler I guess. We will try to fix it this way.
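A rough numerical sketch of this approach, in plain Python: Y and B reuse the fixed parameters of A instead of observed ones, so cat needs no rescaling. The helper names are hypothetical, and the fixed int8 sigmoid parameters (scale 1/256, zero point -128) are an assumption based on the TFLite int8 quantization spec, not something taken from the TinyNeuralNetwork code.

```python
INT8_MIN, INT8_MAX = -128, 127

# Assumed fixed int8 output parameters of sigmoid, reused for Y and B.
FIXED_SCALE, FIXED_ZERO_POINT = 1.0 / 256.0, -128


def quantize(x, scale=FIXED_SCALE, zero_point=FIXED_ZERO_POINT):
    """Map a real value to int8 with the fixed parameters (with saturation)."""
    q = round(x / scale) + zero_point
    return max(INT8_MIN, min(INT8_MAX, q))


def dequantize(q, scale=FIXED_SCALE, zero_point=FIXED_ZERO_POINT):
    """Map an int8 value back to the real value it represents."""
    return (q - zero_point) * scale


a = quantize(0.5)                         # A = sigmoid(X), value 0.5
y = [quantize(v) for v in (0.25, 0.75)]   # Y quantized with A's parameters
b = [a] + y                               # B = cat(A, Y): no rescaling needed

# With these parameters only values in [0, 255/256] are representable,
# so a Y value outside that range saturates.
y_saturated = quantize(2.0)
```

In this sketch, any Y value outside the range of the fixed parameters is clipped, which is the trade-off for not inserting an extra op.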