Closed (travisjayday closed this issue 2 years ago)
Please update to the latest commit and apply the following configuration to the quantizer.
quantizer = PostQuantizer(model, dummy_input, work_dir='ptq',
                          config={'asymmetric': True, 'per_tensor': False, 'set_quantizable_op_stats': True})
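For reference, a minimal calibration flow around that configuration could look like the sketch below; the calibration loader is a placeholder, and the convert step follows the standard PyTorch post-training quantization flow, so adjust it to your setup:

import torch

# Prepare the model for post-training quantization with the config above
ptq_model = quantizer.quantize()
ptq_model.eval()

# Run a few representative batches so activation statistics
# (including those required by 'set_quantizable_op_stats') are collected
with torch.no_grad():
    for data, _ in calib_loader:  # calib_loader is a placeholder for your dataset
        ptq_model(data)

# Convert the calibrated model into an actual quantized model
torch.backends.quantized.engine = 'qnnpack'  # mobile backend; adjust if needed
ptq_model = torch.quantization.convert(ptq_model)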
Do you know why this is and if it can be achieved easily?
Because quantization for log_softmax is not supported in PyTorch, we have implemented the rewrite_quantizable pass in the TFLiteConverter to rewrite those floating-point kernels into quantized kernels. For log_softmax, however, it has to be used together with the set_quantizable_op_stats option in the PostQuantizer. Even with that option, it is not enough here, because log_softmax is the last operation in your computation graph. While rewriting the model, the graph becomes ...-log_softmax-dequantize-quantize, and then the graph optimizer removes the consecutive dequantize and quantize nodes, which makes it impossible for the TFLiteConverter to restore the quantized kernel because the quantization parameters are lost. So I pushed a new commit to fix this issue.
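For completeness, the conversion step where that pass runs looks roughly like the sketch below; the argument names other than tflite_path (e.g. quantize_target_type, rewrite_quantizable) are assumptions based on the pass name mentioned above, so please check the converter's documented options:

from tinynn.converter import TFLiteConverter

# Convert the calibrated, quantized PyTorch model to TFLite.
# 'rewrite_quantizable' (flag name assumed) enables the pass that rewrites
# floating-point kernels such as LOG_SOFTMAX back into quantized kernels.
converter = TFLiteConverter(ptq_model, dummy_input,
                            tflite_path='ptq/model_int8.tflite',
                            quantize_target_type='int8',
                            rewrite_quantizable=True)
converter.convert()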
Hi peter! Thank you for your very quick response and commit. The new commit does seem to fix the quantization issue in the graph!!
However, I found some weird behavior which might be a TensorFlow Lite issue and not a TinyNeuralNetwork issue.
If we use

interpreter = tf.lite.Interpreter(model_path=tflite,
                                  experimental_op_resolver_type=tf.lite.experimental.OpResolverType.BUILTIN_REF)

the output of the softmax is always -128.
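A minimal sketch to compare the two resolver types side by side looks like this (the model path and input shape are placeholders):

import numpy as np
import tensorflow as tf

dummy = np.random.random(size=(1, 224, 224, 3)).astype('float32')  # placeholder shape

def run(tflite_path, resolver):
    interpreter = tf.lite.Interpreter(model_path=tflite_path,
                                      experimental_op_resolver_type=resolver)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    interpreter.set_tensor(inp['index'], dummy)
    interpreter.invoke()
    return interpreter.get_tensor(out['index'])

# Optimized kernels (default) vs. reference kernels
print(run('model.tflite', tf.lite.experimental.OpResolverType.AUTO))
print(run('model.tflite', tf.lite.experimental.OpResolverType.BUILTIN_REF))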
In any case, using the default op_resolver_type works as expected now! Once the model gets deployed to the MCU, I'll come back and comment on whether it worked or not. For now, I think this issue can be closed.
Thanks again!!
The output of the softmax is always -128.
You can use the following snippet to find out which layer is not implemented correctly and report the issue to TensorFlow Lite:

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(..., experimental_preserve_all_tensors=True)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
tensor_details = interpreter.get_tensor_details()
dummy_input = np.random.random(size=(1, 224, 224, 3)).astype('float32')
interpreter.set_tensor(input_details[0]['index'], dummy_input)
interpreter.invoke()
# Print every intermediate tensor so the first wrong layer can be located
for i in range(len(tensor_details)):
    print(i, tensor_details[i]['name'], interpreter.get_tensor(tensor_details[i]['index']))
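If the suspect tensor is int8, mapping its raw values back to float makes the comparison with the float model easier; as a sketch, the 'quantization' field of the tensor details holds the (scale, zero_point) pair:

# Dequantize an int8 intermediate tensor for comparison with the float model
detail = tensor_details[i]  # i = index of the suspect tensor from the loop above
scale, zero_point = detail['quantization']  # scale is 0 for non-quantized tensors
raw = interpreter.get_tensor(detail['index'])
if scale != 0:
    print((raw.astype('float32') - zero_point) * scale)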
Hi there, LOG_SOFTMAX isn't being quantized to INT8. The converter adds a dequantize layer before the LOG_SOFTMAX node. This is not the behavior when using the regular converter from TensorFlow (their converter quantizes the LOG_SOFTMAX so that the model's output is INT8). Do you know why this is and if it can be achieved easily? Maybe I'm just missing something, but I looked through the source and can't find any extra arguments or config options.
Here's a minimal example: