google-coral / edgetpu

Coral issue tracker (and legacy Edge TPU API source)
https://coral.ai
Apache License 2.0

Edge TPU Compiler issue for TensorFlow 2.3.0 TFLite Quantization #222

Closed goodwilj closed 3 years ago

goodwilj commented 3 years ago

I'm attempting to compile a variation of the DenseNet model for the Edge TPU, and I'm running into problems similar to those in this issue: https://github.com/google-coral/edgetpu/issues/64. DenseNet seems to be supported by the Coral Edge TPU, as it is listed in the benchmarks: https://coral.ai/docs/edgetpu/benchmarks/.

Unfortunately, for DenseNet, the transpose convolution is not supported for integer quantization in TensorFlow 1.15.0, so I cannot convert a compatible TFLite model. However, TensorFlow 2.3.0 now supports integer quantization for transpose convolution and allows for int8/uint8 inputs and outputs. I have successfully converted the DenseNet model to a quantized TFLite model in TensorFlow 2.3.0 (attached below) and visualized the TFLite graph to ensure it is quantized. However, when I run the Edge TPU compiler, it outputs:

Edge TPU Compiler version 14.1.317412892
Invalid model: keras_model_2-3-0.tflite
Model not quantized

Why is the Edge TPU Compiler saying the model is not quantized (I have checked that it is int8 in the visualization)?

Versions:
TensorFlow 2.3.0
Edge TPU Compiler 14.1.317412892

keras_model_2-3-0.zip

Note: I've also attempted to use TensorFlow 2.2.0 to convert my model to TFLite (attached below) as was performed in the linked issue above. The Edge TPU compiler fails to compile for -m 12 and -m 13, citing "Internal compiler error. Aborting!". However, for -m 10 and -m 11, the Edge TPU compiler compiles, but none of the operations is mapped to the TPU:

Edge TPU Compiler version 14.1.317412892

Model compiled successfully in 142 ms.

Input model: keras_model_2-2-0.tflite
Input size: 4.92MiB
Output model: keras_model_2-2-0_edgetpu.tflite
Output size: 4.88MiB
On-chip memory used for caching model parameters: 0.00B
On-chip memory remaining for caching model parameters: 0.00B
Off-chip memory used for streaming uncached model parameters: 0.00B
Number of Edge TPU subgraphs: 0
Total number of operations: 401
Operation log: keras_model_2-2-0_edgetpu.log

Model successfully compiled but not all operations are supported by the Edge TPU. A percentage of the model will instead run on the CPU, which is slower. If possible, consider updating your model to use only operations supported by the Edge TPU. For details, visit g.co/coral/model-reqs.
Number of operations that will run on Edge TPU: 0
Number of operations that will run on CPU: 401

Operator Count Status

TRANSPOSE_CONV 4 Operation is working on an unsupported data type
CONV_2D 70 Operation is working on an unsupported data type
CONCATENATION 68 Operation is working on an unsupported data type
ADD 72 Operation is working on an unsupported data type
MAX_POOL_2D 4 Operation is working on an unsupported data type
DEQUANTIZE 1 Operation is working on an unsupported data type
QUANTIZE 114 Operation is otherwise supported, but not mapped due to some unspecified limitation
MUL 68 Operation is working on an unsupported data type

Versions: TensorFlow 2.2.0 Edge TPU Compiler version 14.1.317412892

Why is there an internal compiler error for certain versions of the runtime, and why are no int8-quantized operations mapped to the TPU when they should be supported according to the documentation?

keras_model_2-2-0.zip

Namburger commented 3 years ago

Hello! Could you also paste the snippet of the post training quantization code?

goodwilj commented 3 years ago

Sure, here is the post-training quantization code:

import numpy as np
import tensorflow as tf
from tensorflow import keras

# Load the trained Keras model and create a TFLite converter from it.
keras_model = tf.keras.models.load_model("keras_model.h5")
converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)

# Leftover fragment of an earlier converter call, kept for reference only:
# input_arrays=["x"], output_arrays=["Identity"],input_shapes={'x':[1,32,32,1]})

# Yields 10 random float32 samples with the model's input shape for calibration.
def representative_dataset_gen():
    for _ in range(10):
        input_array = np.random.random((1, 32, 32, 1))
        input_array = np.array(input_array, dtype=np.float32)
        yield [input_array]

# Request full-integer quantization with uint8 inputs and outputs.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model = converter.convert()

with open('keras_model_2-3-0.tflite', 'wb') as f:
    f.write(tflite_model)

I've simply used garbage random data for quantization for now
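
For context on why the representative data matters: post-training quantization uses the sample inputs to estimate each tensor's value range, and a scale and zero point are derived from that range. The following is a minimal sketch of the standard affine uint8 mapping, not the converter's actual calibration code; the function name `affine_uint8_params` is made up for illustration.

```python
def affine_uint8_params(observed_min, observed_max):
    """Derive (scale, zero_point) for uint8 from an observed float range."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(observed_min, 0.0)
    hi = max(observed_max, 0.0)
    if hi == lo:
        # Degenerate range (all zeros): any positive scale works.
        return 1.0, 0
    scale = (hi - lo) / 255.0
    # The zero point is the uint8 code that represents the real value 0.0.
    zero_point = int(round(-lo / scale))
    zero_point = max(0, min(255, zero_point))
    return scale, zero_point


# Random inputs in [0, 1), like the placeholder data used for calibration here.
scale, zero_point = affine_uint8_params(0.0, 1.0)
print(scale, zero_point)
```

With random placeholder data the derived ranges will not match real inputs, which affects accuracy but should not by itself decide whether the model compiles.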

Namburger commented 3 years ago

@goodwilj

I've simply used garbage random data for quantization for now

All good :)

Anyhow, there is this new thing with TF 2.3 that took us a while to figure out: the batch size needs to be set explicitly now, or it won't be static. Can you add this line:

keras_model.input.set_shape((1,) + keras_model.input.shape[1:]) # this line
converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
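
To spell out what that line does: a Keras input shape usually has `None` as its batch dimension, and the expression replaces it with a fixed 1 while keeping the remaining dimensions. Here is a plain-Python sketch of just the shape arithmetic, assuming the 32x32x1 input from the quantization snippet earlier in the thread. No TensorFlow is needed, and `dynamic_shape` and `static_shape` are illustrative names.

```python
# Shape as Keras would typically report it: the batch dimension is unknown.
dynamic_shape = (None, 32, 32, 1)

# Drop the unknown batch dimension and prepend a fixed batch size of 1.
static_shape = (1,) + tuple(dynamic_shape[1:])

print(static_shape)
```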

We are working on documenting this!

Namburger commented 3 years ago

Wait, we actually have a colab for it: https://colab.research.google.com/github/google-coral/tutorials/blob/master/fix_conversion_issues_ptq_tf2.ipynb

goodwilj commented 3 years ago

Thanks! It now compiles for -m 10 and -m 11 runtimes but not 12 and 13, like I mentioned for TF 2.2.0 in the Note section of the original post.

For the 10 and 11 runtimes, none of the operations is actually mapped to the Edge TPU:

Edge TPU Compiler version 14.1.317412892

Model compiled successfully in 98 ms.

Input model: keras_model_2-3-0.tflite
Input size: 4.92MiB
Output model: keras_model_2-3-0_edgetpu.tflite
Output size: 4.88MiB
On-chip memory used for caching model parameters: 0.00B
On-chip memory remaining for caching model parameters: 0.00B
Off-chip memory used for streaming uncached model parameters: 0.00B
Number of Edge TPU subgraphs: 0
Total number of operations: 400
Operation log: keras_model_2-3-0_edgetpu.log

Model successfully compiled but not all operations are supported by the Edge TPU. A percentage of the model will instead run on the CPU, which is slower. If possible, consider updating your model to use only operations supported by the Edge TPU. For details, visit g.co/coral/model-reqs.
Number of operations that will run on Edge TPU: 0
Number of operations that will run on CPU: 400

Operator Count Status

CONV_2D 70 Operation is working on an unsupported data type
TRANSPOSE_CONV 4 Operation is working on an unsupported data type
QUANTIZE 114 Operation is otherwise supported, but not mapped due to some unspecified limitation
MUL 68 Operation is working on an unsupported data type
MAX_POOL_2D 4 Operation is working on an unsupported data type
ADD 72 Operation is working on an unsupported data type
CONCATENATION 68 Operation is working on an unsupported data type

I've reattached the updated model with the batch size fix below: keras_model_2-3-0_update.zip
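
As a quick sanity check on a log like the one above, the per-operator counts can be tallied by status. This is a small sketch in plain Python, assuming each row has the form `NAME COUNT status text`. The rows are copied from the log above, and `tally_by_status` is a hypothetical helper, not part of the Edge TPU tooling.

```python
from collections import Counter

# Operator rows copied from the compiler output in this comment.
LOG_ROWS = """\
CONV_2D 70 Operation is working on an unsupported data type
TRANSPOSE_CONV 4 Operation is working on an unsupported data type
QUANTIZE 114 Operation is otherwise supported, but not mapped due to some unspecified limitation
MUL 68 Operation is working on an unsupported data type
MAX_POOL_2D 4 Operation is working on an unsupported data type
ADD 72 Operation is working on an unsupported data type
CONCATENATION 68 Operation is working on an unsupported data type
"""


def tally_by_status(rows):
    """Sum operator counts per status message."""
    totals = Counter()
    for line in rows.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split into operator name, count, and the remaining status text.
        _name, count, status = line.split(maxsplit=2)
        totals[status] += int(count)
    return totals


totals = tally_by_status(LOG_ROWS)
for status, n in totals.items():
    print(n, status)
print("total:", sum(totals.values()))
```

The total agrees with the "Total number of operations" line in the log, and every operator other than QUANTIZE reports the data-type status.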

Namburger commented 3 years ago

Hmm, are you using TF 2.3? Could you also attach the Keras .h5 model?

goodwilj commented 3 years ago

Yes, I'm using TF 2.3.0. And sure, here is the keras model:

keras_model.zip

Namburger commented 3 years ago

I see. After trying to perform post-training quantization on the model you provided, here is my finding:

The compiler rejects the model on purpose because some quantization parameters are mismatched, which could cause bad predictions. There are 2 quantized ops feeding the same Concat layer, where one has this:

scale: 0.048531219363212585 zero_point: 103 num_fxp_values: 256

and the other one:

scale: 0.033884394913911819 zero_point: 120 num_fxp_values: 256

This seems to me like a conversion issue, so I suggest reaching out to the TensorFlow team for a more appropriate solution.
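
To illustrate why mismatched parameters matter at a concatenation: under the usual affine scheme, a quantized value q represents the real value scale * (q - zero_point). The sketch below uses the two parameter sets quoted above to show the same stored byte decoding to different real values, so the two inputs cannot simply be copied into one output tensor without requantizing one of them. The helper names are illustrative, and this is not the compiler's actual check.

```python
def dequantize(q, scale, zero_point):
    """Real value represented by the uint8 code q."""
    return scale * (q - zero_point)


def quantize(x, scale, zero_point):
    """Nearest uint8 code for the real value x, clamped to [0, 255]."""
    q = int(round(x / scale)) + zero_point
    return max(0, min(255, q))


# (scale, zero_point) pairs of the two inputs to the Concat layer.
PARAMS_A = (0.048531219363212585, 103)
PARAMS_B = (0.033884394913911819, 120)

q = 150  # an arbitrary stored byte, chosen for illustration
real_a = dequantize(q, *PARAMS_A)
real_b = dequantize(q, *PARAMS_B)
print(real_a, real_b)

# Re-expressing input B's value with input A's parameters changes the byte.
q_requant = quantize(real_b, *PARAMS_A)
print(q, "->", q_requant)
```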

carllhsiung commented 3 years ago

Hi @Namburger ,

I can't compile an INT8 TFLite model that contains Conv2DTranspose or UpSampling2D when it is converted with 2.4.0-rc4 or tf-nightly, but it works with 2.2.0.

System information

Command used to run the converter or code

Please check:

2.4.0rc4 Conv2DTranspose gist

tf-nightly Conv2DTranspose gist

tf-nightly UpSampling2D gist

The output from the edgetpu_compiler

Edge TPU Compiler version 15.0.340273435
Invalid model: output.tflite
Model not quantized

The expected result:

2.2.0 gist

Might be similar to here

I will use 2.2.0 anyway.

Thanks in advance.