google-coral / edgetpu

Coral issue tracker (and legacy Edge TPU API source)
https://coral.ai
Apache License 2.0

Problem with post-training quantization on U-Net model trained with TF2 #322

Open tml444 opened 3 years ago

tml444 commented 3 years ago

Hello,

I am having problems compiling a U-Net TensorFlow 2.2 model for the Coral Edge TPU using post-training int8 quantization on Windows 10 (WSL with Ubuntu 18.04 for the compiler). According to the Coral website, every operation in the model should be supported. I applied int8 quantization to the .pb model with the following code and TensorFlow 2.5 (tf-nightly-gpu):

import numpy as np
import tensorflow as tf

# Representative dataset: 100 random samples with the model's input shape.
def representative_dataset():
  for _ in range(100):
    data = np.random.rand(1, 425, 775, 1)
    yield [data.astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model('path/to/saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8, tf.lite.OpsSet.TFLITE_BUILTINS]
converter.target_spec.supported_types = [tf.int8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
converter.allow_custom_ops = True

# Convert and write out the quantized model (the step reflected in the log below).
tflite_model = converter.convert()
with open('unet_int8_tf25.tflite', 'wb') as f:
  f.write(tflite_model)

During the quantization process there is no additional information in the console output; it just writes the finished .tflite file after a while:

...
2021-02-25 13:28:37.664173: I tensorflow/cc/saved_model/loader.cc:277] SavedModel load for tags { serve }; Status: success: OK. Took 360416 microseconds.
model saved to: C:\path\to\unet_int8_tf25.tflite

The resulting tflite model is attached here: unet_tf25.zip

The compiler output with command edgetpu_compiler unet_int8_tf25.tflite is:

Edge TPU Compiler version 15.0.340273435
Invalid model: unet_int8_tf25.tflite
Model not quantized

Although the quantization ran without errors and the tflite model seems to be correctly quantized, the compiler does not recognize the model as being quantized.
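For reference, the result can also be sanity-checked from Python by loading the converted file back into a tf.lite.Interpreter and counting the remaining float tensors. This is only a sketch (the file name is the one written by the conversion above), not something the compiler itself requires:

import numpy as np
import tensorflow as tf

# Census of tensor dtypes in the converted model: a fully int8-quantized model
# should contain few or no float32 tensors beyond quantize/dequantize stubs.
interpreter = tf.lite.Interpreter(model_path='unet_int8_tf25.tflite')
float_tensors = [t['name'] for t in interpreter.get_tensor_details()
                 if t['dtype'] == np.float32]
print(len(float_tensors), 'float32 tensors remaining')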

Furthermore, I tried using TensorFlow 2.2 for post-training quantization. When compiling the TensorFlow 2.2 int8-quantized model, the compiler works and generates the following Edge TPU model:

Edge TPU Compiler version 15.0.340273435
Input: unet_int8_tf22.tflite
Output: unet_int8_tf22_edgetpu.tflite

Operator                       Count      Status

LOGISTIC                       1          Mapped to Edge TPU
CONCATENATION                  4          Mapped to Edge TPU
PAD                            4          Mapped to Edge TPU
MAX_POOL_2D                    4          Mapped to Edge TPU
ADD                            13         Mapped to Edge TPU
CONV_2D                        10         Mapped to Edge TPU
TRANSPOSE_CONV                 4          Mapped to Edge TPU
QUANTIZE                       4          Mapped to Edge TPU
QUANTIZE                       1          Operation is otherwise supported, but not mapped due to some unspecified limitation
MUL                            9          Mapped to Edge TPU
DEQUANTIZE                     1          Operation is working on an unsupported data type

The resulting models can be found here: unet_tf22.zip

Inference with this model on the Coral Edge TPU shows very poor performance, only slightly faster than running the model on the CPU. Also, it does not seem to be correctly quantized, as the model takes float32 inputs.

I also quantized another model, based on AlexNet, which worked fine with the same quantization procedure and could be completely mapped to the TPU: alexnet_tf25.zip. Since this worked, I assumed that using TensorFlow 2.5 for quantization should be fine.

I would appreciate any help in getting the U-Net model to be fully mapped to the TPU. Thank you. I can also provide more information if needed.

Versions:
Windows 10 & WSL: Ubuntu 18.04
Python 3.6
tf_nightly_gpu-2.5.0.dev20210223 / tensorflow-gpu-2.2.0
Edge TPU Compiler version 15.0.340273435
edgetpu_runtime_20210119

jk78346 commented 3 years ago

I face the same problem here. Also, @tml444, how do you know the model takes float32 inputs on the Edge TPU? Thanks.

[Edit] The same "Model not quantized" situation happens for tf2.3 and tf2.4, and I am using the example model here.

[Edit] Also, could you provide your source code for the AlexNet model? How did you make it work? Thanks.

jk78346 commented 3 years ago

Related issue but no recent update: https://github.com/google-coral/edgetpu/issues/168

converter.experimental_new_converter = False
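For context, a minimal sketch of where this flag sits in the conversion script (the path, input shape, and representative dataset are placeholders, not my exact setup):

import numpy as np
import tensorflow as tf

def representative_dataset():
  # Placeholder calibration data; in practice use real samples with the model's input shape.
  for _ in range(100):
    yield [np.random.rand(1, 28, 28, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model('path/to/saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Fall back to the old (pre-MLIR) converter path instead of the new one.
converter.experimental_new_converter = False
tflite_model = converter.convert()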

Adding this flag converts my model with this message: (I'm using tf 2.4.1)

Edge TPU Compiler version 15.0.340273435

Model compiled successfully in 42 ms.

Input model: /tmp/mnist_tflite_models/mnist_model_quant_16x8.tflite
Input size: 81.91KiB
Output model: ./mnist_model_quant_16x8_edgetpu.tflite
Output size: 188.78KiB
On-chip memory used for caching model parameters: 3.00KiB
On-chip memory remaining for caching model parameters: 7.86MiB
Off-chip memory used for streaming uncached model parameters: 126.81KiB
Number of Edge TPU subgraphs: 1
Total number of operations: 5
Operation log: ./mnist_model_quant_16x8_edgetpu.log

Model successfully compiled but not all operations are supported by the Edge TPU. A percentage of the model will instead run on the CPU, which is slower. If possible, consider updating your model to use only operations supported by the Edge TPU. For details, visit g.co/coral/model-reqs.
Number of operations that will run on Edge TPU: 3
Number of operations that will run on CPU: 2

Operator                       Count      Status

DEPTHWISE_CONV_2D              1          Mapped to Edge TPU
FULLY_CONNECTED                1          Mapped to Edge TPU
QUANTIZE                       1          Operation is otherwise supported, but not mapped due to some unspecified limitation
DEQUANTIZE                     1          Operation is working on an unsupported data type
RESHAPE                        1          Mapped to Edge TPU

tml444 commented 3 years ago

I face the same problem here. Also, @tml444, how do you know the model takes float32 inputs on the Edge TPU? Thanks.

[Edit] The same "Model not quantized" situation happens for tf2.3 and tf2.4, and I am using the example model here.

[Edit] Also, could you provide your source code for the AlexNet model? How did you make it work? Thanks.

Thanks for your reply. When I inspect the tf25-quantized unet model I see this: image

And for the tf22 "quantized" unet model: image

Because of this I have to feed float32 tensors when running inference with this model (on the CPU, for example). My successfully quantized and compiled models always take uint8 inputs, as set during tflite conversion.
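The same check can also be done without Netron by reading the input details from a tf.lite.Interpreter (a sketch; the file name matches the tf22 model attached above):

import tensorflow as tf

# Input details show the dtype and (scale, zero_point) the model expects at inference time.
interpreter = tf.lite.Interpreter(model_path='unet_int8_tf22.tflite')
inp = interpreter.get_input_details()[0]
print(inp['dtype'])         # float32 here; uint8 for a correctly quantized model
print(inp['quantization'])  # (0.0, 0) means no quantization parameters on the input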

Also note the runtime versions in the model properties in the screenshots: the tf25 model shows version 2.3, whereas all my successfully compiled models show 1.x (e.g. 1.14 for the tf22 unet model). I am not really sure what this runtime version refers to, as the AlexNet model quantized with tf25 shows 1.5 as its runtime version.

@jk78346 Which runtime versions are used in your quantizations?

Unfortunately I am currently not able to provide the source code for the model, since I was only given the saved model, but it may be possible to get my hands on the model code soon.

akosb commented 3 years ago

Also note the runtime versions in the model properties in the screenshots: the tf25 model shows version 2.3, whereas all my successfully compiled models show 1.x (e.g. 1.14 for the tf22 unet model). I am not really sure what this runtime version refers to, as the AlexNet model quantized with tf25 shows 1.5 as its runtime version.

@tml444 - see below (imho the Netron naming for this field is a bit misleading)

  // The minimum metadata parser version that can fully understand the fields in
  // the metadata flatbuffer. The version is effectively the largest version
  // number among the versions of all the fields populated and the smallest
  // compatible version indicated by the file identifier.
  //
  // This field is automatically populated by the MetadataPopulator when
  // the metadata is populated into a TFLite model.
  min_parser_version:string;

(ref: tflite-support)