google-coral / edgetpu

Coral issue tracker (and legacy Edge TPU API source)
https://coral.ai
Apache License 2.0

'edgetpu_compiler' does not work (Model not quantized) #168

Closed · kimwj94 closed this issue 4 years ago

kimwj94 commented 4 years ago

I tried to use edgetpu_compiler to compile my quantized model (with TensorFlow 2.4.0). I followed the sample code from "Retrain a classification model for Edge TPU using post-training quantization (with TF2)" and "Post-training integer quantization".

I quantized the following model:

model = tf.keras.Sequential([
  tf.keras.layers.InputLayer(input_shape=(28, 28)),
  tf.keras.layers.Reshape(target_shape=(28, 28, 1)),
  tf.keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),
  tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(10)
])

But when I ran edgetpu_compiler my_mnist_quant.tflite, I got the following message:

Edge TPU Compiler version 14.1.317412892
Invalid model: my_mnist_quant.tflite
Model not quantized

However, when I commented out the Conv2D and MaxPooling2D layers, it compiled fine:

model = tf.keras.Sequential([
  tf.keras.layers.InputLayer(input_shape=(28, 28)),
  tf.keras.layers.Reshape(target_shape=(28, 28, 1)),
  #tf.keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),
  #tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(10)
])

Edge TPU Compiler version 14.1.317412892

Model compiled successfully in 44 ms.

Input model: my_mnist_quant.tflite
Input size: 9.39KiB
Output model: my_mnist_quant_edgetpu.tflite
Output size: 56.50KiB
On-chip memory used for caching model parameters: 49.25KiB
On-chip memory remaining for caching model parameters: 7.81MiB
Off-chip memory used for streaming uncached model parameters: 0.00B
Number of Edge TPU subgraphs: 1
Total number of operations: 4
Operation log: my_mnist_quant_edgetpu.log
See the operation log file for individual operation details.

I don't know why it is not working with Conv2D and MaxPooling2D. (It also fails when only one of the two is commented out.)
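
A quick way to see what the converter actually produced is to inspect the tensor types with the TFLite interpreter; a minimal diagnostic sketch, assuming the file name above:

import tensorflow as tf

# Print the dtype of the model's I/O and of every tensor; any float32
# weights or activations would explain the "Model not quantized" error.
interpreter = tf.lite.Interpreter(model_path='my_mnist_quant.tflite')
interpreter.allocate_tensors()
print('input dtype: ', interpreter.get_input_details()[0]['dtype'])
print('output dtype:', interpreter.get_output_details()[0]['dtype'])
for detail in interpreter.get_tensor_details():
    print(detail['index'], detail['name'], detail['dtype'])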

My full code is below (TensorFlow 2.4.0):

! pip uninstall -y tensorflow
! pip install tf-nightly
import tensorflow as tf
import os
import numpy as np
import matplotlib.pyplot as plt

# Load MNIST dataset
mnist = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Normalize the input image so that each pixel value is between 0 to 1.
train_images = train_images / 255.0
test_images = test_images / 255.0

# Define the model architecture.
model = tf.keras.Sequential([
  tf.keras.layers.InputLayer(input_shape=(28, 28)),
  tf.keras.layers.Reshape(target_shape=(28, 28, 1)),
  tf.keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),
  tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(10)
])

# Train the digit classification model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.fit(
  train_images,
  train_labels,
  epochs=1,
  validation_data=(test_images, test_labels)
)

def representative_data_gen():
    mnist_train, _ = tf.keras.datasets.mnist.load_data()
    images = tf.cast(mnist_train[0], tf.float32) / 255.0
    mnist_ds = tf.data.Dataset.from_tensor_slices((images)).batch(1)
    for input_value in mnist_ds.take(100):
        # Model has only one input, so each data point has one element.
        yield [input_value]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
# This enables quantization
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.int8]
# This ensures that if any ops can't be quantized, the converter throws an error
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# These set the input and output tensors to uint8 (added in r2.3)
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
# And this sets the representative dataset so we can quantize the activations
converter.representative_dataset = representative_data_gen
tflite_model = converter.convert()

with open('my_mnist_quant.tflite', 'wb') as f:
    f.write(tflite_model)

! edgetpu_compiler my_mnist_quant.tflite

Namburger commented 4 years ago

@kimwj94 could you downgrade to tf2.2 for now?

With tf-nightly or tf2.4, please turn off MLIR:

converter.experimental_new_converter = False
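
In context, the flag goes on the converter before convert() is called; a minimal sketch based on the OP's script above:

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.experimental_new_converter = False  # fall back to the old (TOCO) converter
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
converter.representative_dataset = representative_data_gen
tflite_model = converter.convert()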

This is working for me:

~/quantization/ptq » edgetpu_compiler -s my_mnist_quant.tflite
Edge TPU Compiler version 14.1.317412892

Model compiled successfully in 60 ms.

Input model: my_mnist_quant.tflite
Input size: 22.81KiB
Output model: my_mnist_quant_edgetpu.tflite
Output size: 96.52KiB
On-chip memory used for caching model parameters: 134.00KiB
On-chip memory remaining for caching model parameters: 7.73MiB
Off-chip memory used for streaming uncached model parameters: 0.00B
Number of Edge TPU subgraphs: 1
Total number of operations: 6
Operation log: my_mnist_quant_edgetpu.log

Operator                       Count      Status

FULLY_CONNECTED                1          Mapped to Edge TPU
QUANTIZE                       2          Mapped to Edge TPU
RESHAPE                        1          Mapped to Edge TPU
MAX_POOL_2D                    1          Mapped to Edge TPU
DEPTHWISE_CONV_2D              1          Mapped to Edge TPU

kimwj94 commented 4 years ago

@Namburger Thank you! It works.

InCogNiTo124 commented 4 years ago

Hello, I also get Model not quantized when trying to quantize a SavedModel, but setting experimental_new_converter = False causes an Aborted (core dumped):

2020-07-28 13:56:44.148771: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-07-28 13:56:44.148797: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2020-07-28 13:56:46.174485: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-07-28 13:56:46.174506: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2020-07-28 13:56:46.174524: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist
2020-07-28 13:56:46.174727: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-07-28 13:56:46.198563: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 3692555000 Hz
2020-07-28 13:56:46.199123: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1238d320 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-28 13:56:46.199142: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-07-28 13:56:46.229969: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:118] None of the MLIR optimization passes are enabled (registered 1)
WARNING:absl:Please consider switching to the new converter by setting experimental_new_converter=True. The old converter (TOCO) is deprecated.
Traceback (most recent call last):
  File "./converter.py", line 50, in <module>
    tflite_model = converter.convert()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/lite/python/lite.py", line 752, in convert
    output_tensors)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/lite/python/lite.py", line 669, in convert
    **converter_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/lite/python/convert.py", line 574, in toco_convert_impl
    enable_mlir_converter=enable_mlir_converter)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/lite/python/convert.py", line 279, in toco_convert_protos
    raise ConverterError("See console for info.\n%s\n%s\n" % (stdout, stderr))
tensorflow.lite.python.convert.ConverterError: See console for info.
2020-07-28 13:56:48.602468: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-07-28 13:56:48.602490: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2020-07-28 13:56:49.827525: F tensorflow/lite/toco/import_tensorflow.cc:2707] Check failed: !absl::EndsWith(specified_input_array.name(), ":0") Unsupported explicit zero output index: input:0
Fatal Python error: Aborted

Current thread 0x00007f30c55c6740 (most recent call first):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/lite/toco/python/toco_from_protos.py", line 56 in execute
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250 in _run_main
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299 in run
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 40 in run
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/lite/toco/python/toco_from_protos.py", line 93 in main
  File "/usr/local/bin/toco_from_protos", line 8 in <module>
Aborted (core dumped)

Any idea what could go wrong here? Could it be that the problem is in the model itself?

For completeness, my converter script is pretty basic:

#!/usr/bin/env python3
import tensorflow as tf
import os
import sys
import cv2
import itertools as it
import numpy as np
import pathlib

if len(sys.argv) == 1:
    print("./converter_tf2.py ${PWD}/model_dir output_filename: default 'model.tflite' file1 file2")
    sys.exit(1)
#print(list(zip(range(len(sys.argv)), sys.argv)))

def get_gen_from_file(txt_file):
    with open(txt_file, "r") as f:
        image_paths = [line.strip() for line in f.readlines()]

    for image_path in image_paths:
        image = cv2.imread(image_path)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        image = cv2.resize(image, (160, 160))
        image = np.expand_dims(image, 0)
        image = image.astype(np.float32)
        image -= image.mean()
        image /= image.std()
        yield image

def dataset_gen(unmasked_txt, masked_txt):
    def f():
        i = 0
        for t in it.chain(get_gen_from_file(unmasked_txt), get_gen_from_file(masked_txt)):
            if t is not None:
                print("Loaded image", i)
                i += 1
                yield [t]
    return f

model_dir = sys.argv[1]

converter = tf.lite.TFLiteConverter.from_saved_model(model_dir)
converter.experimental_new_converter = False
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.target_spec.supported_types = [tf.int8]
converter.inference_input_type = tf.int8  # or tf.uint8
converter.inference_output_type = tf.int8  # or tf.uint8
converter.representative_dataset = dataset_gen(sys.argv[3], sys.argv[4])
tflite_model = converter.convert()
tflite_name = pathlib.Path(os.getcwd())
tflite_name = tflite_name / ('model.tflite' if len(sys.argv) < 3 else sys.argv[2])
tflite_name.write_bytes(tflite_model)
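
The fatal check in the log rejects the input array name input:0, so the crash appears to come from how TOCO handles the SavedModel signature's tensor names. One thing worth trying (a sketch under that assumption, not a confirmed fix) is converting from the signature's concrete function rather than the SavedModel directory:

import tensorflow as tf

# Load the SavedModel and convert its serving signature directly.
saved_model = tf.saved_model.load(model_dir)
concrete_func = saved_model.signatures['serving_default']
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.experimental_new_converter = False
# ...apply the same quantization settings as in the script above...
tflite_model = converter.convert()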
Namburger commented 4 years ago

@InCogNiTo124 It looks like a converter bug if it's producing a core dump. Unfortunately, that's out of our hands :( This would be the best place to report the bug.

gan3sh500 commented 3 years ago

@Namburger is the only solution in such a situation to wait for the next release of edgetpu_compiler? TF 2.2 does not properly support int8 quantization: the input and output tensors are still fp32. 2.3 has fixed that, and the 2.4 nightlies allow converting more models.

Namburger commented 3 years ago

@gan3sh500 can you try turning off the MLIR converter with tf2.3 and see if it works?

gan3sh500 commented 3 years ago

@Namburger I tried, but TOCO was failing and it was hard to pin down the source of the error. Do you know when an edgetpu_compiler release with support for newer TF will be out?

Namburger commented 3 years ago

Sorry, we're still working on support for MLIR, so there isn't a committed date for full compiler compatibility yet :/

msokoloff1 commented 3 years ago

Any update on MLIR support?

Namburger commented 3 years ago

@msokoloff1 it should work now, are you still having issues?

msokoloff1 commented 3 years ago

Actually, it looks like I am having an issue with tf 2.4.0. I want to convert a model trained with QAT, and I have only been able to run full integer quantization on a QAT model in tf 2.4.0. But when I go to compile for the Edge TPU, I get the same issue as above.

Edge TPU Compiler version 15.0.340273435
Invalid model: model_out.tflite
Model not quantized
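
For reference, a minimal QAT conversion sketch for tf 2.4 (assuming the tensorflow_model_optimization package and the float Keras model / MNIST arrays defined earlier in this thread; illustrative only):

import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Wrap the float Keras model with fake-quant ops and fine-tune it.
qat_model = tfmot.quantization.keras.quantize_model(model)
qat_model.compile(optimizer='adam',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])
qat_model.fit(train_images, train_labels, epochs=1)

# Convert; with QAT the fake-quant ranges replace the representative dataset.
converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()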

ugtony commented 3 years ago

I installed tf-nightly (2.5.0-dev20201227) to train a model with QAT, and used the following configuration to convert the model to TFLite:

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.experimental_new_converter = False
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8  # or tf.uint8
converter.inference_output_type = tf.int8  # or tf.uint8
quantized_tflite_model = converter.convert()

I got some warnings while converting. The messages showed MLIR was turned off.

2020-12-30 02:22:00.056289: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:933] Optimization results for grappler item: graph_to_optimize
function_optimizer: function_optimizer did nothing. time = 0.065ms.
function_optimizer: function_optimizer did nothing. time = 0.001ms.

2020-12-30 02:22:01.718892: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:332] Ignored output_format.
2020-12-30 02:22:01.718972: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:335] Ignored drop_control_dependency.
2020-12-30 02:22:01.790998: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:194] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.

But I still got the error while compiling:

root@tf2:~/git/train_face# edgetpu_compiler -s models/20201228_quantized-efficientnet-lite2.tflite
Edge TPU Compiler version 15.0.340273435
Invalid model: models/20201228_quantized-efficientnet-lite2.tflite
Model not quantized