I forgot to attach the output TFLite model from the conversion, which fails as input to the TPU compiler; here it is, zipped. Thanks! bitintmodel.zip
Can you please share the code used to generate the saved model as well? Thanks!
Sure, here is the code to generate the model. Thanks!
import tensorflow as tf
from tensorflow import keras

interleaver_input = keras.Input(shape=(16200,), name="interleaver_input")  # This is the LDPC codeword that is the input to the interleaver
# interleaver_input is a symbolic Keras tensor with an implicit batch dimension, so flatten it to a plain 1-D tensor (batch size 1) before slicing it:
intermediate_input = tf.reshape(interleaver_input, shape=(16200,))
info_bits = intermediate_input[:-1800]
parity_bits = intermediate_input[-1800:]
# Interleave the parity bits only by writing them into a 5 x 360 parity interleaver matrix columnwise, then reading out rowwise:
transposed_parity_interleaver_columns = tf.reshape(parity_bits, shape=(360, 5)) # tf.reshape is row-major
parity_interleaver_columns = tf.transpose(transposed_parity_interleaver_columns) # 5 x 360 as desired
interleaved_parity_bits = tf.reshape(parity_interleaver_columns, shape=(1800,))
interleaver_input_after_parity_interleaving = tf.concat([info_bits, interleaved_parity_bits], 0) # along axis 0
# Write these bits into the interleaver column by column:
transposed_interleaver_columns = tf.reshape(interleaver_input_after_parity_interleaving, shape=(8, 2025))
interleaver_columns = tf.transpose(transposed_interleaver_columns) # 2025 x 8 as required by DVB-C2
# Next, apply the column twists as specified for rate 8/9 QAM256 length-16200 in DVB-C2 Table 7a:
interleaver_columns_twisted = tf.stack(
    [
        interleaver_columns[:, 0],
        interleaver_columns[:, 1],
        interleaver_columns[:, 2],
        tf.roll(interleaver_columns[:, 3], shift=1, axis=0),
        tf.roll(interleaver_columns[:, 4], shift=7, axis=0),
        tf.roll(interleaver_columns[:, 5], shift=20, axis=0),
        tf.roll(interleaver_columns[:, 6], shift=20, axis=0),
        tf.roll(interleaver_columns[:, 7], shift=21, axis=0)
    ],
    axis=1
)
# Finally, read out row by row:
interleaver_output = tf.reshape(interleaver_columns_twisted, shape=(16200,))
bit_interleaver_model = keras.Model(inputs=interleaver_input, outputs=interleaver_output)
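For completeness, the SavedModel attached in the description below can presumably be produced from this Keras model with a single save call; the directory name here is an assumption to match the attached zip:

# Assumed export step (path is illustrative): writes a TF SavedModel directory.
bit_interleaver_model.save('bit_interleaver')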
> tf.roll(interleaver_columns[:, 3], shift=1, axis=0),

It's an issue with the tf.roll operation. If you convert with TF 2.5.0, the TFLite model contains a flex Roll operation; however, that is not happening with TF 2.11. I don't know of an equivalent op that can be substituted for tf.roll, but you should be able to compile the model after changing the tf.roll section:
interleaver_columns_twisted = tf.stack(
    [
        interleaver_columns[:, 0],
        interleaver_columns[:, 1],
        interleaver_columns[:, 2],
        interleaver_columns[:, 3],
        interleaver_columns[:, 4],
        interleaver_columns[:, 5],
        interleaver_columns[:, 6],
        interleaver_columns[:, 7],
        # tf.roll(interleaver_columns[:, 3], shift=1, axis=0),
        # tf.roll(interleaver_columns[:, 4], shift=7, axis=0),
        # tf.roll(interleaver_columns[:, 5], shift=20, axis=0),
        # tf.roll(interleaver_columns[:, 6], shift=20, axis=0),
        # tf.roll(interleaver_columns[:, 7], shift=21, axis=0)
    ],
    axis=1
)
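For reference, flex (Select TF) ops are normally enabled at conversion time roughly like this; it is a minimal sketch of the standard converter flags, and it only makes the Roll op fall back to the TensorFlow kernels on the CPU rather than making it run on the Edge TPU:

converter = tf.lite.TFLiteConverter.from_keras_model(bit_interleaver_model)
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # standard TFLite builtin ops
    tf.lite.OpsSet.SELECT_TF_OPS,    # allow flex ops such as Roll to use the TF runtime
]
tflite_model = converter.convert()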
Thanks, just to clarify. As a flex op, this operation would execute as a TF or TFLite operation on CPU instead of on the TPU, correct? So in order to compile and execute on the TPU, the roll operation would need to be rewritten using more rudimentary operations?
> Thanks, just to clarify. As a flex op, this operation would execute as a TF or TFLite operation on CPU instead of on the TPU, correct?

Yes, it would run on the CPU if you could compile the model, but in this case the model cannot be compiled at all.
> So in order to compile and execute on the TPU, the roll operation would need to be rewritten using more rudimentary operations?

Yes.
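For what it's worth, a roll by a compile-time-constant shift can also be expressed with slicing and concatenation alone, which (per the compiler output below) are both mapped to the Edge TPU. This is an untested sketch that assumes the shift is a static Python int and the axis-0 length is known:

def static_roll(x, shift):
    # Equivalent to tf.roll(x, shift=shift, axis=0) for a static shift and static length.
    shift = shift % int(x.shape[0])  # normalize the shift into [0, length)
    if shift == 0:
        return x
    # Move the last `shift` elements to the front: STRIDED_SLICE + CONCATENATION only.
    return tf.concat([x[-shift:], x[:-shift]], axis=0)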
I have rewritten the roll operation using more rudimentary operations and have included the model below. I can now compile the model for the TPU; however, some of the operations are not supported on the TPU, as shown below. Do you have any suggestions for how to modify the model to run entirely on the TPU? I guess GATHER_ND is the main operation not running on the TPU. Also, one RESHAPE and one PACK are not running on the TPU for some reason, while most other reshapes are. Thanks!!
$ edgetpu_compiler -s bitintmodel.tflite
Edge TPU Compiler version 16.0.384591198
Started a compilation timeout timer of 180 seconds.
Model compiled successfully in 354 ms.
Input model: bitintmodel.tflite
Input size: 40.59KiB
Output model: bitintmodel_edgetpu.tflite
Output size: 1.41MiB
On-chip memory used for caching model parameters: 932.50KiB
On-chip memory remaining for caching model parameters: 6.71MiB
Off-chip memory used for streaming uncached model parameters: 0.00B
Number of Edge TPU subgraphs: 1
Total number of operations: 32
Operation log: bitintmodel_edgetpu.log
Model successfully compiled but not all operations are supported by the Edge TPU. A percentage of the model will instead run on the CPU, which is slower. If possible, consider updating your model to use only operations supported by the Edge TPU. For details, visit g.co/coral/model-reqs.
Number of operations that will run on Edge TPU: 25
Number of operations that will run on CPU: 7
Operator        Count  Status
CONCATENATION   1      Mapped to Edge TPU
STRIDED_SLICE   10     Mapped to Edge TPU
GATHER_ND       5      Operation not supported
TRANSPOSE       2      Mapped to Edge TPU
RESHAPE         12     Mapped to Edge TPU
RESHAPE         1      More than one subgraph is not supported
PACK            1      More than one subgraph is not supported
Compilation child process completed within timeout period.
Compilation succeeded!
import tensorflow as tf
from tensorflow import keras
import numpy as np
def rotate(matrix, shift):
    '''
    Rudimentary replacement for tf.roll along axis 0 - assumes matrix shape is
    m x n and shift is a scalar; note that rotate(x, -k) == tf.roll(x, shift=k, axis=0)
    '''
    # get the shape of the input matrix
    shape = tf.shape(matrix)
    # compute and stack the meshgrid to get the index matrix of shape (m, 1)
    ind = tf.meshgrid(tf.range(shape[0]), tf.range(1), indexing='ij')[0]
    # add the shift to each row index and take it modulo shape[0];
    # this effectively introduces the desired shift, but at the level of indices
    shifted_ind = tf.math.floormod(ind + shift, shape[0])
    # return the rows gathered at the shifted indices
    return tf.gather_nd(matrix, shifted_ind)
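# Quick sanity check (illustrative addition, not part of the original script): with the
# index construction above, rotate(x, -k) reproduces tf.roll(x, shift=k, axis=0).
_check = tf.reshape(tf.range(5, dtype=tf.float32), (5, 1))
assert np.array_equal(rotate(_check, tf.constant(-2)).numpy(),
                      tf.roll(_check, shift=2, axis=0).numpy())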
print(tf.__version__)
interleaver_input = keras.Input(shape=(16200,), name="interleaver_input") # This is the LDPC codeword that is the input to the interleaver
# interleaver_input is a symbolic Keras tensor with an implicit batch dimension, so flatten it to a plain 1-D tensor (batch size 1) before slicing it:
intermediate_input = tf.reshape(interleaver_input, shape=(16200,))
info_bits = intermediate_input[:-1800]
parity_bits = intermediate_input[-1800:]
# Interleave the parity bits only by writing them into a 5 x 360 parity interleaver matrix columnwise, then reading out rowwise:
transposed_parity_interleaver_columns = tf.reshape(parity_bits, shape=(360, 5)) # tf.reshape is row-major
parity_interleaver_columns = tf.transpose(transposed_parity_interleaver_columns) # 5 x 360 as desired
interleaved_parity_bits = tf.reshape(parity_interleaver_columns, shape=(1800,))
interleaver_input_after_parity_interleaving = tf.concat([info_bits, interleaved_parity_bits], 0) # along axis 0
# Write these bits into the interleaver column by column:
transposed_interleaver_columns = tf.reshape(interleaver_input_after_parity_interleaving, shape=(8, 2025))
interleaver_columns = tf.transpose(transposed_interleaver_columns) # 2025 x 8 as required by DVB-C2
# Next, apply the column twists as specified for rate 8/9 QAM256 length-16200 in DVB-C2 Table 7a:
# Note the negated shifts: rotate(x, -k) is equivalent to tf.roll(x, shift=k, axis=0).
interleaver_columns_twisted = tf.stack(
    [
        tf.reshape(interleaver_columns[:, 0], shape=(interleaver_columns[:, 0].shape[0], 1)),
        tf.reshape(interleaver_columns[:, 1], shape=(interleaver_columns[:, 1].shape[0], 1)),
        tf.reshape(interleaver_columns[:, 2], shape=(interleaver_columns[:, 2].shape[0], 1)),
        rotate(tf.reshape(interleaver_columns[:, 3], shape=(interleaver_columns[:, 3].shape[0], 1)), tf.constant(-1)),
        rotate(tf.reshape(interleaver_columns[:, 4], shape=(interleaver_columns[:, 4].shape[0], 1)), tf.constant(-7)),
        rotate(tf.reshape(interleaver_columns[:, 5], shape=(interleaver_columns[:, 5].shape[0], 1)), tf.constant(-20)),
        rotate(tf.reshape(interleaver_columns[:, 6], shape=(interleaver_columns[:, 6].shape[0], 1)), tf.constant(-20)),
        rotate(tf.reshape(interleaver_columns[:, 7], shape=(interleaver_columns[:, 7].shape[0], 1)), tf.constant(-21))
    ],
    axis=1
)
# Finally, read out row by row:
interleaver_output = tf.reshape(interleaver_columns_twisted, shape=(16200,))
# This is the TF model
bit_interleaver_model = keras.Model(inputs=interleaver_input, outputs=interleaver_output)
# Now convert the TF model to TFLite
def representative_dataset():
    for _ in range(100):
        data = np.random.randint(0, 2, (16200,))
        yield [data.astype(np.float32)]
converter = tf.lite.TFLiteConverter.from_keras_model(bit_interleaver_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.target_spec.supported_types = [tf.int8]
converter.inference_input_type = tf.int8 # or tf.uint8
converter.inference_output_type = tf.int8 # or tf.uint8
tflite_quant_model = converter.convert()
fileName = 'bitintmodel.tflite'
with open(fileName, 'wb') as f:
    f.write(tflite_quant_model)
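For reference, the quantized model's functional correctness can be spot-checked with the TFLite interpreter; this sketch assumes the int8 quantization parameters that the converter baked into the input and output tensors:

# Illustrative check with the TFLite interpreter (not part of the conversion itself).
interpreter = tf.lite.Interpreter(model_path=fileName)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

bits = np.random.randint(0, 2, (1, 16200)).astype(np.float32)
in_scale, in_zero = inp['quantization']
interpreter.set_tensor(inp['index'], np.round(bits / in_scale + in_zero).astype(np.int8))
interpreter.invoke()

out_scale, out_zero = out['quantization']
interleaved = (interpreter.get_tensor(out['index']).astype(np.float32) - out_zero) * out_scale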
### Description

I have a saved TF model that I load, convert to TFLite, and save. When I use the Edge TPU compiler, it just fails. The TensorFlow version is 2.11.0.
Here is how I do my conversion. I know the TFLite model is actually functionally correct in this instance. I have attached the TF saved model as a zip file. Thanks!!
bit_interleaver.zip
### Issue Type

Bug

### Operating System

Linux

### Coral Device

M.2 Accelerator A+E

### Other Devices

_No response_

### Programming Language

Python 3.9

### Relevant Log Output

_No response_