MahmoudWahdan / dialog-nlu

TensorFlow and Keras implementation of state-of-the-art research in Dialog System NLU
Apache License 2.0

Difficulty in TFLite Conversion #22

Closed manojpreveen closed 3 years ago

manojpreveen commented 3 years ago

Tried a few ways to convert the joint-albert Keras (.h5) model in the output directory to TFLite.

  1. TF version: stable (2.3.0) and tf-nightly (2.4.0-dev20201005) [both produced the same error logs]. Code:
    model = tf.keras.models.load_model('/content/drive/My Drive/joint_albert_model/joint_bert_model.h5',custom_objects={'KerasLayer':hub.KerasLayer})
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    tflite_model = converter.convert()

Error log:

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:2292: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py:1377: UserWarning: `layer.updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`layer.updates` will be removed in a future version. '
INFO:tensorflow:Assets written to: /tmp/tmp7g7xnu4t/assets
INFO:tensorflow:Assets written to: /tmp/tmp7g7xnu4t/assets
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/importer.py in _import_graph_def_internal(graph_def, input_map, return_elements, validate_colocation_constraints, name, producer_op_list)
    496         results = c_api.TF_GraphImportGraphDefWithResults(
--> 497             graph._c_graph, serialized, options)  # pylint: disable=protected-access
    498         results = c_api_util.ScopedTFImportGraphDefResults(results)

InvalidArgumentError: Input 2 of node model/AlbertLayer/StatefulPartitionedCall/StatefulPartitionedCall/StatefulPartitionedCall/albert_transformer_encoder/StatefulPartitionedCall/transformer/StatefulPartitionedCall was passed float from Func/model/AlbertLayer/StatefulPartitionedCall/StatefulPartitionedCall/StatefulPartitionedCall/albert_transformer_encoder/StatefulPartitionedCall/input/_106:0 incompatible with expected resource.

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
12 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/importer.py in _import_graph_def_internal(graph_def, input_map, return_elements, validate_colocation_constraints, name, producer_op_list)
    499       except errors.InvalidArgumentError as e:
    500         # Convert to ValueError for backwards compatibility.
--> 501         raise ValueError(str(e))
    502 
    503     # Create _DefinedFunctions for any imported functions.

ValueError: Input 2 of node model/AlbertLayer/StatefulPartitionedCall/StatefulPartitionedCall/StatefulPartitionedCall/albert_transformer_encoder/StatefulPartitionedCall/transformer/StatefulPartitionedCall was passed float from Func/model/AlbertLayer/StatefulPartitionedCall/StatefulPartitionedCall/StatefulPartitionedCall/albert_transformer_encoder/StatefulPartitionedCall/input/_106:0 incompatible with expected resource.
  2. TF version: tf-nightly (2.4.0-dev20201005). Code:
    model = tf.keras.models.load_model('/content/drive/My Drive/joint_albert_model/joint_bert_model.h5',custom_objects={'KerasLayer':hub.KerasLayer})
    tf.saved_model.save(model, 'joint_albert_savedmodel')
    converter = tf.lite.TFLiteConverter.from_saved_model('joint_albert_savedmodel')
    tflite_model = converter.convert()
    with tf.io.gfile.GFile(os.path.join("./", 'joint_albert.tflite'), 'wb') as f:
        f.write(tflite_model)

The above conversion worked and I got the TFLite model, but when I tried inference I noticed that the conversion is broken: the TFLite model's input_details and output_details are wrong.

Inference code for the TFLite model obtained above:

with tf.io.gfile.GFile("joint_albert.tflite", 'rb') as f:
    model_content = f.read()

interpreter = tf.lite.Interpreter(model_content=model_content)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

print(input_details)
print(output_details)

Output:

[{'name': 'serving_default_input_type_ids:0', 'index': 0, 'shape': array([1, 1], dtype=int32), 'shape_signature': array([-1, -1], dtype=int32), 'dtype': <class 'numpy.int32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}, {'name': 'serving_default_input_mask:0', 'index': 1, 'shape': array([1, 1], dtype=int32), 'shape_signature': array([-1, -1], dtype=int32), 'dtype': <class 'numpy.int32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}, {'name': 'serving_default_valid_positions:0', 'index': 2, 'shape': array([  1,   1, 440], dtype=int32), 'shape_signature': array([ -1,  -1, 440], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}, {'name': 'serving_default_input_word_ids:0', 'index': 3, 'shape': array([1, 1], dtype=int32), 'shape_signature': array([-1, -1], dtype=int32), 'dtype': <class 'numpy.int32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]

[{'name': 'StatefulPartitionedCall:1', 'index': 945, 'shape': array([  1,   1, 440], dtype=int32), 'shape_signature': array([ -1,  -1, 440], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}, {'name': 'StatefulPartitionedCall:0', 'index': 941, 'shape': array([ 1, 53], dtype=int32), 'shape_signature': array([-1, 53], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]

These input and output details for the TFLite model are wrong: as you can see, 'shape': array([1, 1], dtype=int32) for input indices 0 and 1, and so on.
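
One possible workaround (a sketch, not verified on these models): since the converter collapses dynamic (None) dimensions to 1, the interpreter's inputs can be resized to the real shapes before allocating tensors. seq_len = 64 below is an assumed padded sequence length; use whatever length the tokenizer pads to:

    import tensorflow as tf

    # model_content as read from the .tflite file in the snippet above
    interpreter = tf.lite.Interpreter(model_content=model_content)
    input_details = interpreter.get_input_details()

    seq_len = 64  # assumed pad length; match your tokenizer settings
    for detail in input_details:
        shape = detail['shape_signature'].copy()
        shape[shape < 0] = seq_len  # replace dynamic (-1) dims with the pad length
        shape[0] = 1                # batch size 1
        interpreter.resize_tensor_input(detail['index'], shape)

    interpreter.allocate_tensors()
    print(interpreter.get_input_details())  # shapes should now reflect seq_len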

Is there a way to convert the joint-albert model to tflite and run inference on it? Or is this model not supported yet?

Please help.

manojpreveen commented 3 years ago

Tried converting the joint-bert Keras (.h5) model in the output directory to TFLite.

  1. TF version: tf-nightly (2.4.0-dev20201005). NOTE: the code that was causing an error for joint-albert (above) works fine for joint-bert. Code:
    
    converter = tf.compat.v1.lite.TFLiteConverter.from_keras_model_file('saved_models/joint_bert_model/joint_bert_model.h5',custom_objects={'KerasLayer':hub.KerasLayer})
    tflite_model = converter.convert()

    with tf.io.gfile.GFile(os.path.join("./", 'joint_bert.tflite'), 'wb') as f:
        f.write(tflite_model)

This converted the Keras (.h5) joint-bert model to a TFLite model successfully.

But when I tried to run inference on it, I again got a dimension-mismatch ValueError.
Code for printing the input and output details of this TFLite model:

with tf.io.gfile.GFile("joint_bert.tflite", 'rb') as f: model_content = f.read()

interpreter = tf.lite.Interpreter(model_content=model_content) interpreter.allocate_tensors() input_details = interpreter.get_input_details() output_details = interpreter.get_output_details()

print(input_details) print(output_details)


Output:

[{'name': 'input_word_ids', 'index': 0, 'shape': array([1, 1], dtype=int32), 'shape_signature': array([-1, -1], dtype=int32), 'dtype': <class 'numpy.int32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}, {'name': 'input_mask', 'index': 1, 'shape': array([1, 1], dtype=int32), 'shape_signature': array([-1, -1], dtype=int32), 'dtype': <class 'numpy.int32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}, {'name': 'input_type_ids', 'index': 2, 'shape': array([1, 1], dtype=int32), 'shape_signature': array([-1, -1], dtype=int32), 'dtype': <class 'numpy.int32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}, {'name': 'valid_positions', 'index': 3, 'shape': array([ 1, 1, 73], dtype=int32), 'shape_signature': array([-1, -1, 73], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]

[{'name': 'Identity', 'index': 1612, 'shape': array([1, 7], dtype=int32), 'shape_signature': array([-1, 7], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}, {'name': 'Identity_1', 'index': 1613, 'shape': array([ 1, 1, 73], dtype=int32), 'shape_signature': array([-1, -1, 73], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]



These input and output details for the TFLite model are again wrong: as before, 'shape': array([1, 1], dtype=int32) for input indices 0 and 1, and so on.
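
A possible alternative (a sketch, untested on these models): trace a concrete function with fixed input shapes before converting, so the converter never sees dynamic dimensions. The TensorSpec shapes and input order below are assumptions read off the printed input_details; seq_len would be the pad length used in training, and 73 is the slot-label dimension of valid_positions:

    import tensorflow as tf

    seq_len = 64  # assumed pad length; match the length used during training

    # Trace the loaded Keras model with fixed shapes.
    run_model = tf.function(lambda inputs: model(inputs))
    concrete_func = run_model.get_concrete_function([
        tf.TensorSpec([1, seq_len], tf.int32),        # input_word_ids
        tf.TensorSpec([1, seq_len], tf.int32),        # input_mask
        tf.TensorSpec([1, seq_len], tf.int32),        # input_type_ids
        tf.TensorSpec([1, seq_len, 73], tf.float32),  # valid_positions
    ])

    converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
    tflite_model = converter.convert()
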
MahmoudWahdan commented 3 years ago

Hi @tromedlov22
I didn't try to convert the joint_bert and joint_albert models (aka the TensorFlow-Hub-based models), but I did convert the transformers-based models in my internal projects (not this repo). Here are a few tips that I learned the hard way:

1. transformers models need a feature (the flex delegate) that is available in Python only starting from TF 2.4.0, and since it is not released yet, you can use tf-nightly.
2. None in the model's input shapes is converted to 1; that is why you get shape [1, 1]: [{'name': 'input_word_ids', 'index': 0, 'shape': array([1, 1], dtype=int32) ...
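
For tip 1, a minimal sketch of enabling the flex delegate (select TF ops) during conversion, assuming tf-nightly (2.4.0-dev or later):

    converter = tf.lite.TFLiteConverter.from_saved_model('joint_albert_savedmodel')
    # Let ops without a TFLite builtin kernel fall back to TensorFlow (flex delegate).
    converter.target_spec.supported_ops = [
        tf.lite.OpsSet.TFLITE_BUILTINS,  # prefer builtin TFLite ops
        tf.lite.OpsSet.SELECT_TF_OPS,    # selected TF ops via the flex delegate
    ]
    tflite_model = converter.convert()

Note that a .tflite file converted this way also requires the flex delegate at inference time.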

I'm planning to support TFLite conversion and a serving pool for high throughput. Please keep an eye on #24 and #25. Meanwhile, if you are interested in a compressed model, I recommend the layer-pruning feature, as it is very efficient and implemented in this repo.

Please let me know if this helps.