freedomtan / keras_cv_stable_diffusion_to_tflite

Scripts for converting Keras CV Stable Diffusion to tflite
BSD 3-Clause "New" or "Revised" License

Have you performed inference with the TFLite models? #1

Closed sayakpaul closed 1 year ago

sayakpaul commented 1 year ago

Amazing work here!

I wanted to know if you have performed inference with the converted TFLite models.

freedomtan commented 1 year ago

> Amazing work here!
>
> I wanted to know if you have performed inference with the converted TFLite models.

Yes and no.

sayakpaul commented 1 year ago

I see.

I think if you could come up with a Colab Notebook that shows the end-to-end inference pipeline with the converted models, it would be greatly beneficial to the community.

sayakpaul commented 1 year ago

Also, did you try converting using dynamic-range quantization?

freedomtan commented 1 year ago

> I see.
>
> I think if you could come up with a Colab Notebook that shows the end-to-end inference pipeline with the converted models, it would be greatly beneficial to the community.

Yup, that's doable: borrow the Keras or PyTorch pipeline code, swap in the inference part for the TFLite models, and it's done. I could find some time during the upcoming Lunar New Year holidays to work on it.

sayakpaul commented 1 year ago

Keeping the issue open until then. Thank you so much!

freedomtan commented 1 year ago

> Also, did you try converting using dynamic-range quantization?

Nope. I tried fp16 quantization for Stable Diffusion before; it's trivial. I expect dynamic-range quantization to be trivial too.
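
For reference, the two post-training modes are only a couple of converter flags apart. A minimal sketch, assuming a loaded Keras model `model` (e.g. one of the keras_cv Stable Diffusion submodels), not the repo's exact script:

```python
import tensorflow as tf

def convert(model, fp16=False):
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    # Optimize.DEFAULT alone gives dynamic-range quantization
    # (int8 weights, float activations).
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    if fp16:
        # Restricting supported types to float16 switches to fp16 quantization.
        converter.target_spec.supported_types = [tf.float16]
    return converter.convert()
```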

sayakpaul commented 1 year ago

The only reason I asked is that the resulting models would be smaller than FP16, and it might be less effort than pure int8 quantization. Depending on the speed and size benefits, I think it's worth trying.

freedomtan commented 1 year ago

> Keeping the issue open until then. Thank you so much!

Done:

https://github.com/freedomtan/keras_cv_stable_diffusion_to_tflite/commit/7b44ce2d985c3ceedc8163aba0a2d1781107e2c0

freedomtan commented 1 year ago

> The only reason I asked is that the resulting models would be smaller than FP16, and it might be less effort than pure int8 quantization. Depending on the speed and size benefits, I think it's worth trying.

As expected, conversion using dynamic-range quantization is quite trivial. I didn't verify whether it works, though. https://github.com/freedomtan/keras_cv_stable_diffusion_to_tflite/blob/main/convert_to_tflite_models_with_dynamic_range.py

sayakpaul commented 1 year ago

@freedomtan, which TF version did you use for the above? With TF 2.11.0, the decoder conversion fails without SELECT ops.

freedomtan commented 1 year ago

> @freedomtan, which TF version did you use for the above? With TF 2.11.0, the decoder conversion fails without SELECT ops.

@sayakpaul A TF master branch build from about two weeks ago. A recent tf-nightly from the last couple of weeks should presumably work too.
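
As an aside, if a conversion fails because some ops are not covered by the builtin TFLite kernels, allowing TF select ops is a common workaround (at the cost of needing the Flex delegate at runtime). A sketch, assuming a Keras `decoder` model is in scope:

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(decoder)
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # use builtin TFLite kernels where possible
    tf.lite.OpsSet.SELECT_TF_OPS,    # fall back to TensorFlow ops otherwise
]
tflite_model = converter.convert()
```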

sayakpaul commented 1 year ago

Hmm. With TF 2.11.0, the inference also fails. I will try it out with tf-nightly: https://colab.research.google.com/gist/sayakpaul/479139ceb0be68234e91f38eb1427c5d/scratchpad.ipynb

freedomtan commented 1 year ago

> Hmm. With TF 2.11.0, the inference also fails. I will try it out with tf-nightly: https://colab.research.google.com/gist/sayakpaul/479139ceb0be68234e91f38eb1427c5d/scratchpad.ipynb

Sorry, I couldn't tell which tensor is wrong from the trace. I guess some of the 13 tensors passed between the two chunks of the diffusion model are not arranged properly. I'll check whether I can do inference with TF 2.11.

freedomtan commented 1 year ago

Dynamic-range quantization seems to work. The results usually differ from the fp32 models, but they look reasonable.

https://github.com/freedomtan/keras_cv_stable_diffusion_to_tflite/blob/main/text_to_image_using_converted_tflite_models_dynamic.ipynb

freedomtan commented 1 year ago

> Hmm. With TF 2.11.0, the inference also fails. I will try it out with tf-nightly: https://colab.research.google.com/gist/sayakpaul/479139ceb0be68234e91f38eb1427c5d/scratchpad.ipynb

It works in my Python 3.8 + TF 2.11.0 + keras_cv 0.4.1 environment.

I guess the problem is the order of the input tensors. With the following snippet:

```python
import tensorflow as tf

# Inspect the input/output tensor details of the two diffusion-model chunks.
i_first = tf.lite.Interpreter('/tmp/sd_diffusion_model_first.tflite')
first_input_details = i_first.get_input_details()
first_output_details = i_first.get_output_details()

for i in first_input_details:
    print(i)

print("")

for o in first_output_details:
    print(o)

print("")

i_second = tf.lite.Interpreter('/tmp/sd_diffusion_model_second.tflite')
second_input_details = i_second.get_input_details()
second_output_details = i_second.get_output_details()

for i in second_input_details:
    print(i)
```
The tensors of my two models are:

{'name': 'serving_default_input_1:0', 'index': 0, 'shape': array([  1,  77, 768], dtype=int32), 'shape_signature': array([ -1,  77, 768], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'serving_default_input_3:0', 'index': 1, 'shape': array([ 1, 64, 64,  4], dtype=int32), 'shape_signature': array([-1, 64, 64,  4], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'serving_default_input_2:0', 'index': 2, 'shape': array([  1, 320], dtype=int32), 'shape_signature': array([ -1, 320], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}

{'name': 'StatefulPartitionedCall:7', 'index': 797, 'shape': array([  1,  64,  64, 320], dtype=int32), 'shape_signature': array([ -1,  64,  64, 320], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'serving_default_input_1:0', 'index': 0, 'shape': array([  1,  77, 768], dtype=int32), 'shape_signature': array([ -1,  77, 768], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'StatefulPartitionedCall:12', 'index': 2145, 'shape': array([   1,   16,   16, 1280], dtype=int32), 'shape_signature': array([  -1,   16,   16, 1280], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'StatefulPartitionedCall:5', 'index': 1067, 'shape': array([  1,  32,  32, 320], dtype=int32), 'shape_signature': array([ -1,  32,  32, 320], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'StatefulPartitionedCall:2', 'index': 437, 'shape': array([  1,  64,  64, 320], dtype=int32), 'shape_signature': array([ -1,  64,  64, 320], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'StatefulPartitionedCall:9', 'index': 1337, 'shape': array([  1,  32,  32, 640], dtype=int32), 'shape_signature': array([ -1,  32,  32, 640], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'StatefulPartitionedCall:3', 'index': 1607, 'shape': array([  1,  16,  16, 640], dtype=int32), 'shape_signature': array([ -1,  16,  16, 640], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'StatefulPartitionedCall:11', 'index': 1877, 'shape': array([   1,   16,   16, 1280], dtype=int32), 'shape_signature': array([  -1,   16,   16, 1280], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'StatefulPartitionedCall:10', 'index': 1605, 'shape': array([  1,  32,  32, 640], dtype=int32), 'shape_signature': array([ -1,  32,  32, 640], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'StatefulPartitionedCall:4', 'index': 2147, 'shape': array([   1,    8,    8, 1280], dtype=int32), 'shape_signature': array([  -1,    8,    8, 1280], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'StatefulPartitionedCall:0', 'index': 434, 'shape': array([   1, 1280], dtype=int32), 'shape_signature': array([  -1, 1280], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'StatefulPartitionedCall:6', 'index': 2851, 'shape': array([   1,    8,    8, 1280], dtype=int32), 'shape_signature': array([  -1,    8,    8, 1280], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'StatefulPartitionedCall:8', 'index': 1065, 'shape': array([  1,  64,  64, 320], dtype=int32), 'shape_signature': array([ -1,  64,  64, 320], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}

{'name': 'serving_default_args_0_8:0', 'index': 0, 'shape': array([  1,  32,  32, 640], dtype=int32), 'shape_signature': array([ -1,  32,  32, 640], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'serving_default_args_0_3:0', 'index': 1, 'shape': array([   1,   16,   16, 1280], dtype=int32), 'shape_signature': array([  -1,   16,   16, 1280], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'serving_default_args_0_7:0', 'index': 2, 'shape': array([  1,  32,  32, 640], dtype=int32), 'shape_signature': array([ -1,  32,  32, 640], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'serving_default_args_0_12:0', 'index': 3, 'shape': array([  1,  64,  64, 320], dtype=int32), 'shape_signature': array([ -1,  64,  64, 320], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'serving_default_args_0_9:0', 'index': 4, 'shape': array([  1,  32,  32, 320], dtype=int32), 'shape_signature': array([ -1,  32,  32, 320], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'serving_default_args_0_5:0', 'index': 5, 'shape': array([   1,   16,   16, 1280], dtype=int32), 'shape_signature': array([  -1,   16,   16, 1280], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'serving_default_args_0:0', 'index': 6, 'shape': array([   1,    8,    8, 1280], dtype=int32), 'shape_signature': array([  -1,    8,    8, 1280], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'serving_default_args_0_6:0', 'index': 7, 'shape': array([  1,  16,  16, 640], dtype=int32), 'shape_signature': array([ -1,  16,  16, 640], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'serving_default_args_0_10:0', 'index': 8, 'shape': array([  1,  64,  64, 320], dtype=int32), 'shape_signature': array([ -1,  64,  64, 320], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'serving_default_args_0_11:0', 'index': 9, 'shape': array([  1,  64,  64, 320], dtype=int32), 'shape_signature': array([ -1,  64,  64, 320], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'serving_default_args_0_2:0', 'index': 10, 'shape': array([   1, 1280], dtype=int32), 'shape_signature': array([  -1, 1280], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'serving_default_args_0_1:0', 'index': 11, 'shape': array([   1,    8,    8, 1280], dtype=int32), 'shape_signature': array([  -1,    8,    8, 1280], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'serving_default_args_0_4:0', 'index': 12, 'shape': array([  1,  77, 768], dtype=int32), 'shape_signature': array([ -1,  77, 768], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
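
For what it's worth, one way to sidestep the ordering issue is to feed the first chunk by tensor name rather than by the positional order of `get_input_details()`. A minimal sketch, where `context`, `latent`, and `t_emb` are placeholder arrays whose roles are inferred from the shapes listed above (text-encoder output, noisy latent, and timestep embedding):

```python
import numpy as np
import tensorflow as tf

# Placeholder inputs shaped per the dump above; in a real pipeline these come
# from the text encoder, the scheduler, and the timestep-embedding computation.
context = np.zeros((1, 77, 768), dtype=np.float32)
latent = np.zeros((1, 64, 64, 4), dtype=np.float32)
t_emb = np.zeros((1, 320), dtype=np.float32)

interp = tf.lite.Interpreter('/tmp/sd_diffusion_model_first.tflite')
interp.allocate_tensors()

feeds = {
    'serving_default_input_1:0': context,
    'serving_default_input_3:0': latent,
    'serving_default_input_2:0': t_emb,
}
for detail in interp.get_input_details():
    # Match each input by its 'name' field so the index order cannot be mixed up.
    interp.set_tensor(detail['index'], feeds[detail['name']])
interp.invoke()

first_outputs = {d['name']: interp.get_tensor(d['index'])
                 for d in interp.get_output_details()}
```
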
sayakpaul commented 1 year ago

I fixed the ordering issue: https://github.com/sayakpaul/Adventures-in-TensorFlow-Lite/blob/master/Stable_Diffusion_to_TFLite.ipynb.

It would also be cool to extend the TFLite models to handle a batch_size greater than 1 when calling generate_image().

freedomtan commented 1 year ago

> I fixed the ordering issue: https://github.com/sayakpaul/Adventures-in-TensorFlow-Lite/blob/master/Stable_Diffusion_to_TFLite.ipynb.
>
> It would also be cool to extend the TFLite models to handle a batch_size greater than 1 when calling generate_image().

TFLite models do support dynamic batch sizes. To use the TFLite models with a batch size > 1, call resize_tensor_input() before calling allocate_tensors().
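
A quick sketch of that, assuming a model whose leading dimension has a dynamic shape signature (-1 in the dumps above) and an illustrative file path:

```python
import tensorflow as tf

batch_size = 2  # any value > 1
interpreter = tf.lite.Interpreter('/tmp/sd_diffusion_model_first.tflite')

# Resize every input's leading (batch) dimension before allocating tensors.
for detail in interpreter.get_input_details():
    new_shape = [batch_size] + list(detail['shape'][1:])
    interpreter.resize_tensor_input(detail['index'], new_shape)

interpreter.allocate_tensors()
```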

sayakpaul commented 1 year ago

Thanks! Fixed the problem in the latest commit to the notebook.

freedomtan commented 1 year ago

@sayakpaul FYI, on my MacBook Pro M1 with 8 GB RAM, inference with the dynamic-range quantized models is much faster than with the 32-bit floating-point ones (under 6 minutes vs. more than 25 minutes).

sayakpaul commented 1 year ago

Maybe that's because your installation was built from source for your machine. It would be interesting to see what happens when you use tf-nightly; that would be a more reasonable comparison in my opinion.

freedomtan commented 1 year ago

@sayakpaul I wrote minimal C++ glue code and tested it on Android devices and macOS.

https://github.com/freedomtan/keras_cv_stable_diffusion_to_tflite/tree/main/cpp_glue_code

sayakpaul commented 1 year ago

@farmaker47