Closed sayakpaul closed 1 year ago
Amazing work here!
I wanted to know if you have performed inference with the converted TFLite models.
Yes and no.
Nope,
I see.
I think if you can still come up with a Colab Notebook that shows the end-to-end inference pipeline with the converted models, it would be greatly beneficial to the community.
Also, did you try converting using dynamic-range quantization?
Yup, that's doable. Steal the Keras or PyTorch code and change the inference part, and it's done. I could find some time during the upcoming Lunar New Year holidays to work on it.
Keeping the issue open until then. Thank you so much!
Nope, I haven't tried dynamic-range quantization. I tried fp16 quantization for Stable Diffusion before, and it was trivial. I expect dynamic-range quantization to be trivial too.
The only reason I asked is that the resulting models would be smaller than FP16 and might take less effort than pure int8 quantization. Depending on the speed and size benefits, I think it's worth trying.
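To make the size argument concrete, here is a rough back-of-the-envelope sketch. It assumes weights dominate the file size and that every weight is stored at the stated width (4 bytes for fp32, 2 for fp16, 1 for the int8 weights used by dynamic-range quantization); real converted files will differ somewhat. The parameter count is a hypothetical placeholder, not a measurement of any Stable Diffusion component.

```python
# Approximate bytes used to store one weight in each format.
BYTES_PER_WEIGHT = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_size_mib(num_params, fmt):
    """Estimate weight storage in MiB, ignoring metadata and unquantized tensors."""
    return num_params * BYTES_PER_WEIGHT[fmt] / (1024 ** 2)


# Hypothetical parameter count, used only for illustration.
num_params = 100_000_000
for fmt in ("fp32", "fp16", "int8"):
    print(f"{fmt}: {weight_size_mib(num_params, fmt):.1f} MiB")
```

Under these assumptions, the int8 weights take about half the space of fp16 and a quarter of fp32.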
Keeping the issue open until then. Thank you so much!
done
As expected, conversion using dynamic-range quantization is quite trivial. I didn't verify whether the converted models work, though. https://github.com/freedomtan/keras_cv_stable_diffusion_to_tflite/blob/main/convert_to_tflite_models_with_dynamic_range.py
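For readers who have not used post-training quantization before, the following is a minimal sketch of what such a conversion typically looks like with the standard `tf.lite.TFLiteConverter` API. It is not a copy of the linked script; the function name `convert_quantized` and its parameters are made up for illustration, and `keras_model` is assumed to be any Keras model (for example one of the Stable Diffusion sub-models).

```python
def convert_quantized(keras_model, output_path, use_fp16=False):
    """Convert a Keras model to TFLite with post-training quantization.

    Hypothetical helper: dynamic-range quantization by default,
    fp16 quantization when use_fp16 is True.
    """
    # Imported lazily so the helper can be defined before TensorFlow is loaded.
    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
    # Optimize.DEFAULT without a representative dataset gives
    # dynamic-range quantization (int8 weights, float activations).
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    if use_fp16:
        # Restricting supported types to float16 switches to fp16 quantization.
        converter.target_spec.supported_types = [tf.float16]

    tflite_model = converter.convert()
    with open(output_path, "wb") as f:
        f.write(tflite_model)
    return len(tflite_model)  # size of the converted model in bytes
```

Calling it once with `use_fp16=False` and once with `use_fp16=True` on the same model gives two files whose sizes can be compared directly.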
@freedomtan, which TF version did you use for the above? With TF 2.11.0, the decoder conversion fails without SELECT ops.
@sayakpaul I used the TF master branch, which I built about two weeks ago. Supposedly, a recent tf-nightly from the last couple of weeks should work too.
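For anyone staying on TF 2.11.0, a possible workaround for the decoder failure mentioned above is to allow SELECT ops during conversion. The sketch below uses the standard `tf.lite.OpsSet` values; the helper name `allow_select_tf_ops` is hypothetical, and whether this is enough for the decoder on a given TF version is an assumption that has to be checked.

```python
def allow_select_tf_ops(converter):
    """Let the converter fall back to TensorFlow ops for unsupported operations.

    Hypothetical helper; `converter` is assumed to be a tf.lite.TFLiteConverter.
    """
    # Imported lazily so the helper can be defined before TensorFlow is loaded.
    import tensorflow as tf

    converter.target_spec.supported_ops = [
        tf.lite.OpsSet.TFLITE_BUILTINS,  # regular TFLite built-in ops
        tf.lite.OpsSet.SELECT_TF_OPS,    # fall back to selected TensorFlow ops
    ]
    return converter
```

The trade-off is that a model converted this way needs a runtime that includes the select TF ops, so a newer converter that handles the decoder with built-in ops only is preferable when available.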
Hmm. With TF 2.11.0, the inference also fails. I will try it out with tf-nightly: https://colab.research.google.com/gist/sayakpaul/479139ceb0be68234e91f38eb1427c5d/scratchpad.ipynb
Sorry, I couldn't tell which tensor is wrong from the trace. I guess maybe some of the 13 tensors between the two chunks of the diffusion model are not arranged properly. I'll check if I could do inference with tf 2.11.
Dynamic-range quantization seems to work. The results are usually different from those of the fp32 models, but they look reasonable.
Inference works in my Python 3.8 + TF 2.11.0 + keras_cv 0.4.1 environment.
I guess the problem is the order of input tensors. With
import tensorflow as tf

# Load the first half of the diffusion model and list its input and output tensors.
i_first = tf.lite.Interpreter('/tmp/sd_diffusion_model_first.tflite')
first_input_details = i_first.get_input_details()
first_output_details = i_first.get_output_details()
for i in first_input_details:
    print(i)
print("")
for o in first_output_details:
    print(o)
print("")

# Load the second half and list its input tensors for comparison.
i_second = tf.lite.Interpreter('/tmp/sd_diffusion_model_second.tflite')
second_input_details = i_second.get_input_details()
second_output_details = i_second.get_output_details()
for i in second_input_details:
    print(i)
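the tensors of my two models are as follows (inputs of the first model, outputs of the first model, then inputs of the second model):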
{'name': 'serving_default_input_1:0', 'index': 0, 'shape': array([ 1, 77, 768], dtype=int32), 'shape_signature': array([ -1, 77, 768], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'serving_default_input_3:0', 'index': 1, 'shape': array([ 1, 64, 64, 4], dtype=int32), 'shape_signature': array([-1, 64, 64, 4], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'serving_default_input_2:0', 'index': 2, 'shape': array([ 1, 320], dtype=int32), 'shape_signature': array([ -1, 320], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'StatefulPartitionedCall:7', 'index': 797, 'shape': array([ 1, 64, 64, 320], dtype=int32), 'shape_signature': array([ -1, 64, 64, 320], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'serving_default_input_1:0', 'index': 0, 'shape': array([ 1, 77, 768], dtype=int32), 'shape_signature': array([ -1, 77, 768], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'StatefulPartitionedCall:12', 'index': 2145, 'shape': array([ 1, 16, 16, 1280], dtype=int32), 'shape_signature': array([ -1, 16, 16, 1280], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'StatefulPartitionedCall:5', 'index': 1067, 'shape': array([ 1, 32, 32, 320], dtype=int32), 'shape_signature': array([ -1, 32, 32, 320], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'StatefulPartitionedCall:2', 'index': 437, 'shape': array([ 1, 64, 64, 320], dtype=int32), 'shape_signature': array([ -1, 64, 64, 320], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'StatefulPartitionedCall:9', 'index': 1337, 'shape': array([ 1, 32, 32, 640], dtype=int32), 'shape_signature': array([ -1, 32, 32, 640], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'StatefulPartitionedCall:3', 'index': 1607, 'shape': array([ 1, 16, 16, 640], dtype=int32), 'shape_signature': array([ -1, 16, 16, 640], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'StatefulPartitionedCall:11', 'index': 1877, 'shape': array([ 1, 16, 16, 1280], dtype=int32), 'shape_signature': array([ -1, 16, 16, 1280], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'StatefulPartitionedCall:10', 'index': 1605, 'shape': array([ 1, 32, 32, 640], dtype=int32), 'shape_signature': array([ -1, 32, 32, 640], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'StatefulPartitionedCall:4', 'index': 2147, 'shape': array([ 1, 8, 8, 1280], dtype=int32), 'shape_signature': array([ -1, 8, 8, 1280], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'StatefulPartitionedCall:0', 'index': 434, 'shape': array([ 1, 1280], dtype=int32), 'shape_signature': array([ -1, 1280], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'StatefulPartitionedCall:6', 'index': 2851, 'shape': array([ 1, 8, 8, 1280], dtype=int32), 'shape_signature': array([ -1, 8, 8, 1280], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'StatefulPartitionedCall:8', 'index': 1065, 'shape': array([ 1, 64, 64, 320], dtype=int32), 'shape_signature': array([ -1, 64, 64, 320], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'serving_default_args_0_8:0', 'index': 0, 'shape': array([ 1, 32, 32, 640], dtype=int32), 'shape_signature': array([ -1, 32, 32, 640], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'serving_default_args_0_3:0', 'index': 1, 'shape': array([ 1, 16, 16, 1280], dtype=int32), 'shape_signature': array([ -1, 16, 16, 1280], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'serving_default_args_0_7:0', 'index': 2, 'shape': array([ 1, 32, 32, 640], dtype=int32), 'shape_signature': array([ -1, 32, 32, 640], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'serving_default_args_0_12:0', 'index': 3, 'shape': array([ 1, 64, 64, 320], dtype=int32), 'shape_signature': array([ -1, 64, 64, 320], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'serving_default_args_0_9:0', 'index': 4, 'shape': array([ 1, 32, 32, 320], dtype=int32), 'shape_signature': array([ -1, 32, 32, 320], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'serving_default_args_0_5:0', 'index': 5, 'shape': array([ 1, 16, 16, 1280], dtype=int32), 'shape_signature': array([ -1, 16, 16, 1280], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'serving_default_args_0:0', 'index': 6, 'shape': array([ 1, 8, 8, 1280], dtype=int32), 'shape_signature': array([ -1, 8, 8, 1280], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'serving_default_args_0_6:0', 'index': 7, 'shape': array([ 1, 16, 16, 640], dtype=int32), 'shape_signature': array([ -1, 16, 16, 640], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'serving_default_args_0_10:0', 'index': 8, 'shape': array([ 1, 64, 64, 320], dtype=int32), 'shape_signature': array([ -1, 64, 64, 320], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'serving_default_args_0_11:0', 'index': 9, 'shape': array([ 1, 64, 64, 320], dtype=int32), 'shape_signature': array([ -1, 64, 64, 320], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'serving_default_args_0_2:0', 'index': 10, 'shape': array([ 1, 1280], dtype=int32), 'shape_signature': array([ -1, 1280], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'serving_default_args_0_1:0', 'index': 11, 'shape': array([ 1, 8, 8, 1280], dtype=int32), 'shape_signature': array([ -1, 8, 8, 1280], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
{'name': 'serving_default_args_0_4:0', 'index': 12, 'shape': array([ 1, 77, 768], dtype=int32), 'shape_signature': array([ -1, 77, 768], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}
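The dump above shows why the ordering is easy to get wrong: several tensors share the same shape, so shape alone cannot tell which output of the first model feeds which input of the second. As a small diagnostic sketch (not the fix used in the notebook), the hypothetical helpers below group tensors by shape and report the shapes that are ambiguous. The sample dictionaries are abbreviated stand-ins for entries returned by `get_output_details()` and `get_input_details()`.

```python
from collections import defaultdict


def group_by_shape(details):
    """Map each tensor shape to the names of the tensors that have it."""
    groups = defaultdict(list)
    for d in details:
        shape = tuple(int(dim) for dim in d["shape"])
        groups[shape].append(d["name"])
    return dict(groups)


def ambiguous_shapes(outputs, inputs):
    """Return shapes that occur more than once among the outputs or the inputs."""
    out_groups = group_by_shape(outputs)
    in_groups = group_by_shape(inputs)
    shapes = set(out_groups) | set(in_groups)
    return sorted(
        s for s in shapes
        if len(out_groups.get(s, [])) > 1 or len(in_groups.get(s, [])) > 1
    )


# Abbreviated entries taken from the dump above (only name and shape kept).
first_outputs = [
    {"name": "StatefulPartitionedCall:0", "shape": [1, 1280]},
    {"name": "StatefulPartitionedCall:7", "shape": [1, 64, 64, 320]},
    {"name": "StatefulPartitionedCall:2", "shape": [1, 64, 64, 320]},
]
second_inputs = [
    {"name": "serving_default_args_0_2:0", "shape": [1, 1280]},
    {"name": "serving_default_args_0_12:0", "shape": [1, 64, 64, 320]},
    {"name": "serving_default_args_0_10:0", "shape": [1, 64, 64, 320]},
]
print(ambiguous_shapes(first_outputs, second_inputs))
```

Any shape reported here needs to be wired up by tensor name or by the model's original output order rather than by shape matching.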
I fixed the ordering issue: https://github.com/sayakpaul/Adventures-in-TensorFlow-Lite/blob/master/Stable_Diffusion_to_TFLite.ipynb.
It would also be cool to extend the TFLite models to handle the situation when batch_size is greater than 1 when calling generate_image().
TFLite models do support dynamic batch size. To use TFLite models with batch size > 1, call resize_tensor_input() before calling allocate_tensors().
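A minimal sketch of that call order is shown below. It assumes every input has a dynamic leading batch dimension (as the `-1` entries in the `shape_signature` fields of the dump suggest) and that `interpreter` is a `tf.lite.Interpreter`; the helper name `resize_inputs_for_batch` is hypothetical.

```python
def resize_inputs_for_batch(interpreter, batch_size):
    """Resize the leading (batch) dimension of every input, then allocate tensors."""
    for detail in interpreter.get_input_details():
        # Keep all non-batch dimensions, replace only the first one.
        new_shape = [batch_size] + [int(dim) for dim in detail["shape"][1:]]
        interpreter.resize_tensor_input(detail["index"], new_shape)
    # Allocation has to happen after all inputs have been resized.
    interpreter.allocate_tensors()
    return interpreter.get_input_details()


# Usage, assuming TensorFlow is installed and the model file exists:
# import tensorflow as tf
# interpreter = tf.lite.Interpreter('/tmp/sd_diffusion_model_first.tflite')
# resize_inputs_for_batch(interpreter, batch_size=2)
```

After this, the arrays passed to `set_tensor()` need the same batch size as the resized inputs.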
Thanks! Fixed the problem in the latest commit to the notebook.
@sayakpaul FYI, on my MacBook Pro M1 8 GB machine, inference with dynamic-range quantized models is much faster than with 32-bit floating-point ones (< 6 mins vs. > 25 mins).
Maybe that is because the installation was built from source for your machine. It would be interesting to see what happens when you use tf-nightly. That is a more reasonable comparison in my opinion.
@sayakpaul I wrote minimal C++ and tested it on Android devices and macOS.
https://github.com/freedomtan/keras_cv_stable_diffusion_to_tflite/tree/main/cpp_glue_code
@farmaker47