Open pushkarjain1009 opened 1 month ago
I applied dynamic quantization to both TFLite models: the diffusion model and the text_encoder model. However, I ran into difficulties with the diffusion model because of its large size and couldn't find a suitable way to quantize it with the ONNX library at the time. The inference time of the text_encoder model also did not improve significantly with the INT8 ONNX model, so I kept the TFLite version for simplicity.
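To make clear what dynamic-range (weights-only INT8) quantization actually does to a tensor, here is a minimal NumPy sketch of per-tensor symmetric quantization; the array and function names are illustrative, not taken from the notebook:

```python
import numpy as np

def quantize_int8(w):
    # Per-tensor symmetric scale: map max|w| onto 127, the edge of the int8 range.
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # At inference time the runtime multiplies back by the stored scale.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)  # stand-in for a weight tensor
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max abs error:", np.max(np.abs(w - w_hat)))
```

The round-trip error is bounded by half the scale step, which is why weights-only INT8 usually costs little accuracy while cutting model size roughly 4x.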
Attached is the notebook I used for converting and quantizing these models in this project.
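For quick reference, the two one-call APIs involved look roughly like this; the paths and output filenames are placeholders, and the exact converter flags used in the attached notebook may differ:

```python
def quantize_tflite_dynamic(saved_model_dir, out_path="model_int8.tflite"):
    """Dynamic-range quantization with the TFLite converter (paths are placeholders)."""
    import tensorflow as tf  # imported lazily so the sketch stands alone
    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    # Optimize.DEFAULT enables dynamic-range (weights-only INT8) quantization.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()
    with open(out_path, "wb") as f:
        f.write(tflite_model)
    return out_path

def quantize_onnx_dynamic(model_path, out_path="model_int8.onnx"):
    """Weights-only INT8 quantization via onnxruntime (paths are placeholders)."""
    from onnxruntime.quantization import quantize_dynamic, QuantType
    quantize_dynamic(model_path, out_path, weight_type=QuantType.QInt8)
    return out_path
```

Neither call needs a calibration dataset, which is what makes dynamic quantization the simplest post-training option for large models like the diffusion UNet.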
Hey, can you please elaborate on the quantisation method you used here for SD-1.4? I am trying to implement a similar project but am stuck with the quantisation process. I presume you used INT8 quantisation for deployment on a mobile device. How did you achieve that in both the TFLite and ONNX formats? Can you please help me with that?