This would be a good performance improvement, and the pattern sounds quite common (pretty much all CV models).
Hi @edupuis-psee, thanks for the issue report. In the example you provided, the problem seems to be transposes on quantized weights that are not properly folded. We will improve this in our converter later. Also, instead of PT2E quant, we suggest using ai-edge-quantizer with ai-edge-torch for a better quantization user experience and better performance (tagging @paulinesho for more information).
For the general NCHW -> NHWC transformation, we have a dedicated optimization in our converter that minimizes the number of transposes while preserving the model's input and output signatures; this all happens automatically. We also have a utility to help you transform model inputs and outputs to NHWC. If you run into other issues where transposes are not properly eliminated (like this issue), feel free to report them to us and we will improve our optimization algorithm. Thanks!
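For reference, the utility mentioned above is presumably `ai_edge_torch.to_channel_last_io`. A minimal sketch of wrapping a model so its first input and first output use NHWC (the toy model, shapes, and file name are placeholders, and the exact keyword names may differ across versions):

```python
import torch
import ai_edge_torch

# Toy NCHW PyTorch model standing in for a real CV network.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()

# Wrap the model so input 0 is accepted as NHWC and output 0 is
# produced as NHWC; the wrapper inserts the boundary permutes,
# which the converter can then fold away.
nhwc_model = ai_edge_torch.to_channel_last_io(model, args=[0], outputs=[0])

# The sample input now uses the NHWC layout.
sample = (torch.randn(1, 224, 224, 3),)
edge_model = ai_edge_torch.convert(nhwc_model, sample)
edge_model.export("model_nhwc.tflite")
```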
Thank you for your answer. Do you have more info on ai-edge-quantizer? I couldn't find the repo; I need to see if QAT is supported.
Maybe this is related: one problem I face when exporting a YoloV8 model from torch to EdgeTPU is the big TRANSPOSE operation, which does not fit on the EdgeTPU. Only if I decrease the image resolution, and hence the size of the TRANSPOSE, will the model fit. Making the TRANSPOSE aware of the EdgeTPU's limitations, maybe by splitting it into two operations, would reduce the complexity and let it be compiled into the same EdgeTPU subgraph.
Note in the image below how the EdgeTPU graph is split, mainly because of the TRANSPOSE operation.
It is likely related to the fact that torch puts the channels at the beginning (NCHW) while TensorFlow puts them at the end (NHWC).
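For illustration, the layout mismatch between the two conventions amounts to a single permute:

```python
import torch

x_nchw = torch.randn(1, 3, 640, 640)  # PyTorch convention: (N, C, H, W)
x_nhwc = x_nchw.permute(0, 2, 3, 1)   # TensorFlow/TFLite convention: (N, H, W, C)
print(x_nhwc.shape)                   # torch.Size([1, 640, 640, 3])
```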
Hello, the repo is now public here: https://github.com/google-ai-edge/ai-edge-quantizer/tree/main. QAT is not currently supported, though, so our best bet today is still converting pre-QAT'd models. If you don't strictly require QAT, converting with AI Edge Torch and then quantizing with AI Edge Quantizer will give you the cleanest (and hence fastest) graph. Otherwise I'd defer to @chunnienc on future plans to support NHWC weights.
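A minimal sketch of that recommended convert-then-quantize flow, assuming the API shown in the ai-edge-torch and ai-edge-quantizer READMEs at the time of writing (the toy model, recipe choice, and file names are placeholders, and the quantizer's entry points may have changed since):

```python
import torch
import ai_edge_torch
from ai_edge_quantizer import quantizer, recipe

# 1. Convert the float PyTorch model to TFLite with AI Edge Torch.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
sample = (torch.randn(1, 3, 224, 224),)
edge_model = ai_edge_torch.convert(model, sample)
edge_model.export("model_float.tflite")

# 2. Quantize the float .tflite with AI Edge Quantizer, using a
#    predefined recipe (here: int8 weights, float32 activations).
qt = quantizer.Quantizer("model_float.tflite")
qt.load_quantization_recipe(recipe.dynamic_wi8_afp32())
result = qt.quantize()
result.export_model("model_quant.tflite")
```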
Marking this issue as stale since it has been open for 7 days with no activity. This issue will be closed if no further activity occurs.
This issue was closed because it has been inactive for 14 days. Please post a new issue if you need further assistance. Thanks!
Description of the bug:
The current PT2E implementation creates numerous transpose operations (NCHW -> NHWC) for the weights, which slows down inference. Is there a way to have the weights stored in NHWC format directly?
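For context, a typical PT2E quantize-then-convert flow that exhibits this looks roughly like the sketch below. This is a hedged illustration, not the reporter's actual code: the toy model is a placeholder, the quantizer choice (torch's XNNPACKQuantizer; ai-edge-torch also ships its own PT2E quantizer) is an assumption, and the capture entry point varies across torch versions.

```python
import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)
import ai_edge_torch

# Toy NCHW conv model standing in for a real CV network.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
sample = (torch.randn(1, 3, 224, 224),)

# Standard PT2E flow: capture, prepare, calibrate, convert.
captured = capture_pre_autograd_graph(model, sample)
q = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
prepared = prepare_pt2e(captured, q)
prepared(*sample)            # single calibration pass
quantized = convert_pt2e(prepared)

# Convert the quantized graph to TFLite; inspecting the result shows
# transpose ops feeding each conv's quantized weights rather than
# NHWC weight constants.
edge_model = ai_edge_torch.convert(quantized, sample)
edge_model.export("model_pt2e.tflite")
```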
To reproduce:
Actual vs expected behavior:
Currently, after a PT2E -> TFLite conversion, weights are stored in NCHW and a transpose op is inserted before each conv layer. The expected behavior is to store the weights in NHWC.
Any other information you'd like to share?