Xilinx / Vitis-AI

Vitis AI is Xilinx’s development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.
https://www.xilinx.com/ai
Apache License 2.0

Large MobilePose accuracy drop between quantized model and compiled model #1069

Closed lino-alves closed 1 year ago

lino-alves commented 1 year ago

Hi,

I am deploying the MobilePose NN into the DPU of a ZCU104 evaluation board. I quantized the model, compiled it, and ran inference on the MPSoC; however, there is a large accuracy drop between the results I get from the quantized model (running on my GPU) and the compiled model (running on the DPU). I was expecting some accuracy drop between the float model and the quantized model due to the reduction in resolution (32-bit floating point to 8-bit fixed point), but I can't explain the large difference between the quantized model and the compiled model. I debugged my MPSoC code using the quantized-model inference as reference, and both pre- and post-processing are working as intended on the MPSoC; only the results I get from the model are off.

The quantized model results look reasonable and probably can be improved with quantization aware training, but the compiled model results are surprisingly unusable and I don't think QAT can help in this case, right?

The MobilePose model I am using is implemented in PyTorch (with a MobileNetV2 backbone) and is available on GitHub at this link.

Like I said, I can't explain this difference in quantized vs compiled results, so it would be good to have some insights from the Vitis-AI experts whether this is expected or not. Are there specific architectures where this could happen? Is there anything I can do to improve the compiled model results and bring them closer to the quantized model results?

Thanks for the support

lishixlnx commented 1 year ago

Can you please run a test using the same input image with both the quantized model and the compiled model, then compare the outputs of the two models to check whether they are the same? If not, there may be an error in your compiled model. You need to debug it layer by layer to find out which layer's output causes the difference.
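For reference, a minimal sketch of such a comparison, assuming you have already dumped the raw output tensor of each run for the same input image (the `.npy` file names below are placeholders, not actual Vitis-AI artifacts):

```python
# Compare the dumped outputs of the quantized model (host run) and the
# compiled model (DPU run) for the same input image.
import numpy as np

quant_out = np.load("quantized_out.npy").astype(np.float32).ravel()  # host (GPU) run
dpu_out = np.load("dpu_out.npy").astype(np.float32).ravel()          # ZCU104 (DPU) run

max_abs_err = np.max(np.abs(quant_out - dpu_out))
cos_sim = np.dot(quant_out, dpu_out) / (
    np.linalg.norm(quant_out) * np.linalg.norm(dpu_out) + 1e-12
)

print(f"max abs error:     {max_abs_err:.6f}")
print(f"cosine similarity: {cos_sim:.6f}")  # close to 1.0 means the two runs agree
```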

lino-alves commented 1 year ago

Yes, I made the comparison with the same input images and the outputs of the two models differ by a lot.

I spent some time looking into the layers of each model and I've seen that Conv2d and BatchNorm2d are being merged; I don't know if that can cause issues. I have also seen that PixelShuffle is being converted to Tile, which according to a note in UG1414 is the expected behaviour when there's a convolution at its input, but before Conv2d and BatchNorm2d were merged it was a BatchNorm2d at the input of PixelShuffle, not a Conv2d. I don't know if all these merges and changes can cause issues.
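For illustration only, here is a toy sketch of the pattern I am describing (the channel sizes are made up, not the actual MobilePose ones): a BatchNorm2d, not a Conv2d, feeds the PixelShuffle before the quantizer fuses Conv2d + BatchNorm2d into a single convolution.

```python
import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(64, 256, kernel_size=3, padding=1),
    nn.BatchNorm2d(256),   # fused into the preceding Conv2d during quantization
    nn.PixelShuffle(2),    # converted to a Tile op by the compiler per UG1414
)

x = torch.randn(1, 64, 32, 32)
print(block(x).shape)  # torch.Size([1, 64, 64, 64]): channels /4, spatial x2
```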

I am attaching the text export with the layers of the float model; it would be great if someone could take a look: mobilepose.txt

Regarding debugging layer by layer, I imagine the only way is to trim the model and add the layers back one by one until the results no longer match, right? Is there any other way of doing that? Maybe partial quantization?

lishixlnx commented 1 year ago

If the output differs after the merge, then the difference is likely caused by the layer merge. Is it possible to change your op to a supported op?

qianglin-xlnx commented 1 year ago

Closing since there has been no activity for more than 1 month. Please reopen if you still have any questions, thanks.

liuzrcc commented 8 months ago

Hi. I have had a similar issue. The quantized model works well, but the test accuracy drops after compilation on the same test set. The platform is a ZCU104 with Vitis-AI 2.0.

Any hints on whether this issue has been solved in a later Vitis-AI version? Or do I need to check layer by layer and find a workaround for the suspicious layers?

lino-alves commented 8 months ago

Hi. I ended up removing a portion of the model to check whether it was the source of the accuracy loss, and though the results improved they were still very poor. Given that the model includes a MobileNet backbone, I am at the moment proceeding with Quantization Aware Training to verify whether I start to get reasonable results, and I will then add back the portion of the model that I removed to see whether it causes issues or not.

Nevertheless, there is still no explanation for the massive difference between the quantized model and the compiled model. I haven't checked with the newer versions of Vitis-AI yet, but I intend to use the latest version to run QAT and then compare the results of the float, quantized and compiled models.

All of this is still work in progress.

liuzrcc commented 8 months ago

Hi, Lino. Thank you for your answers. I have found a way to bypass this compilation issue.

Initially, I used int8 format images (i.e., after subtracting the mean) to train the model and deployed a DPU model that also takes int8 input; there the accuracy drop after compilation was 20%. Now I have shifted to FP32 for all steps, and the accuracy decrease is 0.5%, which is acceptable to me. I hope my case can help you a bit.
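As an illustration only of what "FP32 for all steps" means in my case, the preprocessing below is a sketch; the normalization constants are ImageNet defaults used as placeholders, not the values I actually trained with.

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess_fp32(img_uint8: np.ndarray) -> np.ndarray:
    """HWC uint8 image -> normalized FP32 tensor, used identically for
    training, quantization calibration, and deployment."""
    x = img_uint8.astype(np.float32) / 255.0
    return (x - MEAN) / STD
```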

lino-alves commented 4 weeks ago

Finally I had the time and all the pieces to find the root cause of this issue.

I now have a quantization aware trained (QAT) model of the MobilePose architecture with a MobileNetV2 backbone.
The difference in the inference results between the quantized model (running on a workstation's GPU) and the compiled model (running on an MPSoC) is still massive, which continues to be puzzling and unexplained.

I started splitting the model in two by moving layers from the DPU to the CPU one by one until the results were fine, and what I found is that PyTorch's PixelShuffle operation is the root cause.
If I run inference on the DPU only up to the layer before the first PixelShuffle and run the remaining layers on the CPU, the results are good. On the other hand, if I add the first PixelShuffle layer to the DPU inference while still running the remaining layers on the CPU, the results are very poor to the point of being unusable.
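As a rough sketch of the split I used for debugging (`cpu_head` is a placeholder for the remaining MobilePose layers kept as a torch module, and `dpu_feat` stands for the feature map dumped from the DPU right before the first PixelShuffle):

```python
import numpy as np
import torch
import torch.nn.functional as F

def finish_on_cpu(dpu_feat: np.ndarray, cpu_head: torch.nn.Module) -> np.ndarray:
    """dpu_feat: NCHW feature map taken from the DPU output just before
    the first PixelShuffle; the rest of the network runs on the CPU."""
    x = torch.from_numpy(dpu_feat).float()
    x = F.pixel_shuffle(x, upscale_factor=2)  # the op that misbehaves on the DPU
    with torch.no_grad():
        out = cpu_head(x)                     # remaining layers on the CPU
    return out.numpy()
```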

It seems that at least for this CNN model the PixelShuffle operation is not being correctly compiled or inferred on the DPU.

Given that I can't find a way to reopen this ticket, I will open another one if I don't see any activity here in the next few days, because to me this seems like a bug, and one that can explain why the quantized model is working fine and the compiled one is not.

quentonh commented 4 weeks ago

@lishixlnx Are you able to re-open this issue and potentially assist @lino-alves in debugging this further?