Xilinx / Vitis-AI

Vitis AI is Xilinx’s development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.
https://www.xilinx.com/ai
Apache License 2.0

Is it possible to deploy my own QNN? #257

Open Kelvinyu1117 opened 3 years ago

Kelvinyu1117 commented 3 years ago

I would like to train a network with my own quantizer. Is it possible to deploy such a model onto the board? The weights of the network are quantized to low precision but stored in fp32. All of the model development will be done in PyTorch.

niuxjxlnx commented 3 years ago

In general, the Vitis-AI quantizer does not support customized quantization strategies because quantization is tightly coupled with the hardware design. I suggest you check the QAT (Quantization Aware Training) section in the user guide; that may meet your requirement.
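For reference, a rough sketch of the QAT flow described in the Vitis AI PyTorch user guide might look like the following. `QatProcessor`, `trainable_model()`, and `to_deployable()` are from the documented `pytorch_nndct` API, but exact signatures vary between releases, and `build_resnet()` is a hypothetical stand-in for your own model constructor, so verify against your installed version:

```python
# Rough sketch of the Vitis AI QAT flow for PyTorch (pytorch_nndct).
# Class and method names follow the user guide but may differ across releases.
import torch
from pytorch_nndct import QatProcessor

model = build_resnet()                      # hypothetical helper for your float model
dummy_input = torch.randn(1, 3, 224, 224)   # assumed input shape

# Wrap the float model so it can be trained with fake-quantized weights/activations.
qat_processor = QatProcessor(model, (dummy_input,), bitwidth=8)
quant_model = qat_processor.trainable_model()

# ... run your usual training loop on quant_model here ...

# Convert the trained model into a deployable form for the Vitis AI compiler.
deployable_model = qat_processor.to_deployable(quant_model, output_dir='qat_result')
```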

Kelvinyu1117 commented 3 years ago

I am trying to quantize ResNet in PyTorch. How long does this typically take? I am not sure whether it is normal to wait a long time at this stage (see the attached screenshot).

niuxjxlnx commented 3 years ago

Depending on the network's throughput and number of layers, the stage shown above can take anywhere from minutes to several hours. That stage is fast finetuning. You can limit the number of data items used in fast finetuning, for example to 1000, to accelerate the process.
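As an illustration, a minimal sketch of limiting the fast-finetuning data with the `pytorch_nndct` post-training quantization API might look like this; `model`, `val_dataset`, and `evaluate` are placeholders for your own model, dataset, and evaluation function, and argument names should be checked against your Vitis AI release:

```python
# Sketch: restrict fast finetuning to ~1000 samples to shorten its runtime.
import torch
from torch.utils.data import Subset, DataLoader
from pytorch_nndct.apis import torch_quantizer

dummy_input = torch.randn(1, 3, 224, 224)          # assumed input shape
quantizer = torch_quantizer('calib', model, (dummy_input,))
quant_model = quantizer.quant_model

# Use only the first 1000 samples of the validation set for fast finetuning.
small_set = Subset(val_dataset, range(1000))
small_loader = DataLoader(small_set, batch_size=32)

# evaluate(model, loader) is your own function that runs forward passes.
quantizer.fast_finetune(evaluate, (quant_model, small_loader))

evaluate(quant_model, small_loader)     # calibration forward passes
quantizer.export_quant_config()         # save the quantization results
```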

Kelvinyu1117 commented 3 years ago

I can compile the model successfully. However, I encounter an error when I try to deploy it to an Alveo U50. I followed this example when writing my code: https://github.com/Xilinx/Vitis-In-Depth-Tutorial/tree/master/Machine_Learning/Design_Tutorials/09-mnist_pyt-master

and I actually used get_child_subgraph_dpu() from the example directly: https://github.com/Xilinx/Vitis-In-Depth-Tutorial/blob/master/Machine_Learning/Design_Tutorials/09-mnist_pyt-master/files/application/app_mt.py

[screenshot of my code] but it gives me the following error: [screenshot of the error]

It seems the error is thrown by cs.get_attr("device"). I used the command xir dump_txt <xmodel> <txt_file> to dump the model information to a text file, and I can see there is a "device" attribute on the operators.
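For context, the helper from the referenced app_mt.py looks roughly like the sketch below (reproduced from memory, so verify against the tutorial); guarding get_attr with has_attr avoids failures on subgraphs that carry no "device" attribute:

```python
# Sketch of the DPU subgraph lookup used in the Vitis AI examples.
import xir

def get_child_subgraph_dpu(graph):
    """Return the child subgraphs of the root graph that are assigned to the DPU."""
    root = graph.get_root_subgraph()
    children = root.toposort_child_subgraph()
    # Guard with has_attr so subgraphs without a "device" attribute
    # do not raise on get_attr.
    return [cs for cs in children
            if cs.has_attr("device") and cs.get_attr("device").upper() == "DPU"]

graph = xir.Graph.deserialize("my_model.xmodel")   # path is a placeholder
dpu_subgraphs = get_child_subgraph_dpu(graph)
```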

I have also tried to run the example directly instead of modifying it and writing my own code. For the example code mentioned above, I got the following error: [screenshot of the error]

I'm not sure how to debug this problem, can you give me some ideas?

[updated on 8 Feb] I can successfully run the example model now, but my code still throws the bad_cast error. I have logged the content of the subgraph and found that for the example model the .xmodel file can be correctly deserialized to a Python dict, but this does not work for my model. I am not sure why.

[screenshot of the logged subgraph contents]

[updated on 9 Feb] I can successfully quantize the network and run it on the U50 now. However, the accuracy is very low (~11%) compared to the test result evaluated after quantization (90.51%). I am not sure whether this is related to the fact that I ran the MNIST-trained example network above before running my own model, so the bitstream of the example network may not have been cleared. Can you give me some advice about this?

lishixlnx commented 3 years ago

Use one typical sample as the test data, then compare the input layer of your U50 network with the input layer of your quantized network and make sure they are the same (or almost the same); if not, improve your deployment code until they match. Then compare the output layer of your U50 network with the output layer of your quantized network and check whether they are the same or differ significantly.
If they are the same (or almost the same), check the post-processing part of your deployment code. If they are different, you should check your network.
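As an illustration of that comparison, a minimal sketch might look like the following, assuming the DPU output has been saved as a numpy array and the quantized PyTorch model can be run on the same preprocessed sample; the file names and `quant_model` are placeholders:

```python
# Minimal sketch: compare the DPU output with the quantized model's output
# on the same test sample. File names and variables are placeholders.
import numpy as np
import torch

dpu_out = np.load("dpu_output.npy")                 # output captured from the U50 run
sample = torch.from_numpy(np.load("sample.npy"))    # same preprocessed input fed to the DPU

with torch.no_grad():
    ref_out = quant_model(sample).cpu().numpy()     # quantized model from the quantizer

print("max abs diff:", np.max(np.abs(dpu_out - ref_out)))
print("cosine sim  :", np.dot(dpu_out.ravel(), ref_out.ravel())
      / (np.linalg.norm(dpu_out) * np.linalg.norm(ref_out)))
```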

Kelvinyu1117 commented 3 years ago

Thank you for your comments. I have successfully deployed the model on the U50. However, is it possible to know the model size/resource usage before and after quantization? Since the model is quantized to 8 bits, there should be some reduction in model size.

niuxjxlnx commented 3 years ago

Hi, @Kelvinyu1117 :

A model size/resource statistics summary will be supported in the next release. Model size should be reduced by 8-bit quantization, but the reduction may not be proportional.

Kelvinyu1117 commented 3 years ago

So is it not possible to get that information at the moment? For example, can I write a script to calculate it myself?

niuxjxlnx commented 3 years ago

Yes, it is possible.

Kelvinyu1117 commented 3 years ago

Can you give me some advice on writing such scripts to get the resource utilization?

niuxjxlnx commented 3 years ago

You can use modules such as thop, torchstat, or torchsummary to do this.
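For example, a quick sketch using thop and torchsummary to get the float model's parameter count and an approximate fp32 size; the 3×224×224 input shape is an assumption for a ResNet-style network and `model` is your own float model:

```python
# Sketch: count parameters / MACs of the float model with thop and torchsummary.
import torch
from thop import profile
from torchsummary import summary

dummy = torch.randn(1, 3, 224, 224)            # assumed input shape
macs, params = profile(model, inputs=(dummy,))
print(f"params: {params / 1e6:.2f} M, MACs: {macs / 1e9:.2f} G")
print(f"fp32 size ~ {params * 4 / 1e6:.2f} MB")  # 4 bytes per fp32 weight

summary(model, (3, 224, 224), device="cpu")    # per-layer parameter breakdown
```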

Kelvinyu1117 commented 3 years ago

How about the quantized model statistics? Can I use the same approach, given that the weights have been quantized to a lower bit width?

niuxjxlnx commented 3 years ago

In the quantizer, the weights are still float values, but each value is changed to (quantization step × an integer in [-128, 127)). So nothing special happens to the size of the quantized weights during quantization; you can take the float model's weight count and calculate the quantized size yourself, since the weight values correspond to integers.
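A back-of-the-envelope estimate along those lines, assuming every trainable parameter ends up stored as an 8-bit integer after deployment (a simplifying assumption; the compiled model also contains non-weight data), could look like this:

```python
# Rough estimate of model size before/after 8-bit quantization,
# assuming all trainable parameters are stored as 8-bit integers when deployed.
import torch

def estimated_sizes(model: torch.nn.Module, quant_bits: int = 8):
    n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    fp32_mb = n_params * 4 / 1e6                  # 4 bytes per fp32 weight
    quant_mb = n_params * quant_bits / 8 / 1e6    # quant_bits per weight
    return n_params, fp32_mb, quant_mb

# 'model' is a placeholder for your float model.
n, fp32_mb, int8_mb = estimated_sizes(model)
print(f"{n} params: ~{fp32_mb:.2f} MB fp32 -> ~{int8_mb:.2f} MB at 8 bits")
```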

Kelvinyu1117 commented 3 years ago

So you mean the model size = number of total trainable params × quantized weight size (8 bits / 4 bits / 2 bits)?

The quantization is done by the AI quantizer (8 bits), so I think the values of the weights are integers, right?

Also, if I want to get a summary of the quantized model, should I instantiate the model class generated by NNDCT after quantization and compilation, and apply torchsummary to that object?