Compilation of Pytorch model: several dpu subgraphs

Guriido commented 3 years ago

Hello, I'm trying to use Vitis-AI software to use a pytorch model on a DPU. I successfully generated the quantized model and deployed it to a xxx_int.xmodel file, and then tried to compile it with the vai_c_xir command as specified in the documents. The compilation finished without any error message nor warning, and the three files (md5sum.txt, meta.json compiled_model.xmodel) were generated, but the number of DPU subgraph was 20, so I can not execute it directly on my DPU.

The output of compilation is the following:

(vitis-ai-pytorch) Vitis-AI /work > vai_c_xir -x quantize_result/EfficientDet_int.xmodel  -a arch.json -o quantize_result/ -n efficientdet_u96_compiled
**************************************************
* VITIS_AI Compilation - Xilinx Inc.
**************************************************
[UNILOG][INFO] The compiler log will be dumped at "/tmp/vitis-ai-user/log/xcompiler-20210305-033056-788"
[UNILOG][INFO] Compile mode: dpu
[UNILOG][INFO] Debug mode: function
[UNILOG][INFO] Target architecture: DPUCZDX8G_ISA0_B2304_MAX_BG2
[UNILOG][INFO] Graph name: EfficientDet, with op num: 1392
[UNILOG][INFO] Begin to compile...
[UNILOG][INFO] Total device subgraph number 83, DPU subgraph number 20
[UNILOG][INFO] Compile done.
[UNILOG][INFO] The meta json is saved to "/work/quantize_result/meta.json"
[UNILOG][INFO] The compiled xmodel is saved to "/work/quantize_result//efficientdet_compiled.xmodel"
[UNILOG][INFO] The compiled xmodel's md5sum is 9cd7310f6c3f88a8dd62928ea0d7a4b6, and been saved to "/work/quantize_result/md5sum.txt"

I tried to compile for the ZCU102/104 architecture, and for the Ultra96 architecture, with the same number of DPU subgraphs as result. (The development environment is the latest vitis-ai docker image, using vitis-ai 1.3)

I have several questions concerning this:

1) The reason the compiled model has so many DPU subgraphs is that the model contained operations not supported for the DPU?

2) Is there a simple way, using the VART python API, to do inference of the whole model from the input image? (without having to engineer the input and output of each subgraph by hand) I am thinking about something in the lines of https://github.com/Xilinx/Vitis-AI/blob/master/demo/VART/resnet50_mt_py/resnet50.py

3) (supposing 1) answer is yes) Is there a way to know the reason why the layers were not supported by DPU during compilation, something like a verbose mode or the log of compilation? There is a mention [UNILOG][INFO] The compiler log will be dumped at ..., but I checked at this location and the supposed log is just an empty folder.

Thank you for your consideration

qianglin-xlnx commented 3 years ago

Hi @Guriido

Yes.
You can try to use Vitis-AI-Library to deploy the model. You can refer to https://github.com/Xilinx/Vitis-AI/tree/master/demo/Vitis-AI-Library
For edge DPU, you can refer to page23 of PG338 https://www.xilinx.com/support/documentation/ip_documentation/dpu/v3_3/pg338-dpu.pdf

Guriido commented 3 years ago

Thank you very much for your answer @qianglin-xlnx

I already looked at the demo examples before submitting this issue, but all the implementations seem to assert that the model has only one DPU subgraph. (except maybe the pose detection example, but the model itself is the combination of two independent models, one for person detection and the other for pose inference, and thus the models are only fed true images and not intermediate feature map tensors or such) All VART demo samples are by the way explicitly checking for the existence of a single subgraph (c.f. https://github.com/Xilinx/Vitis-AI/blob/ffdfd826394dd24c6f828733f5cfa99ab4153940/demo/VART/resnet50/src/main.cc#L283) The link you gave was under the assumption that the model would be tuned to fit all operators for the DPU beforehand, is it correct?
I am aware of the limitations of each layer according to the DPU intrinsic parameter (bank depth etc.), but I was looking for a way to have access to all non-conform layers through a tool. The non-conformity is checked when executing the compilation with vai_c_xir command, so I thought there would be a way to display this information during compilation, through an environment variable or a debug flag or something like that. Can you tell me if there is anything close to this, or maybe a script to check a .xmodel according to an architecture configuration?

Thanks in advance

qianglin-xlnx commented 3 years ago

Hi @Guriido 2 Not exactly. Vitis AI LIbrary also support the model with more than 1 dpu subgraph, such as sp_net model. You can find the implementation code in the link below. https://github.com/Xilinx/Vitis-AI/blob/ffdfd826394dd24c6f828733f5cfa99ab4153940/tools/Vitis-AI-Library/posedetect/src/posedetect_imp.cpp#L50 https://github.com/Xilinx/Vitis-AI/blob/master/tools/Vitis-AI-Library/platenum/src/platenum_imp.cpp#L126

However, we recommend compiling the model into only one dpu subgraph as far as possible and running the model on the DPU. In this case, you will get the best performance of the model.

3 Thank you for your advice. So far, the unsupported layers or OPs are shown only when you compiler the model, such as the following. However, there is no tool or script to check the model according to an architecture configuration. Maybe we need some kind of this tool. I'm not sure.

[UNILOG][WARNING] xir::Op{name = PMGPMG_MaxPool2d_max2812, type = pool-fix} has been assigned to CPU: ["kernel_height(14) is not in DPU supported range [1, 2]]. [UNILOG][WARNING] xir::Op{name = PMGPMG_MaxPool2d_max3823, type = pool-fix} has been assigned to CPU: ["kernel_height(14) is not in DPU supported range [1, 2]]. [UNILOG][WARNING] xir::Op{name = PMGPMG_MaxPool2d_max4834, type = pool-fix} has been assigned to CPU: ["kernel_height(14) is not in DPU supported range [1, 2]]. [UNILOG][WARNING] xir::Op{name = PMGPMG_MaxPool2d_max1801, type = pool-fix} has been assigned to CPU: ["kernel_height(28) is not in DPU supported range [1, 2]].

Guriido commented 3 years ago

@qianglin-xlnx

Thank you for the sample! It seems to require handcrafting the missing operator with CPU functions, so as you said it is best to compile the whole model in one DPU subgraph.
So far, the unsupported layers or OPs are shown only when you compiler the model, such as the following. [UNILOG][WARNING] xir::Op{name = PMGPMG_MaxPool2d_max2812, type = pool-fix} has been assigned to CPU: ["kernel_height(14) is not in DPU supported range [1, 2]].

This was exactly what I was looking for! But as you can see in the full log I shared in my first message, even if the number of dpu subgraphs after compilation is (way) superior to 1, there is no such warning message that appeared. How can I enable those messages when using vai_c_xir?

qianglin-xlnx commented 3 years ago

@Guriido 3 It's very strange. Which docker do you use and could you share quantize_result/EfficientDet_int.xmodel for our further debug? Thank you very much.

Guriido commented 3 years ago

@qianglin-xlnx I used a docker image built from source through the script of the master branch https://github.com/Xilinx/Vitis-AI/blob/master/setup/docker/docker_build_gpu.sh

I just pulled the latest cpu version (docker pull xilinx/vitis-ai-cpu:latest) and tested the compilation with this environment, getting an identical result (multiple dpu subgraphs and no warning).

I hosted my .xmodel file before compilation at this link.

Guriido commented 3 years ago

@qianglin-xlnx After checking my pytorch model, I noticed some convolutions were too big to fit in the DPU (due to channel_parallel and bank_depth constraints), and fixing these helped reduce the generated number of DPU subgraph from 20 to 4. I tried to check the other layer types as well, but could not find other discrepancies with the spec of ZU3 DPU.

I checked manually the computation graph of the compiled model, and find some subgraphs mapped to CPU (I guess when generated with xir svg <model> <.svg> command the CPU parts are in red?) that do not have equivalent in the original implementation, and seem to correspond to a dummy operation? CPU_subgraph It is composed of fix2float -> add with some scalar constant -> float2fix, but how could I get rid of it? I tried to search in the issues of this repository and the forums, but could not find answers. Thank you in advance for your help

qianglin-xlnx commented 3 years ago

Hi @Guriido I've already reported this internally. Currently, what we found is that the information about unsupported ops is not printed at compile time because these ops are not quantized. We will look into this issue further.

Guriido commented 3 years ago

@qianglin-xlnx Thank you for looking into it! I don't know if this information has any use, but during the quantization (calib) and quantization (test)/deploy, there were no errors nor warnings as well.

quyetvk commented 3 years ago

Hi all,

I have got same issue as @Guriido. I checked the output graph after compiling. There are two sub-graphs are not executed by DPU as below so my graph is divided into 4 DPU sub-graphs. Writing application for that is not easy for me.

fix2float -> resize -> float2fix -> other subgraph
               ^
               |
const -->   stack

I am look forward to the solution.

Thank you!

Guriido commented 3 years ago

@quyetvk I am curious about your graph, did you have an Interpolate or similar function at this place in your Pytorch model? Do you have checked the shapes of the input and output of the subgraph you showed? (at fix2float and float2fix operators) If the answer to the first question is yes, it may be that your resize function is not compliant with the restrictions for DPU conversion (check the docs here at the bottom of page 101)

quyetvk commented 3 years ago

@Guriido Yes, we have Interpolate. I put my subgraph here:

I think you are right. Maybe the resize function is not compliant with restrictions. Thank you for your referrence

qianglin-xlnx commented 2 years ago

Hi @quyetvk Has this issue been solved?

qianglin-xlnx commented 2 years ago

Hi @quyetvk Since we haven't received your reply for a long time, we assume you have solved this issue and I'm going to close it. If you still have any questions, please feel free to reopen it. Thank you very much.

Xilinx / Vitis-AI

Compilation of Pytorch model: several dpu subgraphs #320