Xilinx / Vitis-AI

Vitis AI is Xilinx’s development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.
https://www.xilinx.com/ai
Apache License 2.0

Vitis AI Compiler for Caffe model error #6 #197

Closed · JerrySciangula closed this issue 4 years ago

JerrySciangula commented 4 years ago

Hi everyone,

I'm a student at Scuola Superiore Sant'Anna in Pisa.

I'm trying to accelerate a network model on a ZCU102 UltraScale+ board using Vitis-AI tools v1.2. I can quantize the model successfully, but when I launch the vai_c_caffe command, it reports:

```
[VAI_C][Warning] Only 'channel' axis supported on DPU for Concat, current layer is [Concat1].
[VAI_C][Warning] Layer [deconv_out] is not compressed.
[VAI_C][Warning] layer [Slice1] (type: Slice) is not supported in DPU, deploy it in CPU instead.
[VAI_C][Warning] layer [deconv_out] is not compressed, deploy it in CPU instead.
[VAI_C][Warning] layer [softmax] (type: Softmax) is not supported in DPU, deploy it in CPU instead.
[VAI_C][Fatal] Check failed for condition [layersmap.find(layer->name()) == layersmap.end()] in [/home/xbuild/conda-bld/dnnc_1592904456005/work/dnncimpl/compile/kernel.cc:149] :Duplicate names exist in current layer graph, name is [Slice1].
```

The last line is the fatal error that blocks the compilation process.

Could anyone explain the error to me and, if possible, how to fix it?

Mookel commented 4 years ago

Hi @JerrySciangula, could you share the deploy.prototxt file so that I can reproduce the issue? Thanks very much.

JerrySciangula commented 4 years ago

Hi @Mookel.

I should mention that I'm trying to work around the previous issue, #180, with a trick. Since you reported in #180 that there was a bug in the compiler, I decided to add a slice point, thinking that editing the quantized deploy.prototxt and the corresponding quantized deploy.caffemodel to add it would resolve the problem. However, the compiler then reported the fatal error mentioned in this issue.
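Concretely, the edit I made was along these lines (a simplified sketch; the blob names and slice point below are placeholders rather than my exact values):

```
layer {
  name: "Slice1"
  type: "Slice"
  bottom: "deconv_out"   # assumed input blob, actual name may differ
  top: "slice_out0"
  top: "slice_out1"
  slice_param {
    axis: 1              # slice along the channel axis
    slice_point: 8       # placeholder split position
  }
}
```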

So I think it is better to share the modified quantized deploy.prototxt and the modified deploy.caffemodel with you, so that you can reproduce this issue directly with the VAI_C compiler.

Since the caffemodel file is too big, I'm sharing a link to my Drive with the zipped files. Here they are: link

Mookel commented 4 years ago

Hi @JerrySciangula, well noted. I will give feedback ASAP.

Mookel commented 4 years ago

Hi, @JerrySciangula

I successfully reproduced the issue and found a bug in the partition module, which I will fix. Thanks so much :-)
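For context, that fatal check simply asserts that every layer name in the compiled graph is unique; a fragment like the following (purely illustrative, and in your case produced by our partition pass rather than by your edit) would trip it:

```
layer { name: "Slice1" type: "Slice" bottom: "x" top: "x_a" top: "x_b" }
# ... later in the same graph ...
layer { name: "Slice1" type: "Slice" bottom: "y" top: "y_a" top: "y_b" }  # same name again -> [VAI_C][Fatal]
```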

But I found two other issues:

ISSUE-1: For the 'fc1' layer, the backend will output the error "[VAI_C-BACKEND][Check Failed: kernel_param * input_channel_group <= img_buf_depth]". 'kernel_param', 'input_channel_group' and 'img_buf_depth' are all parameters used internally by the backend, and they are tightly related to the DPU type you are using (see https://www.xilinx.com/support/documentation/ip_documentation/dpu/v3_2/pg338-dpu.pdf, page 20, for more details about the parameters and constraints of our DPU).

(Note: bank depth and channel_parallel are DPU-related, while kernel_h/kernel_w and input_channel are model-related.) In our case the left-hand side works out to 300 * 16 = 4800, which exceeds img_buf_depth, i.e. it is too large for the DPU.
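Spelled out with the numbers above (how each factor decomposes is my reading of PG338, so treat the exact mapping as an assumption):

```math
kernel\_param \times input\_channel\_group = 300 \times 16 = 4800 > img\_buf\_depth
```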

There are usually several workarounds to solve this issue:

ISSUE-2: For the 'Concat1' layer, the backend will fail to generate DPU instructions. The root cause is that the DPU only supports concatenation along the channel axis, while in our case the axis for 'Concat1' is 2, i.e. concatenation along the H axis. In some cases the middle end can apply optimizations that lift this constraint, but in our case the constraint still holds after the middle-end passes, which is why the backend reports an error.
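To make the constraint concrete in prototxt terms (blob names here are hypothetical):

```
# Not supported on the DPU: concatenation along H (axis 2 in N,C,H,W order)
layer {
  name: "Concat1"
  type: "Concat"
  bottom: "branch_a"
  bottom: "branch_b"
  top: "Concat1"
  concat_param { axis: 2 }
}
# Supported on the DPU: concatenation along channels, i.e.
#   concat_param { axis: 1 }
```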

Thanks very much.

JerrySciangula commented 4 years ago

Hi @Mookel, thank you for your exhaustive reply. I will try to resolve the issues following your advice. Thank you again.