Xilinx / Vitis-Tutorials

Vitis Custom Embedded Platform Creation Example on ZCU104 DPU Test 3: Run a Vitis-AI Demo not working #122

Open Ali-Flt opened 2 years ago

Ali-Flt commented 2 years ago

Hi, I've gone through this tutorial with Vitis 2020.2 and Vitis AI v1.3: https://github.com/Xilinx/Vitis-Tutorials/tree/2020.2/Vitis_Platform_Creation/Introduction/02-Edge-AI-ZCU104

With some slight differences:

  1. My Vitis platform has a MIG IP core for interfacing with the PL-DDR4 SODIMM of the ZCU104 board.
  2. I added some extra packages to my rootfs in PetaLinux (such as opencv).
  3. Instead of adding the Vitis AI Library using the method explained in Test 3, I cloned the repo with the commands below:

         git clone https://github.com/Xilinx/Vitis-AI.git
         cd Vitis-AI
         git checkout v1.3

And added the repo to Vitis like this: [screenshot]

Every other step and instruction was followed without error.

But when I run the Vitis-AI demo on the bell pepper image, I get this for the first run:

root@zcu104_custom_plnx:~# env LD_LIBRARY_PATH=samples/lib XLNX_VART_FIRMWARE=/mnt/sd-mmcblk0p1/dpu.xclbin ./dpu_trd bellpeppe-994958.JPEG
[  226.509859] [drm] Pid 1254 opened device
[  226.513817] [drm] Pid 1254 closed device
[  226.517845] [drm] Pid 1254 opened device
[  226.521786] [drm] Pid 1254 closed device
[  226.595670] [drm] Pid 1254 opened device
[  226.599622] [drm] Pid 1254 closed device
[  226.603564] [drm] Pid 1254 opened device
[  226.607501] [drm] Pid 1254 closed device
[  226.620442] [drm] Pid 1254 opened device
[  226.624524] [drm] Pid 1254 closed device
[  226.732663] [drm] Pid 1254 opened device
[  226.736601] [drm] Pid 1254 closed device
[  226.740541] [drm] Pid 1254 opened device
[  226.757298] [drm] get section DEBUG_IP_LAYOUT err: -22
[  226.757306] [drm] get section AIE_METADATA err: -22
[  226.762559] [drm] zocl_xclbin_read_axlf 1a5daa76-f818-40bb-af8a-c0bee51ee03b ret: 0
[  226.771847] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b locked, ref=1
[  226.779523] [drm] No ERT scheduler on MPSoC, using KDS
[  226.791945] [drm] scheduler config ert(0)
[  226.791947] [drm]   cus(2)
[  226.795949] [drm]   slots(16)
[  226.798645] [drm]   num_cu_masks(1)
[  226.801612] [drm]   cu_shift(16)
[  226.805096] [drm]   cu_base(0xa0000000)
[  226.808309] [drm]   polling(0)
[  226.812174] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b unlocked, ref=0
[  226.815431] [drm] Pid 1254 opened device
[  226.826763] [drm] Pid 1254 closed device
[  226.830790] [drm] Pid 1254 opened device
[  226.834971] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b locked, ref=1
[  226.834992] [drm] Pid 1254 opened device
[  226.846316] [drm] Pid 1254 closed device
[  226.850262] [drm] Pid 1254 opened device
[  226.854305] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b locked, ref=2
[  226.854342] [drm] Pid 1254 opened device
[  226.865492] [drm] Pid 1254 closed device
[  226.869431] [drm] Pid 1254 opened device
[  226.873463] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b locked, ref=3
[  226.873968] [drm] Pid 1254 opened device
[  226.885109] [drm] Pid 1254 closed device
[  226.889045] [drm] Pid 1254 opened device
score[5]    =  0.00532136   text: electric ray, crampfish, numbfish, torpedo,
score[18]   =  0.00532136   text: magpie,
score[21[  226.893021] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b locked, ref=4
]   =  0.00532136   text: kite,
score[27]   =  0.00532136   tex[  228.466819] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b unlocked, ref=3
t: eft,
score[2]    =  0.00532136   text: great white shark, wh[  228.479460] [drm] Pid 1254 closed device
[  228.502080] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b unlocked, ref=2

[  228.502091] [drm] Pid 1254 closed device
[  228.705448] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b unlocked, ref=0
[  228.705454] [drm] Pid 1254 closed device
[  228.716795] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b unlocked, ref=0
[  228.716797] [drm] Pid 1254 closed device
[  228.728206] [drm] Pid 1254 closed device

Notice that there are some errors whose cause I don't understand:

[  226.757298] [drm] get section DEBUG_IP_LAYOUT err: -22
[  226.757306] [drm] get section AIE_METADATA err: -22
[  226.762559] [drm] zocl_xclbin_read_axlf 1a5daa76-f818-40bb-af8a-c0bee51ee03b ret: 0
[  226.771847] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b locked, ref=1
[  226.779523] [drm] No ERT scheduler on MPSoC, using KDS
[  226.791945] [drm] scheduler config ert(0)
[  226.791947] [drm]   cus(2)
[  226.795949] [drm]   slots(16)
[  226.798645] [drm]   num_cu_masks(1)
[  226.801612] [drm]   cu_shift(16)
[  226.805096] [drm]   cu_base(0xa0000000)
[  226.808309] [drm]   polling(0)

And I get these results for the runs after the first one:

root@zcu104_custom_plnx:~# env LD_LIBRARY_PATH=samples/lib XLNX_VART_FIRMWARE=/mnt/sd-mmcblk0p1/dpu.xclbin ./dpu_trd bellpeppe-994958.JPEG
[ 1748.991559] [drm] Pid 1611 opened device
[ 1748.995524] [drm] Pid 1611 closed device
[ 1748.999547] [drm] Pid 1611 opened device
[ 1749.003469] [drm] Pid 1611 closed device
[ 1749.014716] [drm] Pid 1611 opened device
[ 1749.018667] [drm] Pid 1611 closed device
[ 1749.022609] [drm] Pid 1611 opened device
[ 1749.026530] [drm] Pid 1611 closed device
[ 1749.030646] [drm] Pid 1611 opened device
[ 1749.034660] [drm] Pid 1611 closed device
[ 1749.038987] [drm] Pid 1611 opened device
[ 1749.042912] [drm] Pid 1611 closed device
[ 1749.046862] [drm] Pid 1611 opened device
[ 1749.053860] [drm] zocl_xclbin_read_axlf The XCLBIN already loaded
[ 1749.053870] [drm] zocl_xclbin_read_axlf 1a5daa76-f818-40bb-af8a-c0bee51ee03b ret: 0
[ 1749.064277] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b locked, ref=1
[ 1749.071954] [drm] Reconfiguration not supported
[ 1749.083714] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b unlocked, ref=0
[ 1749.083851] [drm] Pid 1611 opened device
[ 1749.095173] [drm] Pid 1611 closed device
[ 1749.099195] [drm] Pid 1611 opened device
[ 1749.103379] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b locked, ref=1
[ 1749.103400] [drm] Pid 1611 opened device
[ 1749.114536] [drm] Pid 1611 closed device
[ 1749.118473] [drm] Pid 1611 opened device
[ 1749.122508] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b locked, ref=2
[ 1749.122541] [drm] Pid 1611 opened device
[ 1749.133675] [drm] Pid 1611 closed device
[ 1749.137612] [drm] Pid 1611 opened device
[ 1749.141596] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b locked, ref=3
[ 1749.142090] [drm] Pid 1611 opened device
[ 1749.153243] [drm] Pid 1611 closed device
[ 1749.157181] [drm] Pid 1611 opened device
score[5]    =  0.00522272   text: electric ray, crampfish, numbfish, torpedo,
score[4]    =  0.00522272   text: hammerhead, hamme[ 1749.161152] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b locked, ref=4
rhead shark,
score[18]   =  0.00522272   text: magpie,
score[2[ 1750.660487] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b unlocked, ref=3
0]   =  0.00522272   text: water ouzel, dipper,
score[2]    =  [ 1750.673183] [drm] Pid 1611 closed device
0.00522272   text: great white shark, white shark, man-eater, ma[ 1750.695665] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b unlocked, ref=2
n-eating shark, Carcharodon carcharias,
[ 1750.695675] [drm] Pid 1611 closed device
[ 1750.877305] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b unlocked, ref=0
[ 1750.877311] [drm] Pid 1611 closed device
[ 1750.888650] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b unlocked, ref=0
[ 1750.888653] [drm] Pid 1611 closed device
[ 1750.900065] [drm] Pid 1611 closed device

As you can see, the app runs without errors, but the predictions are not correct at all. Any ideas why this could happen or how I can debug it?

Thanks
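One way to make console captures like the ones above easier to read is to strip the interleaved kernel messages and check whether all reported scores are identical. The following is only a minimal sketch: the regular expressions are assumptions based on the log lines shown in this thread (only `[drm]`-tagged kernel messages, and `score[N] = value` lines from the demo), and the helper names are hypothetical. Identical scores across all reported classes suggest a flat classifier output, but the check does not say where in the flow the problem originates.

```python
import re

# Kernel messages as they appear in this thread: "[  226.893021] [drm] ...".
# Assumption: only [drm]-tagged messages are interleaved with the demo output.
KERNEL_MSG = re.compile(r"\[\s*\d+\.\d+\] \[drm\][^\n]*\n?")
# Score lines printed by the demo: "score[18]   =  0.00532136   text: magpie,"
SCORE = re.compile(r"score\[(\d+)\]\s*=\s*([0-9.]+)")


def strip_kernel_messages(console_text):
    """Remove interleaved kernel messages so that split score lines rejoin."""
    return KERNEL_MSG.sub("", console_text)


def parse_scores(console_text):
    """Return (class_index, score) pairs found after removing kernel messages."""
    clean = strip_kernel_messages(console_text)
    return [(int(idx), float(val)) for idx, val in SCORE.findall(clean)]


def looks_degenerate(scores):
    """True when more than one score is reported and all of them are equal."""
    return len(scores) > 1 and len({val for _, val in scores}) == 1


# Three lines copied from the first run above, including one split score line.
sample = (
    "score[18]   =  0.00532136   text: magpie,\n"
    "score[21[  226.893021] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b locked, ref=4\n"
    "]   =  0.00532136   text: kite,\n"
)
scores = parse_scores(sample)
print(scores, looks_degenerate(scores))
```

Running the same helpers on a capture from a working platform gives a quick side-by-side comparison of the two outputs.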

imrickysu commented 2 years ago

The AIE_METADATA err: -22 can be safely ignored. It's an XRT code issue. The wrong result can have many causes. I'm checking with the Vitis-AI experts.

Ali-Flt commented 2 years ago

I went through all 4 steps of the tutorial again without changing anything. The only change was that I used Vitis AI branch 1.3.2 instead of v1.3, and the results are still incorrect.

This is getting really frustrating because I'm doing exactly what the tutorial tells me to do. I don't even get any errors, but the model is not giving proper outputs.

Ali-Flt commented 2 years ago

I also installed Vitis 2021.2 and tried the whole flow with Vitis AI 1.4 (master). But the results were the same.

I forgot to mention that instead of loading the SD card using the sd_card.img file, I extracted the zcu104_custom_plnx/images/linux/rootfs.tar.gz file into the second partition and copied all the files in the dpu_trd_system/Hardware/package/sd_card/ folder into the first SD card partition (the boot partition). I assume this shouldn't change the application result, right?

I suspect that, because I downloaded the Vitis AI Git repo separately and added its path to Vitis instead of letting Vitis download it itself, the DPU TRD application template files are not loaded properly. I've done everything else exactly as in the tutorial. Could this be the cause of the issue? Here is the list of warnings I get after the Vitis application project has been built successfully: [screenshot]

Also, here is the error I get when I try to download the Vitis AI library in the Vitis IDE: [screenshot]

I've downloaded other Git repos as libraries in the Vitis IDE without errors, but there seems to be a problem with the Vitis AI Git repo. Please tell me how to fix this error so that I can see whether the issue is caused by the manual Git download.

OS: Ubuntu 18.04 LTS
Vitis 2021.2
Vitis AI 1.4 (master)

Ali-Flt commented 2 years ago

I also found another problem. When I create the DPU Kernel application project using the official Xilinx Vitis platform for ZCU104, the system builds without errors and the resnet model works too. The application project differs depending on whether I create it on top of the platform provided by Xilinx or on top of my own custom platform. For example, in the section below, the hw_link configurations are loaded automatically in the first case but not in the second. [screenshot]

Why are these configurations not loaded in the application project on my custom platform? Can you tell me where the script that generates the application project files is located, and what could cause some files not to be loaded?

imrickysu commented 2 years ago

The v++ configuration settings are set by https://github.com/Xilinx/Vitis-AI/blob/v1.3/dsa/DPU-TRD/prj/Vitis/config_file/prj_config_gui and this file is associated with the application project by https://github.com/Xilinx/Vitis-AI/blob/v1.3/dsa/DPU-TRD/description.json

In description.json, "ldclflags" : "--config PROJECT/src/prj/Vitis/config_file/prj_config_gui" is set under platform_properties -> zcu104_base, so it only takes effect for platforms with that name. To work around this issue, you can do any of the following:

If you update the description.json, it can be something like this:

"containers": [
        {
            "accelerators": [
                {
                    "kernel_type": "user",
                    "name": "DPUCZDX8G",
                    "num_compute_units" : "2",
                    "build_command" : "$(VIVADO) -mode batch -source PROJECT/src/prj/Vitis/scripts_gui/gen_dpu_xo.tcl -tclargs $(PROJECT) $@ $(KERNEL_NAME) $(TARGET) $(DEVICE) $(XSA)",
                    "clean_command" : "rm -rf *.log *.jou *.xo packaged_* tmp_kernel_*",
                    "dependencies" : [
                        "src/prj/Vitis/kernel_xml/dpu/kernel.xml",
                        "src/prj/Vitis/scripts_gui/package_dpu_kernel.tcl",
                        "src/prj/Vitis/scripts_gui/gen_dpu_xo.tcl",
                        "src/prj/Vitis/dpu_conf.vh",
                        "src/dpu_ip/Vitis/dpu/hdl/DPUCZDX8G.v",
                        "src/dpu_ip/Vitis/dpu/inc/arch_def.vh",
                        "src/dpu_ip/Vitis/dpu/xdc/timing_clocks.xdc",
                        "src/dpu_ip/DPUCZDX8G_v3_3_0/ttcl/fingerprint_json.ttcl",
                        "src/dpu_ip/DPUCZDX8G_v3_3_0/hdl/DPUCZDX8G_v3_3_0_vl_dpu.sv",
                        "src/dpu_ip/DPUCZDX8G_v3_3_0/inc/function.vh",
                        "src/dpu_ip/DPUCZDX8G_v3_3_0/inc/arch_para.vh"
                    ]
                },
                {
                    "kernel_type": "user",
                    "name": "sfm_xrt_top",
                    "build_command" : "$(VIVADO) -mode batch -source PROJECT/src/prj/Vitis/scripts_gui/gen_sfm_xo.tcl -tclargs $(PROJECT) $@ $(KERNEL_NAME) $(TARGET) $(DEVICE) $(XSA)",
                    "dependencies" : [
                        "src/prj/Vitis/kernel_xml/sfm/kernel.xml",
                        "src/prj/Vitis/scripts_gui/package_sfm_kernel.tcl",
                        "src/prj/Vitis/scripts_gui/gen_sfm_xo.tcl",
                        "src/dpu_ip/Vitis/sfm/hdl/sfm_xrt_top.v",
                        "src/dpu_ip/DPUCZDX8G_v3_3_0/hdl/DPUCZDX8G_v3_3_0_vl_sfm.sv",
                        "src/dpu_ip/DPUCZDX8G_v3_3_0/xci/sfm/fp_acc/fp_acc.xci",
                        "src/dpu_ip/DPUCZDX8G_v3_3_0/xci/sfm/fp_add/fp_add.xci",
                        "src/dpu_ip/DPUCZDX8G_v3_3_0/xci/sfm/fp_convert/fp_convert.xci",
                        "src/dpu_ip/DPUCZDX8G_v3_3_0/xci/sfm/fp_div/fp_div.xci",
                        "src/dpu_ip/DPUCZDX8G_v3_3_0/xci/sfm/fp_exp/fp_exp.xci"
                    ]
                }
            ], 
            "name": "dpu",
            "ldclflags" : "--config PROJECT/src/prj/Vitis/config_file/prj_config_gui"
        }
    ], 

I have reported this issue before but the fix hasn't been applied yet. Sorry for this gap in the tutorial.
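To see whether a given description.json ties the "ldclflags" setting to a platform name, one option is to list every place the key occurs. This is a rough sketch rather than an official tool: the helper names are hypothetical, and the two small JSON fragments are illustrative stand-ins whose nesting is assumed from the description in this comment, not copied from the real file.

```python
import json


def find_key_paths(node, key, path=""):
    """Yield (path, value) for every occurrence of `key` in nested JSON data."""
    if isinstance(node, dict):
        for name, value in node.items():
            child = f"{path}/{name}"
            if name == key:
                yield child, value
            yield from find_key_paths(value, key, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_key_paths(value, key, f"{path}[{index}]")


def is_platform_specific(path):
    """True when the setting sits under platform_properties (assumed layout)."""
    return path.startswith("/platform_properties/")


FLAGS = "--config PROJECT/src/prj/Vitis/config_file/prj_config_gui"

# Illustrative fragments only; the real description.json has many more keys.
platform_scoped = json.loads(
    '{"platform_properties": {"zcu104_base": {"ldclflags": "%s"}}}' % FLAGS
)
container_scoped = json.loads(
    '{"containers": [{"name": "dpu", "ldclflags": "%s"}]}' % FLAGS
)

for label, data in (("platform", platform_scoped), ("container", container_scoped)):
    for path, value in find_key_paths(data, "ldclflags"):
        print(label, path, is_platform_specific(path))
```

To inspect a real file, load it with json.load and pass the result to find_key_paths in the same way.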

Ali-Flt commented 2 years ago

Hi @imrickysu,

Thanks for the answer. I really appreciate your quick replies to my comments. Yes, today after I posted the last comment, I searched the files and found the setting you mentioned in the .json file. I was really shocked that the project's behavior depends on the platform's name. Please at least mention this in the tutorial.

But even with zcu104_base in the platform's name, the resnet from the application project on my custom platform is not working.

root@zcu104_custom_plnx:~# env LD_LIBRARY_PATH=samples/lib XLNX_VART_FIRMWARE=/media/sd-mmcblk0p1/dpu.xclbin ./dpu bellpeppe-994958.JPEG
score[37]   =  0.0396331    text: box turtle, box tortoise,
score[117]  =  0.0396331    text: chambered nautilus, pearly nautilus, nautilus,
score[121]  =  0.0396331    text: king crab, Alaska crab, Alaskan king crab, Alaska king crab, Paralithodes camtschatica,
score[149]  =  0.0396331    text: dugong, Dugong dugon,
score[85]   =  0.0396331    text: quail,

I'm really frustrated at this point, after running this tutorial several times under many different conditions, so please tell me if you have any idea why the model may not work, or a way to debug the DPU cores' behavior. Should I change anything in the PetaLinux project or in Vitis? The worst part is that the Vitis project builds without any errors, giving no clues about where the issue may be.

Obviously this tutorial has not been tested on the current version of Vitis and Vitis AI, so please test it, find the issues and update the tutorial.

imrickysu commented 2 years ago

Hi @Ali-Flt, I reran the VAI test for 2020.2. It worked well on my side.

Could you try to create the Vitis-AI application with the platform generated by the Makefile? You can run make all to generate the platform.

For the configuration setting issue we discussed above, Step 5 of the tutorial (Update system_hw_link for proper kernel instantiation) addresses this issue and provides a way to work around the description.json setting that is specific to the platform name.

Ali-Flt commented 2 years ago

Hi @imrickysu, thanks for going through the tutorial to verify it. I used the Makefile as you explained, on Vitis 2021.2, to run all the steps, and the resnet app is working successfully now: [screenshot]

After that, I went through the make scripts for each step and looked for differences from the tutorial. Please review the differences I found and update the tutorial, because one of them is probably the reason the platform was not working.

Step 1:

I couldn't find any other differences but I may have missed something. I also didn't check the PS's configurations for any mismatch.

Step 2:

line 42: echo 'CONFIG_YOCTO_MACHINE_NAME="zcu104-zynqmp"' >> $(PETALINUX_CONFIG)
line 44: echo "CONFIG_YOCTO_BUILDTOOLS_EXTENDED=y" >> $(PETALINUX_CONFIG)
line 76: cd $(PETALINUX_DIR) && petalinux-package --boot --u-boot
line 80: cd $(PETALINUX_DIR) && petalinux-package --sysroot

(Note that the rootfs configs were different too, but I don't believe the problem is in the rootfs: I ran the test with my own generated rootfs without issues, so I didn't list the rootfs differences.)
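The two configuration lines that the Makefile appends (lines 42 and 44 above) can be checked against a GUI-built PetaLinux project with a short script. This is a minimal sketch: the function name is hypothetical, and the file to read is an assumption, since the value of $(PETALINUX_CONFIG) is not shown in this thread (PetaLinux projects normally keep their configuration in project-spec/configs/config).

```python
# Settings the Makefile appends to the PetaLinux config (Step 2, lines 42 and 44).
REQUIRED_SETTINGS = [
    'CONFIG_YOCTO_MACHINE_NAME="zcu104-zynqmp"',
    "CONFIG_YOCTO_BUILDTOOLS_EXTENDED=y",
]


def missing_settings(config_text, required=REQUIRED_SETTINGS):
    """Return the required settings that do not appear as a line in config_text."""
    present = {line.strip() for line in config_text.splitlines()}
    return [setting for setting in required if setting not in present]


# Example with an in-memory config that contains only the machine name.
example_config = 'CONFIG_YOCTO_MACHINE_NAME="zcu104-zynqmp"\n'
print(missing_settings(example_config))
```

For a real project, read the config file into a string first and pass it to missing_settings; an empty list means both settings are present.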

Step 3:

The platform is generated by this script, so I don't know exactly how it differs from the GUI flow, but I think the main difference is that the domain name in the script is set to "xrt", whereas in the GUI flow it is "linux on psu_cortexa53".

I did the last step (running the Vitis AI demo) in the GUI exactly as before, so either the error lies in the differences I listed above, or the Vivado/Vitis GUI flow does not behave as expected and is not equivalent to the Vivado/xsct script flow.

Thanks again for solving the issue with your suggestion. I hope this information helps in finding and fixing the issue in the tutorial.