Xilinx / Vitis-Tutorials

Vitis In-Depth Tutorials
https://Xilinx.github.io/Vitis-Tutorials/
MIT License

Issues building kernels - results make no sense #71

Closed vmayoral closed 3 years ago

vmayoral commented 3 years ago

I was running some tests with more complex kernels and XRT, but things started failing. After debugging for a few days I realized that I couldn't even reproduce the simplest examples (e.g. vadd), so I came back to https://github.com/Xilinx/Vitis-Tutorials/blob/master/Getting_Started/Vitis/ (that's how I bumped into #69 and #70). As of now, I'm unable to reproduce the simple vadd Getting Started Vitis tutorial. The kernel's result makes no sense to me, so I thought I'd share it with those of you who are much more experienced:

./app.exe vadd.xclbin
[  183.410540] [drm] Pid 1191 opened device
[  183.414484] [drm] Pid 1191 closed device
INFO: Found Xilinx Platform
[  183.443262] [drm] Pid 1191 opened device
[  183.449404] [drm] Pid 1191 closed device
[  183.453381] [drm] Pid 1191 opened device
INFO: Loading 'vadd.xclbin'
[  184.938156] [drm] get section AIE_METADATA err: -22
[  184.938189] [drm] zocl_xclbin_read_axlf 1254ea14-6c9a-0ebc-fddd-89abec916d44 ret: 0
[  184.945489] [drm] bitstream 1254ea14-6c9a-0ebc-fddd-89abec916d44 locked, ref=1
[  184.953169] [drm] No ERT scheduler on MPSoC, using KDS
[  184.965519] [drm] 8 non-zero interrupt-id CUs out of 9 CUs
[  184.965568] [drm] scheduler config ert(0)
[  184.971044] [drm]   cus(1)
[  184.975050] [drm]   slots(16)
[  184.977745] [drm]   num_cu_masks(1)
[  184.980704] [drm]   cu_shift(16)
[  184.984183] [drm]   cu_base(0x80000000)
[  184.987404] [drm]   polling(0)
[  184.991260] [drm] bitstream 1254ea14-6c9a-0ebc-fddd-89abec916d44 unlocked, ref=0
Error: Result mismatch
i = 16 CPU result = 6296 Device result = 0
TEST FAILED
[  184.995599] [drm] bitstream 1254ea14-6c9a-0ebc-fddd-89abec916d44 locked, ref=1
[  185.025175] [drm] bitstream 1254ea14-6c9a-0ebc-fddd-89abec916d44 unlocked, ref=0
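
The application output above is interleaved with kernel `[drm]` messages, which makes the actual failure easy to miss. The following is a minimal sketch for filtering them out, assuming the console session was saved to a text file. The helper name `strip_drm` and the file name `console.log` are hypothetical and are not part of the tutorial or of XRT.

```bash
# Hypothetical helper: drop kernel "[ <timestamp>] [drm] ..." lines from a
# captured console log read on stdin, keeping only the application's output.
strip_drm() {
    grep -v -E '^\[ *[0-9]+\.[0-9]+\] \[drm\]'
}

# Usage (console.log is a hypothetical capture file):
#   strip_drm < console.log
```

Applied to the log above, this would leave only the command line, the `INFO:` lines, the `Error: Result mismatch` report, and `TEST FAILED`.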

I reproduced my setup a few times (which took me a few hours) to ensure I wasn't just distracted. Has anyone bumped into something similar? In my latest tests, I tried modifying the kernel's simple source code to return a fixed constant or other sums, but I'm still getting the same result, 0.

My setup:

setup XRT

```bash
cd /home/erle/Desktop/Xilinx; git clone https://github.com/Xilinx/XRT
cd XRT; sudo src/runtime_src/tools/scripts/xrtdeps.sh  # install dependencies
source /tools/Xilinx/Vitis/2020.2/settings64.sh  # necessary for ERT
export PATH="/usr/bin":$PATH  # FIXME: adjust path for CMake 3.5+
cd build; ./build.sh
cd Release; sudo apt-get install ./xrt_*-amd64-xrt.deb
```

Fetch Vitis-Tutorials

```bash
cd ~; git clone https://github.com/Xilinx/Vitis-Tutorials
```

set environment

This requires first fetching the rootfs and sysroots from [here](https://www.xilinx.com/support/download/index.html/content/xilinx/en/downloadNav/embedded-platforms.html) and placing them in a coherent folder structure. Then:

```bash
cd ~/Vitis-Tutorials/Getting_Started/Vitis/example/zcu102/hw/
source /tools/Xilinx/Vitis/2020.2/settings64.sh
source /opt/xilinx/xrt/setup.sh
unset LD_LIBRARY_PATH
export PLATFORM_REPO_PATHS="/home/erle/Desktop/Xilinx/xilinx_zcu102_base_202020_1"
export ROOTFS=/home/erle/Desktop/Xilinx/rootfs
source /home/erle/Desktop/Xilinx/rootfs/ir/environment-setup-aarch64-xilinx-linux
```

build the example

```bash
${CXX} -Wall -g -std=c++11 ../../src/host.cpp -o app.exe -I/usr/include/xrt -lOpenCL -lpthread -lrt -lstdc++
v++ -c -t hw --config ../../src/zcu102.cfg -k vadd -I../../src ../../src/vadd.cpp -o vadd.xo
v++ -l -t hw --config ../../src/zcu102.cfg ./vadd.xo -o vadd.xclbin
v++ -p -t hw --config ../../src/zcu102.cfg ./vadd.xclbin --package.out_dir package --package.rootfs ${ROOTFS}/rootfs.ext4 --package.sd_file ${ROOTFS}/Image --package.sd_file xrt.ini --package.sd_file app.exe --package.sd_file vadd.xclbin --package.sd_file run_app.sh
```

vmayoral commented 3 years ago

I reproduced the whole setup once again in a new computer installing everything (including Vitis, XRT, ZCU102 artifacts, etc.) from the beginning.

I got the same result, kernel returning 0.

```bash
root@zynqmp-common-2020_2:/media/sd-mmcblk0p1# ./app.exe vadd.xclbin
[ 132.876781] [drm] Pid 1186 opened device
[ 132.880719] [drm] Pid 1186 closed device
INFO: Found Xilinx Platform
[ 132.909434] [drm] Pid 1186 opened device
[ 132.915587] [drm] Pid 1186 closed device
[ 132.919568] [drm] Pid 1186 opened device
INFO: Loading 'vadd.xclbin'
[ 134.374461] [drm] get section AIE_METADATA err: -22
[ 134.374491] [drm] zocl_xclbin_read_axlf 9ed99ac9-4029-2d83-f506-c51ae46b02fc ret: 0
[ 134.381786] [drm] bitstream 9ed99ac9-4029-2d83-f506-c51ae46b02fc locked, ref=1
[ 134.389540] [drm] No ERT scheduler on MPSoC, using KDS
[ 134.401890] [drm] 8 non-zero interrupt-id CUs out of 9 CUs
[ 134.401940] [drm] scheduler config ert(0)
[ 134.407424] [drm] cus(1)
[ 134.411421] [drm] slots(16)
[ 134.414115] [drm] num_cu_masks(1)
[ 134.417071] [drm] cu_shift(16)
[ 134.420552] [drm] cu_base(0x80000000)
[ 134.423771] [drm] polling(0)
[ 134.427626] [drm] bitstream 9ed99ac9-4029-2d83-f506-c51ae46b02fc unlocked, ref=0
Error: Result mismatch
i = 0 CPU result = 2349 Device result = 0
TEST FAILED
[ 134.431935] [drm] bitstream 9ed99ac9-4029-2d83-f506-c51ae46b02fc locked, ref=1
[ 134.461650] [drm] bitstream 9ed99ac9-4029-2d83-f506-c51ae46b02fc unlocked, ref=0
[ 134.487089] [drm] Pid 1186 closed device
root@zynqmp-common-2020_2:/media/sd-mmcblk0p1#
root@zynqmp-common-2020_2:/media/sd-mmcblk0p1# export
export EDITOR="vi"
export HOME="/home/root"
export HUSHLOGIN="FALSE"
export LOGNAME="root"
export MAIL="/var/spool/mail/root"
export OLDPWD="/home/root"
export OPIEDIR
export PATH="/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin"
export PS1="\\u@\\h:\\w\\\$ "
export PWD="/media/sd-mmcblk0p1"
export QPEDIR
export QTDIR
export SHELL="/bin/sh"
export SHLVL="1"
export TERM="vt102"
export USER="root"
export XILINX_XRT="/usr"
```

randyh62 commented 3 years ago

It is not clear what OS you are using so that may be the reason you are taking this approach, but I would suggest you could eliminate the custom XRT build you are using and just use the standard embedded platform to see if that helps? Everything else looks pretty standard.

vmayoral commented 3 years ago

Hello @randyh62, thanks for jumping in :)

> It is not clear what OS

Ubuntu 20.04 in my workstation.

> you could eliminate the custom XRT build you are using and just use the standard embedded platform to see if that helps?

What do you mean by this? The "setup XRT" step described above? Well, that's to meet this part of the instructions for building the kernels:

source <XRT_install_path>/setup.sh

My guess is that it's required for cross-compilation, right (so that headers are found, etc.)? If it's a runtime, my understanding is that it's only meant to run on the embedded target, but I assumed the headers still had to be found. Do you mean skipping this step?

randyh62 commented 3 years ago

Yes, I mean don't do the xrt setup the way you are doing it. The XRT Installation instructions are located at this link: https://www.xilinx.com/html_docs/xilinx2020_2/vitis_doc/acceleration_installation.html#pjr1542153622642

It includes the following Note:

> Installing XRT is not required when targeting Arm®-based embedded platforms: Vitis compiler has its own copy of xclbinutil for hardware generation, and for software compilation, you can use the XRT from the sysroot. Look for Common images for Embedded Vitis platforms on the downloads page.

If you go to the XRT download page: https://www.xilinx.com/products/design-tools/vitis/xrt.html#gettingstarted It includes a link to download XRT for embedded systems: https://www.xilinx.com/support/download/index.html/content/xilinx/en/downloadNav/embedded-platforms/2020-2.html This is just the Embedded Platforms downloads page.

I am not saying it is the source of your trouble, but it could be a good place to start.
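
Since the note says software compilation can use the XRT from the sysroot, a quick sanity check before building is to confirm that the sysroot actually contains the XRT headers. The following is a rough sketch under two assumptions that are not stated in this thread: that the sourced `environment-setup-aarch64-xilinx-linux` script exports `SDKTARGETSYSROOT`, and that the XRT headers live under `usr/include/xrt` inside that sysroot. The function name `has_sysroot_xrt` is hypothetical.

```bash
# Hypothetical check: succeed only if the given sysroot (or, by default,
# $SDKTARGETSYSROOT) contains an XRT header directory.
# Assumption: headers are located at <sysroot>/usr/include/xrt.
has_sysroot_xrt() {
    sysroot="${1:-${SDKTARGETSYSROOT:-}}"
    [ -n "$sysroot" ] && [ -d "$sysroot/usr/include/xrt" ]
}

# Usage, after sourcing the environment-setup script:
#   has_sysroot_xrt && echo "XRT headers found in sysroot"
```

If the check fails, the header location in the downloaded common image may differ from the assumed path, so it is worth inspecting the sysroot directly.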

vmayoral commented 3 years ago

@randyh62 appreciate the recommendation, pointer and clarifications, really!

> Installing XRT is not required when targeting Arm®-based embedded platforms: Vitis compiler has its own copy of xclbinutil for hardware generation, and for software compilation, you can use the XRT from the sysroot. Look for Common images for Embedded Vitis platforms on the downloads page.

I believe that many will run into the same confusion (and do as I did), so I would really encourage adding a small warning section for embedded targets to the tutorials highlighting this.

I went ahead and did as recommended. Same result, unfortunately. It wasn't indeed the source of my trouble, but still worth trying 👍.

I then went ahead and, while trying to isolate the problem, decided to switch boards (better to rule out my hardware early). I grabbed the ZCU104, built PetaLinux, etc., and gave it a try:

```bash
root@xilinx-zcu104-2020_2:/media/sd-mmcblk0p1# ./app.exe vadd.xclbin
[ 88.967909] [drm] Pid 1197 opened device
[ 88.971849] [drm] Pid 1197 closed device
INFO: Found Xilinx Platform
[ 89.002901] [drm] Pid 1197 opened device
[ 89.009039] [drm] Pid 1197 closed device
[ 89.013083] [drm] Pid 1197 opened device
INFO: Loading 'vadd.xclbin'
[ 90.107672] [drm] get section AIE_METADATA err: -22
[ 90.107699] [drm] zocl_xclbin_read_axlf d6e68a90-4609-4e97-a074-c7a12ea6638a ret: 0
[ 90.115130] [drm] bitstream d6e68a90-4609-4e97-a074-c7a12ea6638a locked, ref=1
[ 90.122823] [drm] No ERT scheduler on MPSoC, using KDS
[ 90.135178] [drm] 8 non-zero interrupt-id CUs out of 9 CUs
[ 90.135231] [drm] scheduler config ert(0)
[ 90.140709] [drm] cus(1)
[ 90.144708] [drm] slots(16)
[ 90.147407] [drm] num_cu_masks(1)
[ 90.150361] [drm] cu_shift(16)
[ 90.153839] [drm] cu_base(0x80000000)
[ 90.157058] [drm] polling(0)
[ 90.160938] [drm] bitstream d6e68a90-4609-4e97-a074-c7a12ea6638a unlocked, ref=0
Error: Result mismatch
i = 32 CPU result = 6233 Device result = 0
TEST FAILED
[ 90.165205] [drm] bitstream d6e68a90-4609-4e97-a074-c7a12ea6638a locked, ref=1
[ 90.198423] [drm] bitstream d6e68a90-4609-4e97-a074-c7a12ea6638a unlocked, ref=0
[ 90.234256] [drm] Pid 1197 closed device
root@xilinx-zcu104-2020_2:/media/sd-mmcblk0p1# ./app.exe vadd.xclbin
[ 94.953431] [drm] Pid 1205 opened device
[ 94.957387] [drm] Pid 1205 closed device
INFO: Found Xilinx Platform
[ 94.963889] [drm] Pid 1205 opened device
[ 94.970063] [drm] Pid 1205 closed device
[ 94.974024] [drm] Pid 1205 opened device
INFO: Loading 'vadd.xclbin'
[ 95.129360] [drm] zocl_xclbin_read_axlf The XCLBIN already loaded
[ 95.129376] [drm] zocl_xclbin_read_axlf d6e68a90-4609-4e97-a074-c7a12ea6638a ret: 0
[ 95.137780] [drm] bitstream d6e68a90-4609-4e97-a074-c7a12ea6638a locked, ref=1
[ 95.145477] [drm] Reconfiguration not supported
[ 95.157240] [drm] bitstream d6e68a90-4609-4e97-a074-c7a12ea6638a unlocked, ref=0
Error: Result mismatch
i = 48 CPU result = 4380 Device result = 0
TEST FAILED
[ 95.158331] [drm] bitstream d6e68a90-4609-4e97-a074-c7a12ea6638a locked, ref=1
[ 95.182159] [drm] bitstream d6e68a90-4609-4e97-a074-c7a12ea6638a unlocked, ref=0
[ 95.209999] [drm] Pid 1205 closed device
```

Same result :(.

vmayoral commented 3 years ago

@randyh62 and @rwarmstr, https://github.com/Xilinx/Vitis-Tutorials/pull/72 fixes this.

FYI, this was previously reported at https://forums.xilinx.com/t5/Vitis-Acceleration-SDAccel-SDSoC/Cannot-run-quot-Vitis-Getting-Started-Tutorial-quot-on-ZCU104/m-p/1234868 but apparently nobody followed up.

vmayoral commented 3 years ago

Let's close this ticket if you agree @randyh62.

randyh62 commented 3 years ago

Thanks for your investigation. It looks like you found a problem we should fix. Sorry about the difficulty. Let us know if you run into any other issues.

vmayoral commented 3 years ago

No problem!

There's something else where I could use your help to debug things and get the expected behavior straight. It’s a bit late, so I’ll follow up tomorrow with more in another ticket (since it’s unrelated to this).