Xilinx / Vitis-AI

Vitis AI is Xilinx’s development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.
https://www.xilinx.com/ai
Apache License 2.0

No response while running alveo samples from DPU #388

Closed: FloyedShen closed this issue 3 years ago

FloyedShen commented 3 years ago

Hello. I am trying to run some examples on an Alveo U200. I successfully programmed the FPGA and loaded the image tensor, but the program hangs and never produces any output. For example, when I run the ResNet-50 example in /vitis_ai_home/examples/DPUCADX8G/vitis_ai_alveo_samples/resnet50_mt_py, I get:

 (vitis-ai-tensorflow) Vitis-AI /vitis_ai_home/examples/DPUCADX8G/vitis_ai_alveo_samples/resnet50_mt_py > vi resnet50.py
(vitis-ai-tensorflow) Vitis-AI /vitis_ai_home/examples/DPUCADX8G/vitis_ai_alveo_samples/resnet50_mt_py > sudo vi resnet50.py
(vitis-ai-tensorflow) Vitis-AI /vitis_ai_home/examples/DPUCADX8G/vitis_ai_alveo_samples/resnet50_mt_py > python resnet50.py 1 ./model
-------------------
Speaking to Butler
Response from Butler is:
errCode: errCode: 0
errCode String: SUCCESS
myHandle: 2
valid: 1

[XDNN] loading xclbin settings from /opt/xilinx/overlaybins/xdnnv3/xdnn_v3_96x16_2pe_8b_9mb_bank03_2.xclbin.json
[XDNN] using custom DDR banks 0,3
Path ./model/weights.h5 is a file.
Loading weights/bias/quant_params to FPGA...

[XRT]    git hash                   : 7c93966ead2dec777b92bdc379893f22b5bd561e
[XDNN]   git hash                   : 108fbe330fd7c1682526ff720cd9accb6ff3f6c0
[XDNN] kernel configuration
[XDNN]   num cores                  : 2
[XDNN]   dsp array width            : 96
[XDNN]   axi data width (in 32bits) : 16
[XDNN]   img mem size               : 9 MB
[XDNN]   max instr num              : 1536
[XDNN]   max xbar entries           : 4096
[XDNN]   version                    : 3.2
[XDNN]   8-bit mode                 : 1

Then it hangs. I checked the xbutler log, and it shows that the AcquireFPGA request succeeded and its result was returned:

$ sudo systemctl status xbutler
● xbutler.service - The Xilinx Butler
   Loaded: loaded (/etc/systemd/system/xbutler.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2021-04-21 15:04:55 UTC; 23min ago
  Process: 28885 ExecStop=/etc/init.d/xbutler stop (code=exited, status=0/SUCCESS)
 Main PID: 29688 (xbutler)
    Tasks: 5 (limit: 17203)
   CGroup: /system.slice/xbutler.service
           ├─29688 /bin/bash /etc/init.d/xbutler start
           └─29736 /usr/sbin/xbutler

Apr 21 15:07:22 yns191101 xbutler[29688]: AcquireFPGA: UDF result is valid.
Apr 21 15:07:22 yns191101 xbutler[29688]: AcquireFPGA: Programming FPGA...
Apr 21 15:07:26 yns191101 xbutler[29688]: PID 7987 acquire FPGA at index 0
Apr 21 15:07:26 yns191101 xbutler[29688]: AcquireFPGA: Sending response...
Apr 21 15:07:26 yns191101 xbutler[29688]: AcquireFPGA: Response is: SUCCESS
Apr 21 15:07:26 yns191101 xbutler[29688]: AcquireFPGA: Sending handle...
Apr 21 15:07:26 yns191101 xbutler[29688]: AcquireFPGA: Handle is: 2
Apr 21 15:07:26 yns191101 xbutler[29688]: AcquireFPGA: Sending result...
Apr 21 15:07:26 yns191101 xbutler[29688]: Finishing AcquireFPGA.
Apr 21 15:07:26 yns191101 xbutler[29688]: -------------------------------------
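To check the butler side without scrolling through the whole journal, the AcquireFPGA response and handle can be pulled out of the log text (for example, the output of `journalctl -u xbutler`). This is a minimal sketch that assumes the line format shown above stays stable; `last_acquire_result` is a hypothetical helper, not part of Vitis AI or XRT. A SUCCESS response with a valid handle only shows that the FPGA was acquired and programmed, not that the DPU ever completed a job.

```python
import re
from typing import Optional, Tuple

# Patterns for the AcquireFPGA lines printed by xbutler (format assumed from the log above).
RESPONSE_RE = re.compile(r"AcquireFPGA: Response is: (\w+)")
HANDLE_RE = re.compile(r"AcquireFPGA: Handle is: (\d+)")


def last_acquire_result(log_text: str) -> Tuple[Optional[str], Optional[int]]:
    """Return (response, handle) from the most recent AcquireFPGA exchange.

    Either element is None if the corresponding line is not found.
    """
    responses = RESPONSE_RE.findall(log_text)
    handles = HANDLE_RE.findall(log_text)
    response = responses[-1] if responses else None
    handle = int(handles[-1]) if handles else None
    return response, handle


if __name__ == "__main__":
    sample = """\
Apr 21 15:07:26 yns191101 xbutler[29688]: AcquireFPGA: Response is: SUCCESS
Apr 21 15:07:26 yns191101 xbutler[29688]: AcquireFPGA: Handle is: 2
"""
    print(last_acquire_result(sample))  # ('SUCCESS', 2)
```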

However, when I check the AXI interface monitor counters with xbutil, no data appears to have been sent back to the host: only Host to Device writes are counted, and all read counters are zero.

$ sudo ./xbutil status --aim
INFO: Found total 2 card(s), 2 are usable
AXI Interface Monitor Counters
Region or CU     Type or Port      Write kBytes      Write Trans.      Read kBytes       Read Tranx.       Outstanding Cnt   Last Wr Addr      Last Wr Data      Last Rd Addr      Last Rd Data
shell            Memory to Memory  0.000             0                 0.000             0                 0                 0x0               0x0               0x0               0x0
shell            Host to Device    567773.824        1108934           0.000             0                 0                 0x4010f82e00      0x0               0x0               0x0
shell            Peer to Peer      0.000             0                 0.000             0                 0                 0x0               0x0               0x0               0x0
INFO: xbutil status succeeded.
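Reading the AIM counters by eye is error-prone once the rows wrap, so here is a minimal sketch that parses the table and totals the read traffic. It assumes the `INFO:` lines have been dropped and each row sits on one line, and it splits cells on runs of two or more spaces (a heuristic that works here because column names like "Memory to Memory" only contain single spaces). `parse_aim_table` and `total_read_kbytes` are hypothetical helpers. Note that these are only the shell-level monitors shown in this output.

```python
import re
from typing import Dict, List

# The counters from the xbutil output above, one row per line.
SAMPLE = """\
Region or CU     Type or Port      Write kBytes      Write Trans.      Read kBytes       Read Tranx.       Outstanding Cnt   Last Wr Addr      Last Wr Data      Last Rd Addr      Last Rd Data
shell            Memory to Memory  0.000             0                 0.000             0                 0                 0x0               0x0               0x0               0x0
shell            Host to Device    567773.824        1108934           0.000             0                 0                 0x4010f82e00      0x0               0x0               0x0
shell            Peer to Peer      0.000             0                 0.000             0                 0                 0x0               0x0               0x0               0x0
"""


def parse_aim_table(text: str) -> List[Dict[str, str]]:
    """Split the header and each row on runs of 2+ spaces; map cells to header names."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = re.split(r"\s{2,}", lines[0].strip())
    return [dict(zip(header, re.split(r"\s{2,}", ln.strip()))) for ln in lines[1:]]


def total_read_kbytes(rows: List[Dict[str, str]]) -> float:
    """Sum the 'Read kBytes' column across all monitor rows."""
    return sum(float(r["Read kBytes"]) for r in rows)


if __name__ == "__main__":
    rows = parse_aim_table(SAMPLE)
    for r in rows:
        print(f'{r["Type or Port"]:<18} write={r["Write kBytes"]} kB  read={r["Read kBytes"]} kB')
    print("total read kB:", total_read_kbytes(rows))  # total read kB: 0.0
```

With this data, 567773.824 kB were written host-to-device while the read total is 0.0, which matches the observation that nothing came back to the host.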

Looking at the provided sample program, I found that it blocks at dpu.wait(job_id). What should I do to fix this problem? Thank you for your help.
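Because a stuck `dpu.wait(job_id)` blocks forever without any message, it can help while debugging to bound the wait so the script fails loudly instead of hanging. This is a minimal sketch using only the standard library; `call_with_timeout` is a hypothetical helper, and `dpu` / `job_id` are assumed to be the objects from the sample script. Python cannot cancel a blocked thread, so this only reports the hang: the stuck call keeps running in a daemon thread until the process exits.

```python
import threading
import time
from typing import Any, Callable


def call_with_timeout(fn: Callable[[], Any], timeout_s: float) -> Any:
    """Run a blocking call in a daemon thread; raise TimeoutError if it
    has not returned after timeout_s seconds.

    The blocked call is NOT cancelled. The daemon flag only lets the
    interpreter exit instead of waiting on the stuck thread forever.
    """
    result: dict = {}

    def target() -> None:
        try:
            result["value"] = fn()
        except BaseException as exc:  # forwarded to the caller's thread below
            result["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        raise TimeoutError(f"call did not return within {timeout_s} s")
    if "error" in result:
        raise result["error"]
    return result.get("value")


if __name__ == "__main__":
    print(call_with_timeout(lambda: 42, 1.0))  # 42
    try:
        # time.sleep stands in for a hung dpu.wait(job_id).
        call_with_timeout(lambda: time.sleep(5), 0.2)
    except TimeoutError as exc:
        print("timed out:", exc)
```

In the sample, the call would become something like `call_with_timeout(lambda: dpu.wait(job_id), 60.0)`; the 60-second limit is an arbitrary choice.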

sumitn-xilinx commented 3 years ago

Hello. Can you also try the following examples?

https://github.com/Xilinx/Vitis-AI/tree/master/examples/DPUCADX8G/deployment_modes

FloyedShen commented 3 years ago

I ran into the same problem when running the programs in https://github.com/Xilinx/Vitis-AI/tree/master/examples/DPUCADX8G/deployment_modes. By the way, I was unable to install xbutler inside Docker, so I installed and ran the xbutler service outside of Docker instead. Could this have an impact? Here is the output from my attempt to install xbutler inside Docker:

(vitis-ai-caffe) Vitis-AI /workspace/setup/alveo/u200_u250/packages/ubuntu > sudo apt install ./xbutler_3.0-1.deb
Reading package lists... Done
Building dependency tree
Reading state information... Done
Note, selecting 'xbutler' instead of './xbutler_3.0-1.deb'
The following NEW packages will be installed:
  xbutler
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 0 B/1,051 kB of archives.
After this operation, 0 B of additional disk space will be used.
Get:1 /workspace/setup/alveo/u200_u250/packages/ubuntu/xbutler_3.0-1.deb xbutler amd64 3.0-1 [1,051 kB]
debconf: delaying package configuration, since apt-utils is not installed
(Reading database ... 101145 files and directories currently installed.)
Preparing to unpack .../ubuntu/xbutler_3.0-1.deb ...
Unpacking xbutler (3.0-1) ...
dpkg: error processing archive /workspace/setup/alveo/u200_u250/packages/ubuntu/xbutler_3.0-1.deb (--unpack):
 unable to create '/etc/xbutler/xbutler.config.dpkg-new' (while processing './etc/xbutler/xbutler.config'): No such file or directory
dpkg-deb: error: paste subprocess was killed by signal (Broken pipe)
Errors were encountered while processing:
 /workspace/setup/alveo/u200_u250/packages/ubuntu/xbutler_3.0-1.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)

FloyedShen commented 3 years ago

I ran the same test on another computer with a U250, and the sample program ran correctly there. I think the problem was caused by XRT not being installed correctly. Thank you for your help!
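Since the hang was attributed to an incorrect XRT installation, a quick pre-flight check before running the samples can rule out the most obvious setup problems early. This is a minimal sketch that assumes XRT's default prefix `/opt/xilinx/xrt` and the `XILINX_XRT` variable that XRT's `setup.sh` exports; `xrt_problems` is a hypothetical helper, and the files it looks for are assumptions about the install layout. It does not verify that XRT matches the card's shell or that the card is healthy; XRT's own `xbutil validate` is the better tool for that.

```python
import os
from typing import Callable, List, Mapping

DEFAULT_XRT_ROOT = "/opt/xilinx/xrt"  # assumed default install prefix


def xrt_problems(env: Mapping[str, str] = os.environ,
                 exists: Callable[[str], bool] = os.path.exists) -> List[str]:
    """Return human-readable problems found; an empty list means the checks passed.

    env and exists are injectable so the check can be tested without XRT installed.
    """
    problems = []
    root = env.get("XILINX_XRT")
    if not root:
        problems.append("XILINX_XRT is not set; was /opt/xilinx/xrt/setup.sh sourced?")
        root = DEFAULT_XRT_ROOT
    # Files assumed to exist in a working XRT install.
    for rel in ("setup.sh", "bin/xbutil"):
        path = os.path.join(root, rel)
        if not exists(path):
            problems.append(f"missing {path}")
    return problems


if __name__ == "__main__":
    issues = xrt_problems()
    print("\n".join(issues) if issues else "XRT environment looks OK")
```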