Xilinx / Vitis-AI

Vitis AI is Xilinx’s development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.
https://www.xilinx.com/ai
Apache License 2.0

Cannot run examples on Alveo U250 card #407

Open matejbizjak opened 3 years ago

matejbizjak commented 3 years ago

Hello, we are having problems setting up the Alveo U250 card to work with Vitis-AI.

We are using CentOS 8, and because we would like to use the latest Vitis and Vivado software alongside Vitis-AI, we followed the installation instructions on the Getting Started page. We are now using XRT version 2.8.743-1, the xilinx-u250-gen3x16-xdma-platform-3.1-1.noarch platform, and the xilinx/vitis-ai-cpu:latest Docker image.

We had problems with the xclbin files, so we followed the GitHub setup guide for the U250 and manually ran the remaining parts of the install.sh script. However, the script contains the note "Ubuntu 20.04 and CentOS/RHEL 8 are not supported by this script".

Xbutler 4.0-0 seems to work properly. This is the output of xbutler_backdoor.sh -b:

-------------------------------
Validating Arguments... passed!
-------------------------------
----------------------
Verifying CONDA_PREFIX
----------------------
----------------------
Verifying XILINX_XRT
----------------------
Pass!
Password Correct!
-----------------------
 Handle/Resource Pairs
-----------------------
-----------------------
  Service/Handle Pairs
-----------------------
-----------------------
       Port Info
-----------------------

source ${VAI_HOME}/setup/alveo/u200_u250/overlaybins/setup.sh prints the following:

------------------
Using VAI_HOME
------------------
/vitis_ai_home
---------------------
Verifying XILINX_XRM
---------------------
---------------------
Using LD_LIBRARY_PATH
---------------------
/opt/xilinx/xrt/lib:/usr/lib:/usr/lib/x86_64-linux-gnu:/usr/local/lib:/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib
--------------------
Vitis-AI Flow
---------------------
-------------------
Using LIBXDNN_PATH
-------------------
/opt/vitis_ai/conda/envs/vitis-ai-caffe/lib/libxfdnn.so
-------------------
PYTHONPATH
-------------------
---------------------
Verifying XILINX_XRT
---------------------
XILINX_XRT        : /opt/xilinx/xrt
PATH              : /opt/xilinx/xrt/bin:/opt/vitis_ai/conda/envs/vitis-ai-caffe/bin:/opt/vitis_ai/conda/bin:/opt/vitis_ai/conda/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
LD_LIBRARY_PATH   : /opt/xilinx/xrt/lib:/opt/vitis_ai/conda/envs/vitis-ai-caffe/lib:/opt/xilinx/xrt/lib:/usr/lib:/usr/lib/x86_64-linux-gnu:/usr/local/lib:/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib
PYTHONPATH        : /opt/xilinx/xrt/python:
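
The search paths printed above are long enough that repeated entries or mixed conda environments are easy to miss. Here is a minimal sketch for inspecting such a value. The `inspect_search_path` helper is hypothetical and not part of Vitis AI or XRT, and whether a duplicate or a second environment matters for this problem is not established here.

```python
import re

# Matches the conda environment name in paths such as
# /opt/vitis_ai/conda/envs/<name>/lib (layout taken from the output above).
CONDA_ENV_RE = re.compile(r"/conda/envs/([^/]+)/")


def inspect_search_path(value):
    """Return (duplicate_entries, conda_env_names) for a colon-separated path."""
    entries = [entry for entry in value.split(":") if entry]

    seen = set()
    duplicates = []
    for entry in entries:
        if entry in seen and entry not in duplicates:
            duplicates.append(entry)
        seen.add(entry)

    envs = []
    for entry in entries:
        match = CONDA_ENV_RE.search(entry)
        if match and match.group(1) not in envs:
            envs.append(match.group(1))

    return duplicates, envs


if __name__ == "__main__":
    import os

    dups, envs = inspect_search_path(os.environ.get("LD_LIBRARY_PATH", ""))
    print("duplicate entries:", dups)
    print("conda environments referenced:", envs)
```

Run inside the container after sourcing setup.sh, it reports which entries repeat and which conda environments the path touches.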

Whenever we try to run any of the U250 examples in the directories Vitis-AI/examples/DPUCADX8G/ and Vitis-AI/demo/Vitis-AI-Library/, we get the following errors:

-------------------
Speaking to Butler 
Response from Butler is: 
errCode: errCode: 4
errCode String: ASSIGNMENT_UNFEASIBLE
myHandle: 0
valid: 1
-------------------
....

If we try to run an example for the DPUCADF8H we get the following error:

WARNING: Logging before InitGoogleLogging() is written to STDERR
I0507 03:06:51.017894   147 main.cc:293] create running for subgraph: subgraph_InceptionV3/InceptionV3/Conv2d_1a_3x3/Conv2D
loading xclbin: /opt/xilinx/overlaybins/dpuv3int8
errCode: errCode: 22
errCode String: SERVICE_INVALID
myHandle: 0
valid: 1
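
Both failures use the same errCode / errCode String layout, so the pairs can be collected from saved logs when many examples are run. Here is a minimal sketch that assumes only the line format shown above. The `extract_errors` helper is hypothetical, and it does not interpret what the codes mean.

```python
import re

# "errCode: errCode: 4" -> the numeric code directly after an "errCode:" label.
CODE_RE = re.compile(r"errCode:\s*(\d+)")
# "errCode String: ASSIGNMENT_UNFEASIBLE" -> the symbolic name.
NAME_RE = re.compile(r"errCode String:\s*(\w+)")


def extract_errors(log_text):
    """Return a list of (code, name) pairs in the order they appear in the log."""
    codes = [int(m.group(1)) for m in CODE_RE.finditer(log_text)]
    names = [m.group(1) for m in NAME_RE.finditer(log_text)]
    # Each response prints one code line followed by one name line, so the
    # two lists are paired positionally.
    return list(zip(codes, names))


if __name__ == "__main__":
    import sys

    for code, name in extract_errors(sys.stdin.read()):
        print(code, name)
```

Piping an example's output into the script prints one line per error response.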

We could not find help in the other issues or forum posts, because most of them were left unresolved.

Could you please confirm that Vitis-AI works on the U250 card with the latest XRT and Deployment Target Platform on CentOS 8, and help us finish the setup? It would also help to update the install.sh script. By the way, we also have Alveo U280 cards, and they work properly with Vitis-AI. And yes, we have set the XLNX_ENABLE_DEVICES variable to match the correct DPUs.

weberxzq commented 3 years ago

Hi @matejbizjak, my situation is very similar to yours. I also have a U250 card and a U280 card, and I am currently trying to get the U280 working, so far without success. You said your U280 works properly; is that on CentOS 8? Did you strictly follow the demo example instructions here, and did it work? I followed the instructions here but had to manually install librt-engine-1.3.0-r130 inside Docker to make it work, whereas the default librt-engine version is 1.3.0-r106.

matejbizjak commented 3 years ago

Hi @weberxzq, yes, I followed this example and it works fine on CentOS 8. Make sure you set the $XLNX_ENABLE_DEVICES variable properly in the container if you use different Alveo cards. Also, the dpu.xclbin and hbm_address_assignment.txt files have to be in /usr/lib.
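
For anyone repeating this setup, the two conditions above (XLNX_ENABLE_DEVICES is set, and dpu.xclbin and hbm_address_assignment.txt are present in /usr/lib) can be checked before launching an example. Here is a minimal sketch. The `preflight` helper is hypothetical and not part of Vitis AI, and it only checks that the variable is non-empty, not that its value matches your cards.

```python
import os
from pathlib import Path

# File names and the /usr/lib default come from the comment above.
REQUIRED_FILES = ("dpu.xclbin", "hbm_address_assignment.txt")


def preflight(lib_dir="/usr/lib", env=None):
    """Return a list of problems found; an empty list means both checks passed."""
    env = os.environ if env is None else env
    problems = []

    if not env.get("XLNX_ENABLE_DEVICES", "").strip():
        problems.append("XLNX_ENABLE_DEVICES is not set")

    for name in REQUIRED_FILES:
        path = Path(lib_dir) / name
        if not path.is_file():
            problems.append(f"missing {path}")

    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print("WARN:", issue)
    if not issues:
        print("preflight checks passed")
```

Run it inside the container, since both the variable and /usr/lib refer to the container's environment.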

nupurec commented 3 years ago

Hi @matejbizjak, I am trying to run the resnet50 example on a U280 card under CentOS 7, so far without success: I am getting a frequency mismatch error. See the terminal and dmesg outputs below.

Terminal output:

I0531 05:46:28.115594 2612 main.cc:285] create running for subgraph: subgraph_ResNetResNet_AdaptiveAvgPool2d_avgpool1257_i0
XRT build version: 2.8.726
Build hash: 7c93966ead2dec777b92bdc379893f22b5bd561e
Build date: 2020-11-11 20:29:19
Git branch: 2020.2
PID: 2612
UID: 0
HOST: sloth.perc.com
EXE: /workspace/demo/VART/resnet50/resnet50
[XRT] ERROR: See dmesg log for details. err=-33
F0531 05:46:28.195168 2612 xrt_bin_stream.cpp:149] Check failed: xclLoadXclBin(handle, blob) == 0 (-33 vs. 0) Bitstream download failed !
Check failure stack trace:
Aborted

dmesg output:
icap_ocl_update_clock_freq_topology: Unable to set frequency as requested frequency 32568 is greater than set by xclbin 300
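
To find this message in a longer dmesg capture and extract the two numbers, a small parser is enough. Here is a minimal sketch that assumes only the wording of the line above. The `parse_freq_mismatch` helper is hypothetical, and the units of the values are not stated in the message, so none are assumed.

```python
import re

# Wording copied from the dmesg line above.
FREQ_RE = re.compile(
    r"requested frequency (\d+) is greater than set by xclbin (\d+)"
)


def parse_freq_mismatch(line):
    """Return (requested, xclbin_limit) as ints, or None if the line does not match."""
    match = FREQ_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    import sys

    for line in sys.stdin:
        result = parse_freq_mismatch(line)
        if result is not None:
            requested, limit = result
            print(f"requested={requested} xclbin_limit={limit}")
```

It reads a saved dmesg dump from standard input and prints one line per mismatch found.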

Can you please help me with this? I could not locate the command to set the frequency in the main.cc file. Or is it something else that I am missing? Please share your experience.

anilmartha commented 3 years ago

@nupurec

Did you follow the instructions from here? Did you copy the corresponding alveo_xclbin-1.3.1/U280/14E300M/dpu.xclbin and hbm_address_assignment.txt to /usr/lib?

nupurec commented 3 years ago

Hi @anilmartha. I have copied the xclbin and HBM address files to /usr/lib, and I also followed all the instructions on that page.