Xilinx / Vitis-AI

Vitis AI is Xilinx’s development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.
https://www.xilinx.com/ai
Apache License 2.0

xdputil Fails Checking DPU Status #1400

Open danielstumpp opened 5 months ago

danielstumpp commented 5 months ago

Hello, I'm extending the WAA example with the pre- and post-processing shown here for my own application. So far, these are the relevant changes I have made:

When I load the generated image and run the tests, both the pre- and post-processing accelerators pass and operate as expected. The end-to-end accelerator, however, fails with the following error:

F0125 17:51:24.579839  1436 xrt_device_handle_imp.cpp:327] Check failed: r == 0 cannot set read range! cu_index 2 cu_base_addr 2147680256 fingerprint 0x0 : Invalid argument [22]
*** Check failure stack trace: ***

I also get the same error when running xdputil status:

F0125 17:51:24.579839  1436 xrt_device_handle_imp.cpp:327] Check failed: r == 0 cannot set read range! cu_index 2 cu_base_addr 2147680256 fingerprint 0x0 : Invalid argument [22]
*** Check failure stack trace: ***
/usr/bin/xdputil: line 20:  1436 Aborted                 /usr/bin/python3 -m xdputil $*

The fact that it seems to be checking CU index 2 is concerning, but I'm not sure why it does that or what the actual root cause of this error might be. Does anyone have any ideas? Any help would be much appreciated.
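The cu_base_addr in the log is printed in decimal, while xclbinutil reports each instance's Base Address in hex. A minimal Python sketch for comparing the two: convert the logged value and look it up in a table hand-copied from the xclbinutil output posted later in this thread. The table and the helper name find_instance are illustrative only; they are not part of VART or XRT.

```python
# Base addresses hand-copied from the `xclbinutil --info` output for this dpu.xclbin
# (illustrative table; regenerate it for your own xclbin).
INSTANCE_BASES = {
    0x80000000: "DPUCZDX8G_1",
    0x80010000: "sfm_xrt_top_1",
    0x80030000: "hls_dpupostproc_m_axi",
    0x80040000: "hls_dpupreproc_m_axi",
}


def find_instance(cu_base_addr: int) -> str:
    """Return the instance whose base address matches a decimal cu_base_addr from the log."""
    return INSTANCE_BASES.get(cu_base_addr, "<no matching instance>")


if __name__ == "__main__":
    addr = 2147680256  # value printed in the CHECK failure
    print(hex(addr), "->", find_instance(addr))
```

With the numbers from this thread, 2147680256 is 0x80030000, the base address of the hls_dpupostproc_m_axi instance rather than DPUCZDX8G_1 at 0x80000000.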

AlbertaBeef commented 5 months ago

My guess would be that the run-time is referring to a stale (previous) version of the dpu.xclbin file. Did you change the .xclbin file on your embedded platform? I think the default search path for this file is defined in /etc/vart.conf. Make sure the .xclbin path defined in that file points to your .xclbin.
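To double-check which file the run-time will load, the path can be read back from /etc/vart.conf. The sketch below assumes the file holds `key: value` lines with the xclbin path under a `firmware` key (for example `firmware: /run/media/mmcblk0p1/dpu.xclbin`); that layout is an assumption, so compare it against the file on your board. The helper name read_firmware_path is hypothetical.

```python
import os
from typing import Optional


def read_firmware_path(conf_text: str) -> Optional[str]:
    """Return the value of the (assumed) 'firmware:' entry in vart.conf text, or None."""
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        # Assumption: the xclbin path is stored under a key named "firmware".
        if sep and key.strip() == "firmware":
            return value.strip()
    return None


if __name__ == "__main__":
    conf = "/etc/vart.conf"
    if os.path.exists(conf):
        with open(conf) as f:
            path = read_firmware_path(f.read())
        print("firmware:", path)
        print("exists:", bool(path) and os.path.exists(path))
    else:
        print(conf, "not found")
```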

danielstumpp commented 5 months ago

Hello @AlbertaBeef, thanks for the response. I have confirmed that /etc/vart.conf points to the correct dpu.xclbin file, which in this case is located at /run/media/mmcblk0p1/dpu.xclbin.

I also confirmed that this dpu.xclbin contains each of the three kernels I expect (see output below). Any other ideas about what may be causing this issue?

Partial output of xclbinutil --info --input dpu.xclbin:

root@zynqmp-common-20222:/run/media/mmcblk0p1# xclbinutil --info --input dpu.xclbin 
XRT Build Version: 2.14.0 (2022.2)
       Build Date: 2022-10-07 05:12:02
          Hash ID: 43926231f7183688add2dccfd391b36a1f000bea
------------------------------------------------------------------------------
Warning: The option '--output' has not been specified. All operations will    
         be done in memory with the exception of the '--dump-section' command.
------------------------------------------------------------------------------
Reading xclbin file into memory.  File: dpu.xclbin

==============================================================================
XRT Build Version: 2.14.0 (2022.2)
       Build Date: 2022-10-07 05:12:02
          Hash ID: 43926231f7183688add2dccfd391b36a1f000bea
==============================================================================
xclbin Information
------------------
   Generated by:           v++ (2022.2) on 2022-10-13-17:52:11
   Version:                2.14.0
   Kernels:                hls_dpupostproc_m_axi, hls_dpupreproc_m_axi, sfm_xrt_top, DPUCZDX8G
   Signature:              
   Content:                Bitstream
   UUID (xclbin):          a654ee8b-cbef-0ad8-8da0-641b68c74203
   Sections:               BITSTREAM, MEM_TOPOLOGY, IP_LAYOUT, CONNECTIVITY, 
                           BUILD_METADATA, EMBEDDED_METADATA, SYSTEM_METADATA, 
                           GROUP_CONNECTIVITY, GROUP_TOPOLOGY
==============================================================================
Hardware Platform (Shell) Information
-------------------------------------
   Vendor:                 xilinx.com
   Board:                  xd
   Name:                   xilinx_zcu104_base_202220_1
   Version:                202220.1
   Generated Version:      Vivado 2022.2 (SW Build: 3668458)
   Created:                Tue Oct 11 16:36:36 2022
   FPGA Device:            xczu7ev
   Board Vendor:           xilinx.com
   Board Name:             xilinx.com:zcu104:1.1
   Board Part:             xilinx.com:zcu104:part0:1.1
   Platform VBNV:          xilinx.com_xd_xilinx_zcu104_base_202220_1_202220_1
   Static UUID:            00000000-0000-0000-0000-000000000000
   Feature ROM TimeStamp:  0

Scalable Clocks
---------------
   No scalable clock data available.

System Clocks
-------------
   Name:           clk_wiz_0_clk_out1 
   Type:           FIXED 
   Default Freq:   150 MHz

   Name:           clk_wiz_0_clk_out2 
   Type:           FIXED 
   Default Freq:   300 MHz

   Name:           clk_wiz_0_clk_out3 
   Type:           FIXED 
   Default Freq:   75 MHz

   Name:           clk_wiz_0_clk_out4 
   Type:           FIXED 
   Default Freq:   100 MHz

   Name:           clk_wiz_0_clk_out5 
   Type:           FIXED 
   Default Freq:   200 MHz

   Name:           clk_wiz_0_clk_out6 
   Type:           FIXED 
   Default Freq:   400 MHz

   Name:           clk_wiz_0_clk_out7 
   Type:           FIXED 
   Default Freq:   600 MHz

Memory Configuration
--------------------
   Name:         HPC0
   Index:        0
   Type:         MEM_DRAM
   Base Address: 0x0
   Address Size: 0x80000000
   Bank Used:    Yes

   Name:         LPD
   Index:        1
   Type:         MEM_DRAM
   Base Address: 0x0
   Address Size: 0x80000000
   Bank Used:    No

   Name:         HP3
   Index:        2
   Type:         MEM_DRAM
   Base Address: 0x0
   Address Size: 0x80000000
   Bank Used:    No

   Name:         HPC1
   Index:        3
   Type:         MEM_DRAM
   Base Address: 0x0
   Address Size: 0x0
   Bank Used:    No

   Name:         HP0
   Index:        4
   Type:         MEM_DRAM
   Base Address: 0x0
   Address Size: 0x80000000
   Bank Used:    Yes

   Name:         HP1
   Index:        5
   Type:         MEM_DRAM
   Base Address: 0x0
   Address Size: 0x80000000
   Bank Used:    Yes

   Name:         HP2
   Index:        6
   Type:         MEM_DRAM
   Base Address: 0x0
   Address Size: 0x0
   Bank Used:    No
==============================================================================
Kernel: hls_dpupostproc_m_axi

Definition
----------
   Signature: hls_dpupostproc_m_axi (void* inp_data, void* out_max, void* out_index, unsigned int dpu_fixpos, unsigned int height, unsigned int width)

Ports
-----
   Port:          M_AXI_GMEM_IN
   Mode:          master
   Range (bytes): 0xFFFFFFFF
   Data Width:    128 bits
   Port Type:     addressable

   Port:          M_AXI_GMEM_OUT_MAX
   Mode:          master
   Range (bytes): 0xFFFFFFFF
   Data Width:    8 bits
   Port Type:     addressable

   Port:          M_AXI_GMEM_OUT_INDEX
   Mode:          master
   Range (bytes): 0xFFFFFFFF
   Data Width:    8 bits
   Port Type:     addressable

   Port:          S_AXI_CONTROL
   Mode:          slave
   Range (bytes): 0x4C
   Data Width:    32 bits
   Port Type:     addressable

--------------------------
Instance:        hls_dpupostproc_m_axi
   Base Address: 0x80030000

   Argument:          inp_data
   Register Offset:   0x10
   Port:              M_AXI_GMEM_IN
   Memory:            HPC0 (MEM_DRAM)

   Argument:          out_max
   Register Offset:   0x1C
   Port:              M_AXI_GMEM_OUT_MAX
   Memory:            HP0 (MEM_DRAM)

   Argument:          out_index
   Register Offset:   0x28
   Port:              M_AXI_GMEM_OUT_INDEX
   Memory:            HP1 (MEM_DRAM)

   Argument:          dpu_fixpos
   Register Offset:   0x34
   Port:              S_AXI_CONTROL
   Memory:            <not applicable>

   Argument:          height
   Register Offset:   0x3C
   Port:              S_AXI_CONTROL
   Memory:            <not applicable>

   Argument:          width
   Register Offset:   0x44
   Port:              S_AXI_CONTROL
   Memory:            <not applicable>
Kernel: hls_dpupreproc_m_axi

Definition
----------
   Signature: hls_dpupreproc_m_axi (void* img_inp, void* img_out, float means_0, float means_1, float means_2, float scales_0, float scales_1, float scales_2, unsigned int dpu_fixpos, unsigned int height, unsigned int width)

Ports
-----
   Port:          M_AXI_GMEM_IN
   Mode:          master
   Range (bytes): 0xFFFFFFFF
   Data Width:    32 bits
   Port Type:     addressable

   Port:          M_AXI_GMEM_OUT
   Mode:          master
   Range (bytes): 0xFFFFFFFF
   Data Width:    32 bits
   Port Type:     addressable

   Port:          S_AXI_CONTROL
   Mode:          slave
   Range (bytes): 0x70
   Data Width:    32 bits
   Port Type:     addressable

--------------------------
Instance:        hls_dpupreproc_m_axi
   Base Address: 0x80040000

   Argument:          img_inp
   Register Offset:   0x10
   Port:              M_AXI_GMEM_IN
   Memory:            HP0 (MEM_DRAM)

   Argument:          img_out
   Register Offset:   0x1C
   Port:              M_AXI_GMEM_OUT
   Memory:            HP1 (MEM_DRAM)

   Argument:          means_0
   Register Offset:   0x28
   Port:              S_AXI_CONTROL
   Memory:            <not applicable>

   Argument:          means_1
   Register Offset:   0x30
   Port:              S_AXI_CONTROL
   Memory:            <not applicable>

   Argument:          means_2
   Register Offset:   0x38
   Port:              S_AXI_CONTROL
   Memory:            <not applicable>

   Argument:          scales_0
   Register Offset:   0x40
   Port:              S_AXI_CONTROL
   Memory:            <not applicable>

   Argument:          scales_1
   Register Offset:   0x48
   Port:              S_AXI_CONTROL
   Memory:            <not applicable>

   Argument:          scales_2
   Register Offset:   0x50
   Port:              S_AXI_CONTROL
   Memory:            <not applicable>

   Argument:          dpu_fixpos
   Register Offset:   0x58
   Port:              S_AXI_CONTROL
   Memory:            <not applicable>

   Argument:          height
   Register Offset:   0x60
   Port:              S_AXI_CONTROL
   Memory:            <not applicable>

   Argument:          width
   Register Offset:   0x68
   Port:              S_AXI_CONTROL
   Memory:            <not applicable>
Kernel: sfm_xrt_top

Definition
----------
   Signature: sfm_xrt_top (int* sm_doneclr, int* sm_cmd_x_len, int* sm_cmd_y_len, int* sm_src_addr, int* sm_dst_addr, int* sm_cmd_scale, int* sm_cmd_offset)

Ports
-----
   Port:          s_axi_control
   Mode:          slave
   Range (bytes): 0x00001000
   Data Width:    32 bits
   Port Type:     addressable

   Port:          M_AXI
   Mode:          master
   Range (bytes): 0xFFFFFFFF
   Data Width:    32 bits
   Port Type:     addressable

--------------------------
Instance:        sfm_xrt_top_1
   Base Address: 0x80010000

   Argument:          sm_doneclr
   Register Offset:   0x40
   Port:              s_axi_control
   Memory:            <not applicable>

   Argument:          sm_cmd_x_len
   Register Offset:   0x44
   Port:              s_axi_control
   Memory:            <not applicable>

   Argument:          sm_cmd_y_len
   Register Offset:   0x48
   Port:              s_axi_control
   Memory:            <not applicable>

   Argument:          sm_src_addr
   Register Offset:   0x4c
   Port:              M_AXI
   Memory:            HPC0 (MEM_DRAM)

   Argument:          sm_dst_addr
   Register Offset:   0x54
   Port:              M_AXI
   Memory:            HPC0 (MEM_DRAM)

   Argument:          sm_cmd_scale
   Register Offset:   0x5c
   Port:              s_axi_control
   Memory:            <not applicable>

   Argument:          sm_cmd_offset
   Register Offset:   0x60
   Port:              s_axi_control
   Memory:            <not applicable>
Kernel: DPUCZDX8G

Definition
----------
   Signature: DPUCZDX8G (int* dpu_doneclr, int* dpu_prof_en, int* dpu_cmd, int* dpu_instr_addr, int* dpu_prof_addr, int* dpu_base0_addr, int* dpu_base1_addr, int* dpu_base2_addr, int* dpu_base3_addr, int* dpu_base4_addr, int* dpu_base5_addr, int* dpu_base6_addr, int* dpu_base7_addr)

Ports
-----
   Port:          S_AXI_CONTROL
   Mode:          slave
   Range (bytes): 0x00001000
   Data Width:    32 bits
   Port Type:     addressable

   Port:          M_AXI_GP0
   Mode:          master
   Range (bytes): 0xFFFFFFFF
   Data Width:    32 bits
   Port Type:     addressable

   Port:          M_AXI_HP0
   Mode:          master
   Range (bytes): 0xFFFFFFFF
   Data Width:    128 bits
   Port Type:     addressable

   Port:          M_AXI_HP2
   Mode:          master
   Range (bytes): 0xFFFFFFFF
   Data Width:    128 bits
   Port Type:     addressable

--------------------------
Instance:        DPUCZDX8G_1
   Base Address: 0x80000000

   Argument:          dpu_doneclr
   Register Offset:   0x40
   Port:              S_AXI_CONTROL
   Memory:            <not applicable>

   Argument:          dpu_prof_en
   Register Offset:   0x44
   Port:              S_AXI_CONTROL
   Memory:            <not applicable>

   Argument:          dpu_cmd
   Register Offset:   0x48
   Port:              S_AXI_CONTROL
   Memory:            <not applicable>

   Argument:          dpu_instr_addr
   Register Offset:   0x50
   Port:              M_AXI_GP0
   Memory:            HPC0 (MEM_DRAM)

   Argument:          dpu_prof_addr
   Register Offset:   0x58
   Port:              M_AXI_GP0
   Memory:            HPC0 (MEM_DRAM)

   Argument:          dpu_base0_addr
   Register Offset:   0x60
   Port:              M_AXI_HP0
   Memory:            HP0 (MEM_DRAM)

   Argument:          dpu_base1_addr
   Register Offset:   0x68
   Port:              M_AXI_HP0
   Memory:            HP0 (MEM_DRAM)

   Argument:          dpu_base2_addr
   Register Offset:   0x70
   Port:              M_AXI_HP0
   Memory:            HP0 (MEM_DRAM)

   Argument:          dpu_base3_addr
   Register Offset:   0x78
   Port:              M_AXI_HP0
   Memory:            HP0 (MEM_DRAM)

   Argument:          dpu_base4_addr
   Register Offset:   0x80
   Port:              M_AXI_HP2
   Memory:            HP1 (MEM_DRAM)

   Argument:          dpu_base5_addr
   Register Offset:   0x88
   Port:              M_AXI_HP2
   Memory:            HP1 (MEM_DRAM)

   Argument:          dpu_base6_addr
   Register Offset:   0x90
   Port:              M_AXI_HP2
   Memory:            HP1 (MEM_DRAM)

   Argument:          dpu_base7_addr
   Register Offset:   0x98
   Port:              M_AXI_HP2
   Memory:            HP1 (MEM_DRAM)
==============================================================================
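Rather than scanning the full report by eye, the Instance / Base Address pairs can be pulled out of `xclbinutil --info` text. This is a minimal sketch that relies only on the `Instance:` and `Base Address:` labels visible in the output above; the function name parse_instances is hypothetical and the parser is not part of XRT.

```python
import os
import re
import shutil
import subprocess
from typing import List, Tuple

_INSTANCE = re.compile(r"^\s*Instance:\s+(\S+)")
_BASE = re.compile(r"^\s*Base Address:\s+(0x[0-9A-Fa-f]+)")


def parse_instances(report: str) -> List[Tuple[str, int]]:
    """Return (instance name, base address) pairs in the order xclbinutil lists them."""
    pairs = []
    current = None
    for line in report.splitlines():
        m = _INSTANCE.match(line)
        if m:
            current = m.group(1)
            continue
        m = _BASE.match(line)
        # Memory banks also have "Base Address:" lines; only keep ones following an Instance.
        if m and current is not None:
            pairs.append((current, int(m.group(1), 16)))
            current = None
    return pairs


if __name__ == "__main__":
    if shutil.which("xclbinutil") and os.path.exists("dpu.xclbin"):
        out = subprocess.run(
            ["xclbinutil", "--info", "--input", "dpu.xclbin"],
            capture_output=True, text=True, check=True,
        ).stdout
        for name, base in parse_instances(out):
            print(f"{name:24s} 0x{base:08X}")
    else:
        print("xclbinutil or dpu.xclbin not available")
```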
danielstumpp commented 5 months ago

@AlbertaBeef Another update: I updated some of the connectivity to reflect the default of one CU, and the error message now reports cu_index 0 instead of 2. From what I have read elsewhere, it seems the following two lines are separate and are the actual cause:

*** Check failure stack trace: ***
/usr/bin/xdputil: line 20:   923 Aborted                 /usr/bin/python3 -m xdputil $*

Any insight into the cause of this? The xdputil source code doesn't indicate anything obvious to me.
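One way to read these lines: `*** Check failure stack trace: ***` is what glog prints when a CHECK fails, and a failed CHECK then calls abort(); `Aborted` is the shell reporting that the child process was killed by SIGABRT. If that reading is right, the two lines are a consequence of the `cannot set read range!` check rather than a separate cause. The POSIX-only sketch below reproduces just the `Aborted` part by running a child Python process that calls os.abort(); run_and_report is a hypothetical helper.

```python
import signal
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd and describe how it exited, roughly the way a shell summarizes it."""
    rc = subprocess.run(cmd).returncode
    if rc < 0:
        # On POSIX, a negative return code means the child was killed by signal -rc.
        return f"killed by {signal.Signals(-rc).name}"
    return f"exited with status {rc}"


if __name__ == "__main__":
    # The child aborts immediately, standing in for a failed glog CHECK.
    print(run_and_report([sys.executable, "-c", "import os; os.abort()"]))
```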

quentonh commented 5 months ago

@danielstumpp Quick question: what does show_dpu report? Also, is (and was) Softmax enabled in the DPU IP?

danielstumpp commented 5 months ago

@quentonh Here is the output of show_dpu; it shows the same behavior:

root@zynqmp-common-20222:~# show_dpu
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0125 23:44:38.437368  1160 xrt_device_handle_imp.cpp:327] Check failed: r == 0 cannot set read range! cu_index 0 cu_base_addr 2147680256 fingerprint 0x0 : Invalid argument [22]
*** Check failure stack trace: ***
Aborted

I didn't explicitly enable Softmax, and I don't see an option for it in dpu_conf.vh. Where can I verify that?
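One place to look is the kernel list in the xclbin: the xclbinutil output above includes sfm_xrt_top, which in the DPUCZDX8G reference designs is the softmax kernel. The sketch below checks the `Kernels:` line of saved xclbinutil text for it. Treating sfm_xrt_top as the softmax marker is an assumption based on that naming, and both helper names are hypothetical.

```python
from typing import List


def list_kernels(report: str) -> List[str]:
    """Return the comma-separated names from the 'Kernels:' line of xclbinutil --info text."""
    for line in report.splitlines():
        stripped = line.strip()
        if stripped.startswith("Kernels:"):
            names = stripped[len("Kernels:"):].split(",")
            return [n.strip() for n in names if n.strip()]
    return []


def has_softmax(report: str, softmax_kernel: str = "sfm_xrt_top") -> bool:
    # Assumption: the softmax IP appears in the xclbin as a kernel named sfm_xrt_top.
    return softmax_kernel in list_kernels(report)


if __name__ == "__main__":
    sample = "   Kernels:                hls_dpupostproc_m_axi, hls_dpupreproc_m_axi, sfm_xrt_top, DPUCZDX8G"
    print(list_kernels(sample))
    print("softmax present:", has_softmax(sample))
```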

danielstumpp commented 5 months ago

Hi @quentonh @AlbertaBeef,

I ported my design to the VCK190 and am encountering the same issue whenever I have multiple kernels in my dpu.xclbin file. The error message reports cu_base_addr 2147549184, which corresponds to the HLS pre-processing kernel I have implemented.

It looks to me like vart/xdputil takes the first kernel in the xclbin and assumes it is the DPU, which fails when the DPU is not the first kernel. How can I reorder the kernels in dpu.xclbin? That seems to be the quickest fix.
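The hypothesis can be illustrated with the ZCU104 xclbinutil output posted earlier, taking the kernels in the order xclbinutil prints them and assuming the base addresses did not change after the connectivity update. If a run-time simply took the first compute unit, it would land on hls_dpupostproc_m_axi at 0x80030000, the same cu_base_addr (2147680256) that appears in the show_dpu error with cu_index 0, whereas filtering by kernel name picks DPUCZDX8G_1. This is a toy model of the suspected behavior, not VART's actual code; the list and both helper names are hypothetical.

```python
from typing import List, NamedTuple, Optional


class ComputeUnit(NamedTuple):
    kernel: str
    instance: str
    base_addr: int


# Hypothetical CU list: order and addresses as printed by xclbinutil --info in this thread.
CUS: List[ComputeUnit] = [
    ComputeUnit("hls_dpupostproc_m_axi", "hls_dpupostproc_m_axi", 0x80030000),
    ComputeUnit("hls_dpupreproc_m_axi", "hls_dpupreproc_m_axi", 0x80040000),
    ComputeUnit("sfm_xrt_top", "sfm_xrt_top_1", 0x80010000),
    ComputeUnit("DPUCZDX8G", "DPUCZDX8G_1", 0x80000000),
]


def pick_first(cus: List[ComputeUnit]) -> ComputeUnit:
    """Naive selection: assume the first CU is the DPU (the suspected failure mode)."""
    return cus[0]


def pick_dpu(cus: List[ComputeUnit], prefix: str = "DPU") -> Optional[ComputeUnit]:
    """Select the first CU whose kernel name starts with the given prefix, if any."""
    return next((cu for cu in cus if cu.kernel.startswith(prefix)), None)


if __name__ == "__main__":
    naive = pick_first(CUS)
    print(naive.instance, naive.base_addr)  # hls_dpupostproc_m_axi 2147680256
    print(pick_dpu(CUS).instance)           # DPUCZDX8G_1
```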

danielstumpp commented 4 months ago

For those who may be wondering, this was fixed by reverting to VART 2.5. I have not explored further yet, but it seems that VART 3.0 mistakes any additional kernels for DPUs and tries to configure their memory space as such, which leads to the failure.