Xilinx / XRT

Run Time for AIE and FPGA based platforms
https://xilinx.github.io/XRT
Other
545 stars 458 forks source link

[XRT] ERROR: See dmesg log for details. err = -22 #7892

Open LiShuang-codes opened 7 months ago

LiShuang-codes commented 7 months ago

Hello! I run the XRT/test/xrt/13_add_one example. It runs smooothly in hw type. BUT not in sw_emu type. Idont know why. I use v++ to compile cl kernel to xo file and link it to xclbin file in sw_emu type. The source code is in XRT/test/xrt/13_add_one, the host compile command is

g++ -g -std=c++17 -I$XILINX_XRT/include -L$XILINX_XRT/lib -o fp.exe main.cpp -lxrt_coreutil -luuid -pthread

the kernel compile command is

v++ -c -g -t sw_emu --platform xilinx_u250_gen3x16_xdma_4_1_202210_1 -k addone -o addone.xo kernel.cl
v++ -l -g -t sw_emu --platform xilinx_u250_gen3x16_xdma_4_1_202210_1 -k addone -o addone.xclbin --input_files addone.xo

Finally, I run this command to start program

./fp.exe -k addone.xclbin

Output:

XRT build version: 2.17.0
Build hash: 50aca919b5e72eea48f733fde889e1362c5f78aa
Build date: 2024-01-17 11:20:03
Git branch: master
PID: 261648
UID: 1000
[Thu Jan 18 01:31:05 2024 GMT]
HOST: ubuntu
EXE: /home/user/XRT/tests/xrt/13_add_one/fp.exe
[XRT] ERROR: See dmesg log for details. err = -22
Exception: failed to load xclbin: Invalid argument
FAILED TEST

Dmesg display:

[51999.280986] xocl 0000:18:00.1:  ffff9e9a898040d0 _xocl_drvinst_open: OPEN 1
[51999.281008] xocl 0000:18:00.1:  ffff9e9a898040d0 xocl_create_client: created KDS client for pid(261641), ret: 0
[51999.286384] xocl 0000:18:00.1: xmc.u.18874368 ffff9e83e4b2c410 xmc_read_from_peer: reading from peer
[51999.286408] xocl 0000:18:00.1: mailbox.u.9437184 ffff9e83e4b2a010 _mailbox_request: sending request: 10 via HW
[51999.290353] xclmgmt 0000:18:00.0: mailbox.m.9437184 ffff9e8406342010 process_request: received request from peer: 10, passed on
[51999.290374] xclmgmt 0000:18:00.0: xclmgmt_read_subdev_req: req kind 0
[51999.290453] xclmgmt 0000:18:00.0: mailbox.m.9437184 ffff9e8406342010 mailbox_post_response: posting response for: 10 via HW
[51999.410959] xocl 0000:18:00.1: icap.u.22020096 ffff9e83e4b28810 icap_read_from_peer: reading from peer
[51999.410983] xocl 0000:18:00.1: mailbox.u.9437184 ffff9e83e4b2a010 _mailbox_request: sending request: 10 via HW
[51999.411318] xclmgmt 0000:18:00.0: mailbox.m.9437184 ffff9e8406342010 process_request: received request from peer: 10, passed on
[51999.411337] xclmgmt 0000:18:00.0: xclmgmt_read_subdev_req: req kind 1
[51999.411375] xclmgmt 0000:18:00.0: clock_wizard.m.27262976 ffff9e830cc23c10 clock_wiz_get_freq_by_id: freq = 300
[51999.411390] xclmgmt 0000:18:00.0: clock_wizard.m.27262976 ffff9e830cc23c10 clock_wiz_get_freq_by_id: freq = 500
[51999.411398] xclmgmt 0000:18:00.0: clock_wizard.m.27262976 ffff9e830cc23c10 clock_wiz_get_freq_by_id: freq = 0
[51999.413405] xclmgmt 0000:18:00.0: clock_freq_counter.m.28311552 ffff9e830cc22010 clock_counter_get_freq: khz: 300000
[51999.415414] xclmgmt 0000:18:00.0: clock_freq_counter.m.28311552 ffff9e830cc22010 clock_counter_get_freq: khz: 500000
[51999.415423] xclmgmt 0000:18:00.0: clock_freq_counter.m.28311552 ffff9e830cc22010 clock_counter_get_freq: khz: 0
[51999.415436] xclmgmt 0000:18:00.0: mailbox.m.9437184 ffff9e8406342010 mailbox_post_response: posting response for: 10 via HW
[51999.416187] xocl 0000:18:00.1: icap.u.22020096 ffff9e83e4b28810 icap_cached_ocl_frequency: no cached data for 3
[51999.474101] xocl 0000:18:00.1:  ffff9e9a898040d0 xocl_destroy_client: client exits pid(261641)
[51999.474120] xocl 0000:18:00.1:  ffff9e9a898040d0 xocl_drvinst_close: CLOSE 2
[51999.474129] xocl 0000:18:00.1:  ffff9e9a898040d0 xocl_drvinst_close: NOTIFY 00000000ec6c7e4f
[52003.469753] xocl 0000:18:00.1:  ffff9e9a898040d0 _xocl_drvinst_open: OPEN 1
[52003.469772] xocl 0000:18:00.1:  ffff9e9a898040d0 xocl_create_client: created KDS client for pid(261648), ret: 0
[52003.475467] xocl 0000:18:00.1: xmc.u.18874368 ffff9e83e4b2c410 xmc_read_from_peer: reading from peer
[52003.475487] xocl 0000:18:00.1: mailbox.u.9437184 ffff9e83e4b2a010 _mailbox_request: sending request: 10 via HW
[52003.553928] xclmgmt 0000:18:00.0: mailbox.m.9437184 ffff9e8406342010 process_request: received request from peer: 10, passed on
[52003.553947] xclmgmt 0000:18:00.0: xclmgmt_read_subdev_req: req kind 0
[52003.554020] xclmgmt 0000:18:00.0: mailbox.m.9437184 ffff9e8406342010 mailbox_post_response: posting response for: 10 via HW
[52003.675169] xocl 0000:18:00.1: icap.u.22020096 ffff9e83e4b28810 icap_read_from_peer: reading from peer
[52003.675187] xocl 0000:18:00.1: mailbox.u.9437184 ffff9e83e4b2a010 _mailbox_request: sending request: 10 via HW
[52003.675572] xclmgmt 0000:18:00.0: mailbox.m.9437184 ffff9e8406342010 process_request: received request from peer: 10, passed on
[52003.675591] xclmgmt 0000:18:00.0: xclmgmt_read_subdev_req: req kind 1
[52003.675626] xclmgmt 0000:18:00.0: clock_wizard.m.27262976 ffff9e830cc23c10 clock_wiz_get_freq_by_id: freq = 300
[52003.675642] xclmgmt 0000:18:00.0: clock_wizard.m.27262976 ffff9e830cc23c10 clock_wiz_get_freq_by_id: freq = 500
[52003.675650] xclmgmt 0000:18:00.0: clock_wizard.m.27262976 ffff9e830cc23c10 clock_wiz_get_freq_by_id: freq = 0
[52003.677657] xclmgmt 0000:18:00.0: clock_freq_counter.m.28311552 ffff9e830cc22010 clock_counter_get_freq: khz: 300000
[52003.679666] xclmgmt 0000:18:00.0: clock_freq_counter.m.28311552 ffff9e830cc22010 clock_counter_get_freq: khz: 500000
[52003.679674] xclmgmt 0000:18:00.0: clock_freq_counter.m.28311552 ffff9e830cc22010 clock_counter_get_freq: khz: 0
[52003.679687] xclmgmt 0000:18:00.0: mailbox.m.9437184 ffff9e8406342010 mailbox_post_response: posting response for: 10 via HW
[52003.680642] xocl 0000:18:00.1: icap.u.22020096 ffff9e83e4b28810 icap_cached_ocl_frequency: no cached data for 3
[52003.687591] xocl 0000:18:00.1:  ffff9e9a898040d0 xocl_resolver: Loading xclbin 5449f75b-0efc-ec71-9673-5fac39ccf319 to slot 0
[52003.687605] [drm] Finding MEM_TOPOLOGY section header
[52003.687608] [drm] Section MEM_TOPOLOGY details:
[52003.687611] [drm]   offset = 0x25598
[52003.687613] [drm]   size = 0x170
[52003.687689] xocl 0000:18:00.1:  ffff9e9a898040d0 xocl_read_axlf_helper: check interface uuid
[52003.687725] xocl 0000:18:00.1: icap.u.22020096 ffff9e83e4b28810 icap_download_bitstream_axlf: check interface uuid
[52003.687741] xocl 0000:18:00.1: icap.u.22020096 ffff9e83e4b28810 icap_download_bitstream_axlf: CLOCK_FREQ_TOPOLOGY doesn't exist. XRT is not configuring any clocks
[52003.687752] xocl 0000:18:00.1: icap.u.22020096 ffff9e83e4b28810 __icap_download_bitstream_axlf: incoming xclbin: 5449f75b-0efc-ec71-9673-5fac39ccf319
               on device xclbin: 00000000-0000-0000-0000-000000000000
[52003.687999] xocl 0000:18:00.1:  ffff9e9a898040d0 xocl_kds_unregister_cus: Got ERT XGQ command version 1.0
[52003.690248] xocl 0000:18:00.1:  ffff9e9a898040d0 xocl_kds_xgq_cfg_start: Config start completed, num_cus(0), num_scus(0)
[52003.873256] xocl 0000:18:00.1:  ffff9e9a898040d0 xocl_kds_xgq_cfg_end: Config end completed
[52003.873273] xocl 0000:18:00.1:  ffff9e9a898040d0 __xocl_subdev_destroy: Destroy subdev cu, cdev 0000000000000000
[52003.873316] xocl 0000:18:00.1: cu.u.44040192 ffff9e9ae3503010 cu_remove: Unregister CU interrupt id 1
[52003.873391] xocl 0000:18:00.1:  ffff9e9a898040d0 xrt_cu_intr_thread: CU[0] thread end, bad state 0
[52003.873553] xocl 0000:18:00.1:  ffff9e9a898040d0 __xocl_subdev_destroy: Destroy subdev mig, cdev 0000000000000000
[52003.873671] xocl 0000:18:00.1:  ffff9e9a898040d0 __xocl_subdev_destroy: Destroy subdev xvc_pub, cdev 00000000a77ded9e
[52003.874061] xocl 0000:18:00.1: icap.u.22020096 ffff9e83e4b28810 icap_multislot_version_from_peer: reading from peer
[52003.874077] xocl 0000:18:00.1: mailbox.u.9437184 ffff9e83e4b2a010 _mailbox_request: sending request: 10 via HW
[52003.874353] xclmgmt 0000:18:00.0: mailbox.m.9437184 ffff9e8406342010 process_request: received request from peer: 10, passed on
[52003.874372] xclmgmt 0000:18:00.0: xclmgmt_read_subdev_req: req kind 12
[52003.874391] xclmgmt 0000:18:00.0: xclmgmt_multislot_version: Multislot Version : 80
[52003.874397] xclmgmt 0000:18:00.0: mailbox.m.9437184 ffff9e8406342010 mailbox_post_response: posting response for: 10 via HW
[52003.874604] xocl 0000:18:00.1: mailbox.u.9437184 ffff9e83e4b2a010 _mailbox_request: sending request: 17 via HW
[52003.874769] xclmgmt 0000:18:00.0: mailbox.m.9437184 ffff9e8406342010 process_request: received request from peer: 17, passed on
[52003.874899] xclmgmt 0000:18:00.0:  ffff9e9a898000d0 xocl_xclbin_download: slot_id = 0
[52003.874928] xclmgmt 0000:18:00.0: icap.m.22020096 ffff9e8406347410 icap_download_bitstream_axlf: check interface uuid
[52003.874951] xclmgmt 0000:18:00.0: icap.m.22020096 ffff9e8406347410 icap_download_bitstream_axlf: CLOCK_FREQ_TOPOLOGY doesn't exist. XRT is not configuring any clocks
[52003.874967] xclmgmt 0000:18:00.0: icap.m.22020096 ffff9e8406347410 __icap_download_bitstream_axlf: incoming xclbin: 5449f75b-0efc-ec71-9673-5fac39ccf319
               on device xclbin: 00000000-0000-0000-0000-000000000000
[52003.981647] xclmgmt 0000:18:00.0:  ffff9e9a898000d0 raptor_cmc_access: Release CMC succeeded.
[52003.981681] xclmgmt 0000:18:00.0:  ffff9e9a898000d0 __xocl_subdev_destroy: Destroy subdev clock_freq_counter, cdev 0000000000000000
[52003.981711] xclmgmt 0000:18:00.0: clock_freq_counter.m.28311552 ffff9e830cc22010 clock_counter_remove: successfully removed Clock Counter subdev
[52003.981816] xclmgmt 0000:18:00.0:  ffff9e9a898000d0 __xocl_subdev_destroy: Destroy subdev clock_wizard, cdev 0000000000000000
[52003.981835] xclmgmt 0000:18:00.0: clock_wizard.m.27262976 ffff9e830cc23c10 clock_wiz_remove: successfully removed Clock Wizard subdev
[52003.981905] xclmgmt 0000:18:00.0:  ffff9e9a898000d0 __xocl_subdev_destroy: Destroy subdev mig, cdev 0000000000000000
[52003.982071] xclmgmt 0000:18:00.0:  ffff9e9a898000d0 __xocl_subdev_destroy: Destroy subdev iores3, cdev 0000000000000000
[52003.982165] xclmgmt 0000:18:00.0: icap.m.22020096 ffff9e8406347410 icap_cache_bitstream_axlf_section: found kind 6(MEM_TOPOLOGY)
[52003.982182] xclmgmt 0000:18:00.0: icap.m.22020096 ffff9e8406347410 icap_cache_bitstream_axlf_section: found kind 8(IP_LAYOUT)
[52003.982192] xclmgmt 0000:18:00.0: icap.m.22020096 ffff9e8406347410 icap_refresh_clock_freq: ret: 0
[52003.982199] xclmgmt 0000:18:00.0: icap.m.22020096 ffff9e8406347410 icap_freeze_axi_gate: freezing CL AXI gate
[52003.982211] xclmgmt 0000:18:00.0:  ffff9e9a898000d0 axigate_freeze: freeze gate ep_pr_isolate_ulp_00 level 2
[52003.982219] xclmgmt 0000:18:00.0: icap.m.22020096 ffff9e8406347410 icap_download_hw: found second stage bitstream of size 0x25218
[52003.982228] xclmgmt 0000:18:00.0: icap.m.22020096 ffff9e8406347410 icap_download_hw: Dowload bitstream failed
[52003.982235] xclmgmt 0000:18:00.0: icap.m.22020096 ffff9e8406347410 icap_download_hw: icap_download_hw, err = -22
[52003.982244] xclmgmt 0000:18:00.0: icap.m.22020096 ffff9e8406347410 icap_free_axi_gate: freeing CL AXI gate
[52003.982256] xclmgmt 0000:18:00.0:  ffff9e9a898000d0 axigate_free: free gate ep_pr_isolate_ulp_00 level 2
[52003.982264] xclmgmt 0000:18:00.0: icap.m.22020096 ffff9e8406347410 __icap_xclbin_download: ret: -22
[52003.982274] xclmgmt 0000:18:00.0: icap.m.22020096 ffff9e8406347410 icap_download_bitstream_axlf: err: -22
[52003.982307] xclmgmt 0000:18:00.0: mailbox.m.9437184 ffff9e8406342010 mailbox_post_response: posting response for: 17 via HW
[52003.982540] xocl 0000:18:00.1: icap.u.22020096 ffff9e83e4b28810 __icap_peer_xclbin_download: peer xclbin download err: -22
[52003.982571] xocl 0000:18:00.1: icap.u.22020096 ffff9e83e4b28810 icap_cache_bitstream_axlf_section: found kind 6(MEM_TOPOLOGY)
[52003.982589] xocl 0000:18:00.1: icap.u.22020096 ffff9e83e4b28810 icap_cache_bitstream_axlf_section: found kind 26(ASK_GROUP_TOPOLOGY)
[52003.982602] xocl 0000:18:00.1: icap.u.22020096 ffff9e83e4b28810 icap_download_bitstream_axlf: err: -22
[52003.982618] xocl 0000:18:00.1:  ffff9e9a898040d0 xocl_read_axlf_helper: Failed to download xclbin, err: -22
[52004.057665] xocl 0000:18:00.1:  ffff9e9a898040d0 xocl_destroy_client: client exits pid(261648)
[52004.057683] xocl 0000:18:00.1:  ffff9e9a898040d0 xocl_drvinst_close: CLOSE 2
[52004.057692] xocl 0000:18:00.1:  ffff9e9a898040d0 xocl_drvinst_close: NOTIFY 00000000ec6c7e4f

These information maybe help.

$xbutil examine
System Configuration
  OS Name              : Linux
  Release              : 5.15.0-43-generic
  Version              : #46-Ubuntu SMP Tue Jul 12 10:30:17 UTC 2022
  Machine              : x86_64
  CPU Cores            : 56
  Memory               : 193117 MB
  Distribution         : Ubuntu 22.04.3 LTS
  GLIBC                : 2.35
  Model                : S2600WFT
  BIOS vendor          : Intel Corporation
  BIOS version         : SE5C620.86B.00.01.0014.070920180847

XRT
  Version              : 2.17.0
  Branch               : master
  Hash                 : 50aca919b5e72eea48f733fde889e1362c5f78aa
  Hash Date            : 2024-01-17 11:20:03
  XOCL                 : 2.17.0, 50aca919b5e72eea48f733fde889e1362c5f78aa
  XCLMGMT              : 2.17.0, 50aca919b5e72eea48f733fde889e1362c5f78aa

Devices present
BDF             :  Shell                               Logic UUID                            Device ID       Device Ready*
----------------------------------------------------------------------------------------------------------------------------
[0000:18:00.1]  :  xilinx_u250_gen3x16_xdma_shell_4_1  12C8FAFB-0632-499D-B1C0-C6676271B8A6  user(inst=128)  Yes

* Devices that are not ready will have reduced functionality when using XRT tools

$ sudo /opt/xilinx/xrt/bin/xbmgmt examine --report platform --device 0000:18:00.0
----------------------------------------------------
[0000:18:00.0] : xilinx_u250_gen3x16_xdma_shell_4_1
----------------------------------------------------
Flash properties
  Type                 : spi
  Serial Number        : 21330402S02V

Device properties
  Type                 : u250
  Name                 : ALVEO U250 PQ
  Config Mode          : 0x7
  Max Power            : 225W

Flashable partitions running on FPGA
  Platform             : xilinx_u250_gen3x16_base_4
  SC Version           : 4.6.21
  Logic UUID           : F8DAC62E-49D9-B0AA-E9FC-6F260D9D0DFB
  Interface UUID       : 807A580E-5F50-7D48-484D-26C2217AA787

  Platform             : xilinx_u250_gen3x16_xdma_shell_4_1
  Logic UUID           : 12C8FAFB-0632-499D-B1C0-C6676271B8A6
  Interface UUID       : A8880C23-11A2-A42B-5FB7-6899D385ACEE

Flashable partitions installed in system
  Platform             : xilinx_u250_gen3x16_base_4
  SC Version           : 4.6.21
  Logic UUID           : F8DAC62E-49D9-B0AA-E9FC-6F260D9D0DFB

  Platform             : xilinx_u250_gen3x16_xdma_shell_4_1
  Logic UUID           : 12C8FAFB-0632-499D-B1C0-C6676271B8A6
  Interface UUID       : 807A580E-5F50-7D48-484D-26C2217AA787

  Mac Address          : 00:0A:35:06:92:36
                       : 00:0A:35:06:92:37
jiahanxie353 commented 1 month ago

Hi @LiShuang-codes ! I got exactly the same result here. And when I xbutil validate, the validation all passed, so it's very weird.

Any new findings?

jiahanxie353 commented 1 month ago

After taking a closer look, XRT couldn't read my .xclbin because it fails at xrt_xclbin_parse_header, where the even magic byte is defined here.

May I ask why is it hardcoded as 0xf? Is there any relevant spec/documentation I can refer to? I'm pretty new to Xilinx and I'm having a difficult time to find adequate resource for fixing this. But I have been referring to this UltraScale doc, where it mentions:

The FPGA reads the bitstream header to determine the flash read option selected for reading the configuration data.

Not sure if they refer to the same "bitstream header"..

Any insight would be immensely appreciated! Apologize in advance if I'm tagging the wrong person @stsoe , I just came across several issues/PRs where you reviewed them really versedly so I thought you might have some clue to point us :)

stsoe commented 1 month ago

Hi @jiahanxie353 . I am surprised that documentation isn't more clear about what is required to run in emulation mode. You must set XCL_EMULATION_MODE=sw_emu | hw_emu in the shell where you run the test. In your case I would expect this to work

env XCL_EMULATION_MODE=sw_emu ./fp.exe -k addone.xclbin

Not obvious, I agree, and poor error message.

jiahanxie353 commented 1 month ago

Thanks so much for your swift response @stsoe ! That solves the issue! I did set the environment variable but Python somehow didn't update its os.env.. now I'm forcing to update it. Thanks again!