Closed tamasgp closed 9 months ago
Forgot to mention that all cards have the latest firmware.
Hi, Thank you for bringing this matter to our attention. Short answers to the following questions would help us debug this issue:
1- Does this problem happen on every reboot? If so, does that mean you cannot use the cards at all, or is there a workaround?
2- Can you confirm that this is an on-prem setup and not a cloud deployment?
3- Have you observed the same behaviour with a single card in the chassis?
4- Can you provide the output of the following commands: lspci -d 10ee: and xbutil examine?
5- Would you also be able to provide the output of Step 11 of the SDK installation, https://xilinx.github.io/video-sdk/v3.0/getting_started_on_prem.html#:~:text=of%20the%20machine.-,Test,-that%20the%20installation ?
Cheers,
Hi, As there has been no further activity on this thread, I am closing this ticket. Feel free to reopen it or open a new ticket if the need arises. Cheers,
I face a recurring issue with my Alveo U30 cards on every server reboot: when I source the /opt/xilinx/xrt/setup.sh script, I get back an error message like:

    {
      "response": {
        "name": "load",
        "requestId": "1",
        "status": "failed",
        "data": {
          "failed": "xclLoadXclBin failed, rc = -5, failed to load /opt/xilinx/xcdr/xclbins/transcode.xclbin to device X"
        }
      }
    }
The device number is usually different between reboots. (I have 6 cards in the server, and the issue is not limited to a single Alveo card.) After some cold or hot resets, all the devices initialize successfully, and everything works fine until the next server reboot.
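Since the failing device changes between reboots, it may help to record which device index appears in the error response each time. The following is a minimal sketch, assuming the response is emitted as a single JSON document and that the "failed" message always follows the wording of the one example quoted in this post. The function name parse_load_failure and the device index 3 in the sample are made up for illustration.

```python
import json
import re

# Pattern derived from the single error message quoted in this thread;
# other failure modes may be worded differently (assumption).
FAILED_RE = re.compile(r"rc = (-?\d+), failed to load (\S+) to device (\S+)")


def parse_load_failure(text):
    """Return (rc, xclbin_path, device) for a failed "load" response, else None."""
    response = json.loads(text)["response"]
    if response.get("status") != "failed":
        return None
    message = response.get("data", {}).get("failed", "")
    match = FAILED_RE.search(message)
    if match is None:
        return None
    return int(match.group(1)), match.group(2), match.group(3)


# Sample response; the device index 3 is illustrative, the original post shows "X".
sample = (
    '{ "response": { "name": "load", "requestId": "1", "status": "failed", '
    '"data": { "failed": "xclLoadXclBin failed, rc = -5, failed to load '
    '/opt/xilinx/xcdr/xclbins/transcode.xclbin to device 3" } } }'
)

if __name__ == "__main__":
    print(parse_load_failure(sample))
```

Appending the returned tuple to a file after each reboot would show whether particular devices fail more often than others.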
During this issue I can see the following messages (truncated, just copied the error part):
[ 81.185241] xclmgmt 0000:b2:00.0: xfer_versal.m.27262987 ffff8b5b96495810 xfer_versal_transfer: start writting data_len: 13107697, timeout: 24s
[ 81.759175] xclmgmt 0000:b2:00.0: xfer_versal.m.27262987 ffff8b5b96495810 wait_for_status: Timeout, packet header is fffc0201
[ 81.759219] xclmgmt 0000:b2:00.0: xfer_versal.m.27262987 ffff8b5b96495810 xfer_versal_transfer: Data transfer error
[ 81.761099] xclmgmt 0000:b2:00.0: mailbox.m.9437195 ffff8b3bad5c0810 mailbox_post_response: posting response for: 7 via HW
[ 81.761200] xocl 0000:b2:00.1: icap.u.23068683 ffff8b3babb55c10 __icap_peer_xclbin_download: peer xclbin download err: -5
[ 81.762267] xocl 0000:b2:00.1: icap.u.23068683 ffff8b3babb55c10 icap_download_bitstream_axlf: err: -5
[ 81.762280] xocl 0000:b2:00.1: ffff8b3bb56a70b0 xocl_init_mem: Topology count = 1, data_length = 40
[ 81.762300] xocl 0000:b2:00.1: ffff8b3bb56a70b0 xocl_read_axlf_helper: Failed to download xclbin, err: -5
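With 6 cards in one server, grouping the failing log lines by PCI address can make it easier to see which card a given reboot affected. The sketch below is one possible way to do that, not an official tool: the function name failures_by_device and the list of error keywords are assumptions derived only from the messages quoted above, and the sample uses three of those lines.

```python
import re
from collections import defaultdict

# Matches lines such as "[ 81.761200] xocl 0000:b2:00.1: icap... err: -5".
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<driver>xclmgmt|xocl)\s+"
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+(?P<rest>.*)"
)

# Keywords seen in the failing lines of this thread (assumption: not exhaustive).
ERROR_HINTS = ("Timeout", "error", "err:", "Failed")


def failures_by_device(log_text):
    """Group error-looking xclmgmt/xocl lines by PCI domain:bus:device."""
    grouped = defaultdict(list)
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m and any(hint in m.group("rest") for hint in ERROR_HINTS):
            # Drop the PCI function so xclmgmt (.0) and xocl (.1) land together.
            card = m.group("bdf").rsplit(".", 1)[0]
            grouped[card].append((float(m.group("ts")), m.group("driver"), m.group("rest")))
    return dict(grouped)


# Three of the lines quoted in this post; the first one is not an error.
sample_log = """\
[ 81.185241] xclmgmt 0000:b2:00.0: xfer_versal.m.27262987 ffff8b5b96495810 xfer_versal_transfer: start writting data_len: 13107697, timeout: 24s
[ 81.759175] xclmgmt 0000:b2:00.0: xfer_versal.m.27262987 ffff8b5b96495810 wait_for_status: Timeout, packet header is fffc0201
[ 81.762300] xocl 0000:b2:00.1: ffff8b3bb56a70b0 xocl_read_axlf_helper: Failed to download xclbin, err: -5
"""

if __name__ == "__main__":
    for card, entries in failures_by_device(sample_log).items():
        print(card, len(entries))
```

The text passed to failures_by_device would typically be a saved copy of the kernel log captured right after the failed load.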
Can someone tell me how to debug or fix this issue?