CMU-SAFARI / DRAM-Bender

DRAM Bender is the first open source DRAM testing infrastructure that can be used to easily and comprehensively test state-of-the-art HBM2 chips and DDR4 modules of different form factors. Six prototypes are available on different FPGA boards. Described in our preprint: https://arxiv.org/pdf/2211.05838.pdf
MIT License

XDMA error: the kernel is correct but the device is not detected. #1

Closed Ranyang-Zhou closed 1 year ago

Ranyang-Zhou commented 1 year ago

Hi everyone,

We ran into a problem with XDMA while following the tutorial at https://github.com/CMU-SAFARI/DRAM-Bender. The error reported is: the kernel is correct but the device is not detected. We tried the following environments:

  - Vivado 2020.1, 2020.2, and 2022.2
  - Ubuntu 20.04 and 18.04 (kernel 5.15 and 5.14)
  - XRT for Ubuntu 20.04 (https://www.xilinx.com/products/boards-and-kits/alveo/u200.html#gettingStarted): Xilinx Runtime, Deployment Target Platform, Development Target Platform
  - XRT for Ubuntu 18.04 (https://www.xilinx.com/products/boards-and-kits/alveo/u200.html#gettingStarted): Xilinx Runtime, Deployment Target Platform, Development Target Platform
  - XRT on Ubuntu 18.04, GitHub installation: https://xilinx.github.io/XRT/master/html/install.html
  - Alveo U200 (2020.1 XDMA): https://www.xilinx.com/products/boards-and-kits/alveo/package-files-archive/u200-2020-1.html
  - https://www.omgubuntu.co.uk/2020/08/ubuntu-18-04-5-lts-released-with-linux-kernel-5-4

We also watched the video about DRAM-Bender: https://www.youtube.com/watch?v=FklVEsfdZCI

olgunataberk commented 1 year ago

Hi Ranyang,

It looks like you have tested almost everything to see if the board is working. The error message suggests that the board might be faulty or that its physical connection is unreliable, assuming you still do not observe the expected functionality after carefully following the steps in our repository or in the Xilinx example designs. Please make sure that the physical connection is reliable.

We use only the XDMA drivers to communicate with our boards, so I cannot really help with the XRT side of your testing.

I suppose you encounter the "kernel is correct but the device is not detected" error when you run the load_driver.sh script. Is that correct?

Please go through the following steps (perhaps once more) and let us know what you encounter after each step:

  1. Execute the programming script under the prebuilt directory: ./programFPGA.sh XCU200 C1 RDIMM 1R x4. This assumes that your U200 board has single-rank x4 RDIMMs plugged in.
  2. Reboot the computer.
  3. Reload the XDMA drivers: cd sources/xdma-driver && sudo ./load_driver.sh.

Ranyang-Zhou commented 1 year ago

Hi Ataberk,

I tried to find a solution to the driver problem on the Xilinx community forums, but nothing helped. The host PC recognizes the FPGA and shows its detailed information, but the driver does not recognize it.
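A minimal sketch for telling these two situations apart (the board is visible on the PCIe bus, but the XDMA driver has not bound to it). The function name diagnose_xdma and its output labels are hypothetical and not part of DRAM Bender. It assumes that the loaded XDMA driver creates device nodes named /dev/xdma*, as the Xilinx reference XDMA driver does; the driver shipped in this repository may differ.

```shell
# Hypothetical helper, not part of DRAM Bender.
# Usage: diagnose_xdma "<lspci output>" [dev-dir]
#   $1: text listing the Xilinx PCIe devices (empty if none were found)
#   $2: directory to search for xdma* device nodes (defaults to /dev)
diagnose_xdma() {
    pci_listing=$1
    dev_dir=${2:-/dev}
    if [ -z "$pci_listing" ]; then
        # The host does not see the board on the PCIe bus at all.
        echo "no-pci-device"
    elif ls "$dev_dir"/xdma* >/dev/null 2>&1; then
        # The board is visible and the driver has created device nodes.
        echo "ok"
    else
        # The board is visible, but no xdma* device nodes exist.
        echo "pci-only"
    fi
}
```

It could be called as diagnose_xdma "$(lspci -d 10ee:)", where 10ee is the Xilinx PCI vendor ID. A result of pci-only corresponds to the situation described here: the host sees the board, but the driver does not.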

Following your advice, I got stuck at the first step. It shows: "Please assign vivado executable's path to VIVADO_EXEC variable first!"

olgunataberk commented 1 year ago

The VIVADO_EXEC environment variable must point to the Vivado executable for the script to run correctly.

Assuming Vivado is installed under /opt/Xilinx/Vivado/2020.2, you can run the following command prior to running the script:

export VIVADO_EXEC=/opt/Xilinx/Vivado/2020.2/bin/vivado
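Before running the programming script, it may help to confirm that the variable is set and points to an executable file. The following is a minimal sketch; the function name check_vivado_exec and its return codes are hypothetical and not part of the DRAM Bender scripts.

```shell
# Hypothetical helper, not part of DRAM Bender.
# Returns 0 if VIVADO_EXEC points to an executable file,
# 1 if the variable is unset or empty, 2 if the path is not an executable file.
check_vivado_exec() {
    if [ -z "${VIVADO_EXEC:-}" ]; then
        echo "VIVADO_EXEC is not set" >&2
        return 1
    fi
    if [ ! -x "$VIVADO_EXEC" ] || [ -d "$VIVADO_EXEC" ]; then
        echo "VIVADO_EXEC is not an executable file: $VIVADO_EXEC" >&2
        return 2
    fi
    return 0
}
```

For example, check_vivado_exec && ./programFPGA.sh XCU200 C1 RDIMM 1R x4 would only start programming once the check passes.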

Ranyang-Zhou commented 1 year ago

Thank you for the help, now it shows:

Trying to program the board with the prebuilt files XCU200/XCU200_C1_RDIMM_1R_x4.bit...

Vivado v2020.2 (64-bit)
SW Build 3064766 on Wed Nov 18 09:12:47 MST 2020
IP Build 3064653 on Wed Nov 18 14:17:31 MST 2020
Copyright 1986-2020 Xilinx, Inc. All Rights Reserved.

source ./programFPGA.tcl
if { $argc != 2 } {
    puts "Incorrect number of arguments. Expected BIT_FILE and PROBES_FILE."
    exit 0
}
set BIT_FILE [lindex $argv 0]
set PROBES_FILE [lindex $argv 1]
open_hw
WARNING: 'open_hw' is deprecated, please use 'open_hw_manager' instead.
connect_hw_server -url localhost:3121
INFO: [Labtools 27-2285] Connecting to hw_server url TCP:localhost:3121
INFO: [Labtools 27-2222] Launching hw_server...
INFO: [Labtools 27-2221] Launch Output:

Xilinx hw_server v2020.2
Build date : Nov 18 2020 at 09:50:49
Copyright 1986-2020 Xilinx, Inc. All Rights Reserved.

INFO: [Labtools 27-3415] Connecting to cs_server url TCP:localhost:3042
INFO: [Labtools 27-3417] Launching cs_server...
INFO: [Labtools 27-2221] Launch Output:

Xilinx cs_server v2020.2
Build date : Nov 03 2020-16:02:56
Build number : 2020.2.1604437376
Copyright 2017-2020 Xilinx, Inc. All Rights Reserved.

ERROR: [Labtoolstcl 44-199] No matching targets found on connected servers: localhost
Resolution: If needed connect the desired target to a server and use command refresh_hw_server. Then rerun the get_hw_targets command.
ERROR: [Common 17-39] 'get_hw_targets' failed due to earlier errors.

    while executing
"get_hw_targets "
    invoked from within
"current_hw_target [get_hw_targets ]"
    (file "./programFPGA.tcl" line 11)
Vivado%

olgunataberk commented 1 year ago

Is the FPGA board connected via a programming cable to the machine you are running this script on?

Please refer to page 29 onward here: https://www.xilinx.com/support/documents/sw_manuals/xilinx2022_1/ug908-vivado-programming-debugging.pdf. Can Vivado find the hardware target when you click "Auto Connect" in Vivado's Hardware Manager? If not, I think it is an issue with the programming cable. Replugging the cable and power cycling the FPGA board might help.
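The log posted earlier in this thread fails with ERROR: [Labtoolstcl 44-199], which Vivado prints when get_hw_targets finds no hardware target on the connected server. As a minimal sketch, a saved log can be scanned for that message ID. The function name classify_program_log and its output labels are hypothetical, and the sketch only recognizes this one error.

```shell
# Hypothetical helper, not part of DRAM Bender.
# Reads a Vivado programming log on stdin and reports whether it contains
# the "No matching targets found" error (message ID Labtoolstcl 44-199).
classify_program_log() {
    if grep -q 'Labtoolstcl 44-199'; then
        # Vivado could not find any hardware target, e.g. no programming cable.
        echo "no-jtag-target"
    else
        echo "no-known-error"
    fi
}
```

For example, classify_program_log < vivado.log, where vivado.log is an assumed file name for the saved script output. A result of no-jtag-target points at the programming cable or its drivers rather than at the bitstream.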

Ranyang-Zhou commented 1 year ago

No, I plugged it directly into the slot, like a GPU board.

olgunataberk commented 1 year ago

Closing this issue. Commit c662fb86bbbbdb846e73e341357a0ef1b20dbaf0 updates the README file to emphasize the use of a programming cable and the installation of Xilinx cable drivers.