Xilinx / dma_ip_drivers

Xilinx QDMA IP Drivers
https://xilinx.github.io/dma_ip_drivers/

qdma-perf stops c2h from working #200

Open busfault opened 1 year ago

busfault commented 1 year ago

I've ported the qdma performance reference design to the ZCU106 board (PCIe Gen 3 x4). I do the following:

  1. Flash the FPGA via JTAG.
  2. Reboot Linux so that the card enumerates on the PCI bus.
  3. ~# modprobe qdma-pf
  4. ~# modprobe qdma-vf
  5. ~# echo 3 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
  6. ~# echo 128 >/sys/bus/pci/devices/0000:01:00.0/qdma/qmax
  7. ~# echo 128 >/sys/bus/pci/devices/0000:01:00.1/qdma/qmax
  8. ~# echo 128 >/sys/bus/pci/devices/0000:01:00.2/qdma/qmax
  9. ~# echo 128 >/sys/bus/pci/devices/0000:01:00.3/qdma/qmax
  10. Next run the c2h performance scripts:
    • ~# dma-perf -c st-c2h-pfetch1/st_1_1_pfetch-cmptsz1/c2h_st_1_1_pfetch-cmptsz1_320
    • ~# dma-perf -c st-c2h-pfetch1/st_1_2_pfetch-cmptsz1/c2h_st_1_2_pfetch-cmptsz1_320
    • ~# dma-perf -c st-c2h-pfetch1/st_1_4_pfetch-cmptsz1/c2h_st_1_4_pfetch-cmptsz1_320
    • ~# dma-perf -c st-c2h-pfetch1/st_1_8_pfetch-cmptsz1/c2h_st_1_8_pfetch-cmptsz1_320
    • ~# dma-perf -c st-c2h-pfetch1/st_1_1_pfetch-cmptsz1/c2h_st_1_1_pfetch-cmptsz1_384 <== This one is broken now!
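
The repeated qmax writes in the list above can be sketched as a small loop. This is only a sketch, assuming the same PCI addresses as in this report (0000:01:00.0 for the PF, 0000:01:00.1 to 0000:01:00.3 for the VFs). The function name `set_qmax` and the `SYSFS_ROOT` override are hypothetical and are not part of the QDMA driver; the override exists only so the loop can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch only: SYSFS_ROOT and set_qmax are illustrative names, not driver features.
SYSFS_ROOT="${SYSFS_ROOT:-/sys/bus/pci/devices}"

# set_qmax QMAX BDF...
# Write QMAX to the qdma/qmax attribute of every listed PCI function.
# Stops at the first function whose attribute is missing or not writable.
set_qmax() {
    qmax="$1"
    shift
    for bdf in "$@"; do
        attr="$SYSFS_ROOT/$bdf/qdma/qmax"
        if [ ! -e "$attr" ]; then
            echo "missing: $attr" >&2
            return 1
        fi
        echo "$qmax" > "$attr" || return 1
    done
}

# Example (run as root, after the VFs have been created via sriov_numvfs):
#   set_qmax 128 0000:01:00.0 0000:01:00.1 0000:01:00.2 0000:01:00.3
```

Failing early on a missing attribute makes it obvious when a VF did not enumerate, rather than silently skipping it.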

The outputs I get are as follows:

  • 1 Queue, PktSz 320 : READ: total pps = 142 BW = 45.738667 KB/sec <=???
  • 2 Queue, PktSz 320 : READ: total pps = 8276977 BW = 2.648633 GB/sec
  • 4 Queue, PktSz 320 : READ: total pps = 9563434 BW = 3.060299 GB/sec
  • 8 Queue, PktSz 320 : READ: total pps = 9479665 BW = 3.033493 GB/sec
  • 1 Queue, PktSz 384 : No IOs happened
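
As a sanity check on these figures, the reported bandwidth appears to be simply pps multiplied by the packet size, in decimal units (1 GB = 10^9 bytes). That relationship is inferred from the numbers above, not taken from the dma-perf source, and the helper name `bw_gb` is hypothetical.

```shell
#!/bin/sh
# bw_gb PPS PKT_BYTES
# Print pps * packet size as decimal GB/sec (1 GB = 10^9 bytes).
bw_gb() {
    awk -v pps="$1" -v sz="$2" 'BEGIN { printf "%.6f\n", pps * sz / 1e9 }'
}

bw_gb 8276977 320   # 2 queues -> 2.648633
bw_gb 9563434 320   # 4 queues -> 3.060299
bw_gb 9479665 320   # 8 queues -> 3.033493
```

By the same arithmetic, 142 pps at 320 bytes is about 45 KB/sec, so the 1-queue result is internally consistent and the queue really is moving almost no packets.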

After that last run, dma-perf keeps reporting "No IOs happened" on every subsequent c2h run.

To get the system back functional I have tried:

  1. rmmod and modprobe to remove and reinstall the driver (still broken)
  2. Reflashing the FPGA (this really broke the system)

I had to fully power-cycle the Linux machine to get it back into a state where I could run tests again.

busfault commented 1 year ago

I was able to continue running h2c and bi, but c2h is seemingly broken, and the c2h transactions in bi are also broken at that point.

busfault commented 1 year ago

Further investigation shows that 1 queue with a packet size of 384 breaks this consistently, even when no other sizes have been run first.
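
A minimal reproduction check along these lines might look like the following. It is a sketch that assumes the failure always shows up as the "No IOs happened" message in the dma-perf output; the function name `c2h_ok` and the `DMA_PERF` override are hypothetical.

```shell
#!/bin/sh
# Sketch only: DMA_PERF and c2h_ok are illustrative names.
DMA_PERF="${DMA_PERF:-dma-perf}"

# c2h_ok CONFIG
# Run one dma-perf config file, echo its output, and return non-zero
# if the output contains "No IOs happened".
c2h_ok() {
    out=$("$DMA_PERF" -c "$1" 2>&1)
    printf '%s\n' "$out"
    case "$out" in
        *"No IOs happened"*) return 1 ;;
    esac
    return 0
}

# Example, on a freshly power-cycled system:
#   c2h_ok st-c2h-pfetch1/st_1_1_pfetch-cmptsz1/c2h_st_1_1_pfetch-cmptsz1_384 \
#       || echo "c2h broke on the 384-byte, 1-queue config"
```

Running the 384-byte config first and a known-good 320-byte config afterwards would show whether the single 384-byte run alone is enough to wedge c2h.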