Xilinx / XRT

Run Time for AIE and FPGA based platforms
https://xilinx.github.io/XRT

DMA transfer latency #6590

Closed: wonkyoc closed this issue 2 years ago

wonkyoc commented 2 years ago

I am measuring DMA transfer time from host to device (and vice versa) for a project, and I noticed a performance gap after moving to a newer XRT version. I was using 2021.2, but because of the MMIO latency issue referred to in #6053, I upgraded XRT to 2022.1.

I tested the following operations:

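```cpp
// ... (device, krnl and vector_size_bytes are set up earlier)
auto bo0 = xrt::bo(device, vector_size_bytes, krnl.group_id(0));

// Time only the host-to-device sync of the buffer object
auto start = std::chrono::high_resolution_clock::now();
bo0.sync(XCL_BO_SYNC_BO_TO_DEVICE);
auto end = std::chrono::high_resolution_clock::now();

// Express the elapsed time explicitly in microseconds
std::chrono::duration<double, std::micro> du = end - start;
std::cout << du.count() << " us" << std::endl;
// ...
```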

Here are the measured numbers:

# 2021.2
avg = 19 us

# 2022.1
avg = 32 us

The same latency difference shows up when actually running the kernel and calling run.wait(): it takes almost twice as long (~40 us) in 2022.1. Is there any way to reduce this latency in 2022.1? Also, is the 2021.2 number reasonable? I expected it to be under 10 us, roughly in line with the MMIO latency (~5 us in my measurements). Any help would be much appreciated!

Environment

houlz0507 commented 2 years ago

Hi @wonkyoc, could you try "xbutil --legacy dmatest -b [size in KB]" on your machine for both 2021.2 and 2022.1? This will help narrow down the issue.

wonkyoc commented 2 years ago

This turned out to be caused by a misconfiguration. Both versions show similar numbers. By the way, as you probably know, the legacy subcommands are not supported in 2022.1, so I used xbutil validate instead:

# 2022.1
Test 4 [0000:b4:00.1]     : dma 
    Details               : Buffer size - '16 MB'
                            Host -> PCIe -> FPGA write bandwidth = 5644.2 MB/s
                            Host <- PCIe <- FPGA read bandwidth = 6263.1 MB/s
    Test Status           : [PASSED]

# 2021.2
INFO: Found total 1 card(s), 1 are usable
INFO: DMA test on [0]: xilinx_u25_gen3x8_xdma_base_1
Buffer Size: 16 MB
Reporting from mem_topology:
Data Validity & DMA Test on DDR[0]
Host -> PCIe -> FPGA write bandwidth = 5655.1 MB/s
Host <- PCIe <- FPGA read bandwidth = 6416.5 MB/s
Data Validity & DMA Test on DDR[1]
Host -> PCIe -> FPGA write bandwidth = 4560.2 MB/s
Host <- PCIe <- FPGA read bandwidth = 4489.2 MB/s
INFO: xbutil dmatest succeeded.

houlz0507 commented 2 years ago

Thanks for trying this.