littlewu2508 opened 9 months ago
@littlewu2508 Internal ticket has been created to assist with your question. Thanks!
Hi @littlewu2508, I think achieving p2p between a Xilinx FPGA and an AMD GPU is currently not directly supported. However, one potential workaround is to create a p2p buffer on the FPGA and then map it into the host memory space following this documentation. Then, register the base pointer of the mapped p2p buffer with the GPU device using something like hipHostRegister(). Afterwards, we can use hipHostGetDevicePointer() to obtain a device pointer through which one may potentially interact with the FPGA directly.
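To make that concrete, here is a minimal, untested sketch of the HIP side. It assumes fpga_ptr is the host virtual address of the FPGA p2p buffer after the XRT mapping step, and the hipHostRegisterIoMemory flag is a guess on my part, since the region would be backed by a PCIe BAR rather than system RAM:

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

// Sketch only: register an FPGA p2p buffer that has already been mapped
// into the host address space via XRT, then obtain a GPU device pointer
// for it. fpga_ptr and size are placeholders supplied by the caller.
void* register_fpga_buffer(void* fpga_ptr, size_t size) {
  // hipHostRegisterIoMemory is an assumption here: the mapping is backed
  // by the FPGA's PCIe BAR, not system RAM. hipHostRegisterMapped may be
  // worth trying as a fallback.
  hipError_t err = hipHostRegister(fpga_ptr, size, hipHostRegisterIoMemory);
  if (err != hipSuccess) {
    fprintf(stderr, "hipHostRegister failed: %s\n", hipGetErrorString(err));
    return nullptr;
  }

  // Ask the runtime for the device-side alias of the registered region.
  void* dev_ptr = nullptr;
  err = hipHostGetDevicePointer(&dev_ptr, fpga_ptr, 0);
  if (err != hipSuccess) {
    fprintf(stderr, "hipHostGetDevicePointer failed: %s\n",
            hipGetErrorString(err));
    hipHostUnregister(fpga_ptr);
    return nullptr;
  }
  return dev_ptr;  // usable in kernels / hipMemcpy, if the p2p path works
}
```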
Now, I haven't tested this myself since we currently don't have a test set up specifically for this configuration. However, this should work in theory. Please let me know if this sounds reasonable and if it works on your end.
Thanks!
Thank you very much for the idea! I will try it out when I have time.
Also, will data go through host memory in this workaround? The motivation for p2p DMA copy is to increase bandwidth and lower the latency of data transfers.
@littlewu2508 If everything works out, the GPU should be directly accessing the FPGA memory without going through a host buffer. The idea is that hipHostRegister() will coordinate with the OS to translate and pin the target host memory, which in our case is the virtually mapped FPGA memory. This involves a virtual-to-physical address translation in the OS, after which the GPU is given a device pointer that should correspond to the physical memory address (which is on the FPGA). It can then access the FPGA memory directly with the help of the PCIe controller. The main troublesome part is the address translation in the OS, because that involves the GPU and FPGA drivers as well as the PCIe subsystem.
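For illustration, if the registration succeeds, a kernel could dereference the resulting device pointer like any other global-memory pointer. A hypothetical sketch (dev_ptr and n are placeholders from the earlier example):

```cpp
#include <hip/hip_runtime.h>
#include <cstdint>

// Hypothetical kernel: if the mapping chain above holds, these stores
// become PCIe writes landing in the FPGA's p2p buffer rather than in RAM.
__global__ void fill_fpga(uint32_t* dst, size_t n, uint32_t value) {
  size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
  if (i < n) dst[i] = value;
}

// Usage with the pointer from register_fpga_buffer():
//   fill_fpga<<<(n + 255) / 256, 256>>>(
//       static_cast<uint32_t*>(dev_ptr), n, 0xdeadbeefu);
//   hipDeviceSynchronize();
```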
Hi @littlewu2508, I will be closing this issue now due to inactivity. Please feel free to reopen for more follow-ups. Thanks!
Problem Description
I'm currently interested in p2p data transfer from an FPGA (Xilinx Alveo U50) to an AMD GPU. There is already an FPGA-to-NVIDIA-GPU implementation at https://github.com/RC4ML/FpgaNIC, using https://github.com/NVIDIA/gdrcopy, and in the past there was research [1,2] achieving this with DirectGMA. But DirectGMA is now deprecated along with the proprietary fglrx driver. I wonder, with the open-source amdgpu driver, is there any similar method?
[1] http://dx.doi.org/10.1088/1748-0221/11/02/P02007
[2] http://dx.doi.org/10.1088/1748-0221/12/03/C03015
I read some of the source code about DMA p2p copy in https://github.com/ROCm/ROCR-Runtime/blob/master/src/core/runtime/ and https://github.com/ROCm/ROCT-Thunk-Interface/tree/master/tests/kfdtest/, and it seems that all the userspace DMA copies go through the HSA driver. But as far as I know there is currently no HSA support for Xilinx Alveo cards (maybe there is ongoing work), so I wonder whether DMA p2p between an FPGA and an AMDGPU is possible.
I also raised this question in https://github.com/openucx/ucx/issues/9598 and found that Xilinx Alveo cards support PCIe DMA p2p via OpenCL on XRT (a rough sketch of that flow is below). Does that mean I can use OpenCL to achieve p2p between the FPGA and an AMDGPU? However, as I understand it, rocm-opencl-runtime is also based on HSA.
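For reference, the XRT p2p documentation describes allocating the buffer with the Xilinx-specific XCL_MEM_EXT_P2P_BUFFER flag and then mapping it into the host address space. A rough sketch under those assumptions (context, queue, and size are assumed to be set up elsewhere):

```cpp
#include <CL/cl.h>
#include <CL/cl_ext_xilinx.h>  // Xilinx OpenCL extensions shipped with XRT

// Rough sketch following the XRT p2p docs: allocate a p2p buffer in the
// Alveo's device memory and map it into the host virtual address space.
void* map_p2p_buffer(cl_context context, cl_command_queue queue, size_t size) {
  cl_mem_ext_ptr_t ext = {};
  ext.flags = XCL_MEM_EXT_P2P_BUFFER;  // request the p2p-capable region

  cl_int err = CL_SUCCESS;
  cl_mem buf = clCreateBuffer(context,
                              CL_MEM_READ_WRITE | CL_MEM_EXT_PTR_XILINX,
                              size, &ext, &err);
  if (err != CL_SUCCESS) return nullptr;

  // The returned pointer is backed by the FPGA's PCIe BAR, not host RAM;
  // this would be the pointer handed to hipHostRegister() in the
  // workaround discussed above.
  void* host_ptr = clEnqueueMapBuffer(queue, buf, CL_TRUE,
                                      CL_MAP_READ | CL_MAP_WRITE,
                                      0, size, 0, nullptr, nullptr, &err);
  return (err == CL_SUCCESS) ? host_ptr : nullptr;
}
```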
Operating System
Debian 12
CPU
AMD EPYC 7702 64-Core Processor
GPU
AMD Instinct MI100
ROCm Version
ROCm 5.7.1
ROCm Component
No response
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response