Xilinx / ACCL

Alveo Collective Communication Library: MPI-like communication operations for Xilinx Alveo accelerators
https://accl.readthedocs.io/
Apache License 2.0

FPGABuffer class doesn't return physical hardware address after calling .bo() function #117

Closed zhenhaohe closed 1 year ago

zhenhaohe commented 1 year ago

The code below creates two FPGA buffers and syncs them to the device. However, the values obtained from tx_buf_network->bo() and rx_buf_network->bo() do not represent physical FPGA memory addresses. As a result, the network kernel either cannot write to memory or can write only a very small amount of data.

Buffer *tx_buf_network = new FPGABuffer(32*1024*1024, dataType::int8, device, networkmem);
Buffer *rx_buf_network = new FPGABuffer(32*1024*1024, dataType::int8, device, networkmem);
tx_buf_network->sync_to_device();
rx_buf_network->sync_to_device();
network_krnl(localFPGAIP, uint(rank), localFPGAIP, tx_buf_network->bo(), rx_buf_network->bo());

After changing the buffer instantiation to use the native XRT API directly, it works fine:

auto tx_buf_network = xrt::bo(device, 8*1024*1024*sizeof(int8_t), networkmem);
tx_buf_network.sync(XCL_BO_SYNC_BO_TO_DEVICE);
auto rx_buf_network = xrt::bo(device, 8*1024*1024*sizeof(int8_t), networkmem);
rx_buf_network.sync(XCL_BO_SYNC_BO_TO_DEVICE);
network_krnl(localFPGAIP, uint(rank), localFPGAIP, tx_buf_network, rx_buf_network);

TristanLaan commented 1 year ago

Hi Zhenhao, note that the Buffer::bo function returns a pointer to the bo object, not the bo object itself.

Could you try replacing

network_krnl(localFPGAIP, uint(rank), localFPGAIP, tx_buf_network->bo(), rx_buf_network->bo());

with

network_krnl(localFPGAIP, uint(rank), localFPGAIP, *(tx_buf_network->bo()), *(rx_buf_network->bo()));

and see if that works?
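
For anyone reading along, here is a minimal, self-contained sketch of why passing the pointer could hand the kernel the wrong value. It does not use XRT or ACCL. FakeBo, FakeBuffer and kernel_arg are hypothetical stand-ins invented for this illustration. It also assumes that a kernel call treats any argument that is not a buffer object as a plain scalar and copies its bytes. Under that assumption, a pointer to the bo is forwarded as a host pointer value rather than as the device address.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a device buffer object (NOT xrt::bo).
struct FakeBo {
    std::uint64_t device_addr;  // pretend physical address on the card
};

// Hypothetical stand-in for a buffer wrapper whose bo() returns a pointer,
// mirroring the behaviour described for Buffer::bo in this thread.
class FakeBuffer {
public:
    explicit FakeBuffer(std::uint64_t addr) : bo_{addr} {}
    FakeBo *bo() { return &bo_; }
private:
    FakeBo bo_;
};

// Overload selected for buffer objects: forwards the device address.
std::uint64_t kernel_arg(const FakeBo &bo) {
    return bo.device_addr;
}

// Generic overload selected for everything else, including FakeBo*:
// copies the raw bytes of the value, as a scalar argument would be.
template <typename T>
std::uint64_t kernel_arg(const T &value) {
    static_assert(sizeof(T) <= sizeof(std::uint64_t), "scalar too wide");
    std::uint64_t raw = 0;
    std::memcpy(&raw, &value, sizeof(T));
    return raw;
}
```

With a FakeBuffer buf(0x4000), the call kernel_arg(*buf.bo()) picks the FakeBo overload and yields the device address, while kernel_arg(buf.bo()) compiles just as happily but yields the host pointer value. Both calls compile without any warning, which is why a mistake of this kind can be easy to miss.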