Xilinx / ACCL

Alveo Collective Communication Library: MPI-like communication operations for Xilinx Alveo accelerators
https://accl.readthedocs.io/
Apache License 2.0

Data transfer not working above 8 MB #180

Open mar-ven opened 7 months ago

mar-ven commented 7 months ago

I tried to perform an accl.copy call with more than 8 MB of data, and everything beyond the first 8388544 bytes is not copied correctly. For reproducibility, I used the Coyote RDMA setup.

quetric commented 7 months ago

Thanks @mar-ven for flagging this. I've looked into it, and it is likely caused by the fragmentation mechanism in the DMA Mover HLS kernel, which breaks large transfers into 8 MB chunks that are compatible with the Xilinx DataMover. A fragmented transfer issues several DMA commands but produces a single long data stream terminated by a single TLAST. For Coyote, our adapter sets the Coyote DMA CTL field to 1, which means the DMA engine expects TLAST=1 at the end of the data corresponding to each issued command. With several commands and only one TLAST, the two sides disagree on where each chunk ends.
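
To make the suspected mismatch concrete, here is a minimal Python model of the fragmentation described above. It is only a sketch, not ACCL or Coyote code: the function names are hypothetical, and the chunk size is an assumption derived from the reported 8388544-byte boundary (a 23-bit byte count rounded down to a 64-byte multiple), not taken from the kernel source.

```python
# Illustrative model only. All names and constants are assumptions,
# not taken from the ACCL DMA Mover kernel or the Coyote adapter.
MAX_BTT_BITS = 23   # assumed width of the DataMover bytes-to-transfer field
ALIGN = 64          # assumed alignment in bytes (512-bit data path)
MAX_CHUNK = ((1 << MAX_BTT_BITS) - 1) // ALIGN * ALIGN  # 8388544 bytes


def fragment(total_bytes, max_chunk=MAX_CHUNK):
    """Split a transfer into (offset, length) commands of at most max_chunk bytes."""
    commands = []
    offset = 0
    while offset < total_bytes:
        length = min(max_chunk, total_bytes - offset)
        commands.append((offset, length))
        offset += length
    return commands


def tlast_counts(total_bytes):
    """Return (TLASTs expected with one per command, TLASTs in a single stream)."""
    expected = len(fragment(total_bytes))
    provided = 1 if total_bytes > 0 else 0
    return expected, provided


if __name__ == "__main__":
    for size in (MAX_CHUNK, MAX_CHUNK + 1, 16 * 1024 * 1024):
        print(size, fragment(size), tlast_counts(size))
```

Under these assumptions, any transfer of up to 8388544 bytes is a single command with a single TLAST, so both sides agree. One byte more yields two commands but still one TLAST, which would be consistent with data past that boundary being copied incorrectly.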