Closed Mellich closed 1 year ago
The following user kernel is used to schedule sends and receives from the PL:
    #include "accl_hls.h"

    void send_recv(const float *read_buffer, float *write_buffer,
                   ap_uint<32> size, ap_uint<32> num_iterations,
                   ap_uint<32> neighbor_rank, ap_uint<32> communicator_addr,
                   ap_uint<32> datapath_cfg,
                   STREAM<command_word> &cmd, STREAM<command_word> &sts) {
        accl_hls::ACCLCommand accl_cmd(cmd, sts, communicator_addr, datapath_cfg, 0, 0);
        for (int i = 0; i < num_iterations; i++) {
            accl_cmd.send(size, 0, neighbor_rank, (ap_uint<64>)read_buffer);
            accl_cmd.recv(size, 0, neighbor_rank, (ap_uint<64>)write_buffer);
        }
    }
The user kernel is linked with the ACCL cclo and plugin kernels of the latest dev branch like this: https://github.com/XilinxDublinLabs/HPCBenchmarks/blob/accl/b_eff/settings/settings.link.xilinx.accl_pl.u55c.hbm.profile.ini
Execution of the design gets stuck while running the send_recv kernel. Profiling data shows that the commands of the user kernel are not passed on to the client_arbiter and cclo:
Accelerator Monitor Counters (hex values are cycle counts)

| Compute Unit | Ends | Starts | Max Parallel Itr | Execution | Memory Stall | Pipe Stall | Stream Stall | Min Exec | Max Exec |
|---|---|---|---|---|---|---|---|---|---|
| ccl_offload_0 | 0 | 0 | 0 | 0x0 | 0x0 | 0x0 | 0x0 | 0xffffffffffffffff | 0x0 |
| hostctrl_0 | 4 | 4 | 1 | 0x6ab | 0x0 | 0x0 | 0x0 | 0xc6 | 0x45d |
| networklayer_0 | 0 | 1 | 1 | 0x27aad2d70 | 0x0 | 0x0 | 0x0 | 0xffffffffffffffff | 0x0 |
| sendrecv | 0 | 1 | 1 | 0x220eb9f1e | 0x0 | 0x0 | 0x0 | 0xffffffffffffffff | 0x0 |
| cmac_0 | 0 | 0 | 0 | 0x0 | 0x0 | 0x0 | 0x0 | 0xffffffffffffffff | 0x0 |

AXI Stream Monitor Counters

| Stream Master | Stream Slave | Num Trans. | Data kBytes | Busy Cycles | Stall Cycles | Starve Cycles |
|---|---|---|---|---|---|---|
| cmac_0/M_AXIS | networklayer_0/S_AXIS_eth2nl | 48 | 0.832 | 118 | 0 | 14 |
| networklayer_0/M_AXIS_nl2eth | cmac_0/S_AXIS | 512 | 4.096 | 512 | 0 | 0 |
| networklayer_0/M_AXIS_nl2sk | PIPE | 0 | 0.000 | 0 | 0 | 0 |
| ccl_offload_0/m_axis_eth_tx_data | PIPE | 0 | 0.000 | 0 | 0 | 0 |
| sendrecv/cmd | client_arbiter/cmd_clients_1 | 0 | 0.000 | 0 | 0 | 0 |
| ccl_offload_0/m_axis_call_ack | client_arbiter/ack_cclo | 0 | 0.016 | 9142780514 | 0 | 9142780510 |
| client_arbiter/ack_clients_0 | hostctrl_0/sts | 0 | 0.016 | 9142809039 | 0 | 9142809035 |
| client_arbiter/ack_clients_1 | sendrecv/sts | 0 | 0.000 | 0 | 0 | 0 |
| client_arbiter/cmd_cclo | ccl_offload_0/s_axis_call_req | 4 | 0.240 | 76 | 16 | 0 |
| hostctrl_0/cmd | client_arbiter/cmd_clients_0 | 4 | 0.240 | 94 | 34 | 0 |
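The hang can be read directly off these counters. Below is a minimal sketch that encodes a subset of the reported rows and flags compute units that started but never ended, plus command streams that carried no transactions. The struct and function names (`CuCounters`, `StreamCounters`, `unfinished_units`, `idle_streams`) are hypothetical helpers for illustration only; they are not part of ACCL or XRT, and the values are copied by hand from the profile above.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One row of the Accelerator Monitor Counters table (subset of columns).
struct CuCounters {
    std::string name;
    std::uint64_t ends;
    std::uint64_t starts;
};

// One row of the AXI Stream Monitor Counters table (subset of columns).
struct StreamCounters {
    std::string master;
    std::string slave;
    std::uint64_t num_trans;
};

// Values copied by hand from the profiling output in this report.
std::vector<CuCounters> reported_compute_units() {
    return {{"ccl_offload_0", 0, 0},
            {"hostctrl_0", 4, 4},
            {"networklayer_0", 0, 1},
            {"sendrecv", 0, 1},
            {"cmac_0", 0, 0}};
}

// Only the three command streams around the client_arbiter.
std::vector<StreamCounters> reported_command_streams() {
    return {{"sendrecv/cmd", "client_arbiter/cmd_clients_1", 0},
            {"client_arbiter/cmd_cclo", "ccl_offload_0/s_axis_call_req", 4},
            {"hostctrl_0/cmd", "client_arbiter/cmd_clients_0", 4}};
}

// Compute units that were started more often than they finished.
std::vector<std::string> unfinished_units(const std::vector<CuCounters> &rows) {
    std::vector<std::string> result;
    for (const auto &row : rows) {
        if (row.starts > row.ends) {
            result.push_back(row.name);
        }
    }
    return result;
}

// Stream masters that never completed a single transaction.
std::vector<std::string> idle_streams(const std::vector<StreamCounters> &rows) {
    std::vector<std::string> result;
    for (const auto &row : rows) {
        if (row.num_trans == 0) {
            result.push_back(row.master);
        }
    }
    return result;
}
```

For the counters above, `unfinished_units` returns networklayer_0 and sendrecv, and `idle_streams` returns only sendrecv/cmd. This matches the observation that the kernel is started but none of its commands reach the client_arbiter.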