Xilinx / ACCL

Alveo Collective Communication Library: MPI-like communication operations for Xilinx Alveo accelerators
https://accl.readthedocs.io/
Apache License 2.0

Command Scheduling from PL is stuck #116

Closed Mellich closed 1 year ago

Mellich commented 1 year ago

The following user kernel is used to schedule sends and receives from the PL:

#include "accl_hls.h"

void send_recv(const float *read_buffer, float *write_buffer, ap_uint<32> size, ap_uint<32> num_iterations,
                ap_uint<32> neighbor_rank, ap_uint<32> communicator_addr, ap_uint<32> datapath_cfg,
                STREAM<command_word> &cmd, STREAM<command_word> &sts) {
    accl_hls::ACCLCommand accl_cmd(cmd, sts, communicator_addr, datapath_cfg, 0, 0);
    for (int i = 0; i < num_iterations; i++) {
        accl_cmd.send(size, 0, neighbor_rank, (ap_uint<64>)read_buffer);
        accl_cmd.recv(size, 0, neighbor_rank, (ap_uint<64>)write_buffer);
    }
}

The user kernel is linked with the ACCL cclo and plugin kernels from the latest dev branch using this link configuration: https://github.com/XilinxDublinLabs/HPCBenchmarks/blob/accl/b_eff/settings/settings.link.xilinx.accl_pl.u55c.hbm.profile.ini

Execution of the design hangs when the send_recv kernel runs. Profiling data shows that the user kernel's commands are never passed to the client_arbiter and cclo (the sendrecv/cmd stream records zero transactions):

Accelerator Monitor Counters (hex values are cycle count)
  Compute Unit       Ends      Starts    Max Parallel Itr  Execution         Memory Stall      Pipe Stall        Stream Stall      Min Exec          Max Exec        
  ccl_offload_0      0         0         0                 0x0               0x0               0x0               0x0               0xffffffffffffffff  0x0             
  hostctrl_0         4         4         1                 0x6ab             0x0               0x0               0x0               0xc6              0x45d           
  networklayer_0     0         1         1                 0x27aad2d70       0x0               0x0               0x0               0xffffffffffffffff  0x0             
  sendrecv           0         1         1                 0x220eb9f1e       0x0               0x0               0x0               0xffffffffffffffff  0x0             
  cmac_0             0         0         0                 0x0               0x0               0x0               0x0               0xffffffffffffffff  0x0             

AXI Stream Monitor Counters
  Stream Master                        Stream Slave                   Num Trans.        Data kBytes       Busy Cycles       Stall Cycles      Starve Cycles   
  cmac_0/M_AXIS                        networklayer_0/S_AXIS_eth2nl   48                0.832             118               0                 14              
  networklayer_0/M_AXIS_nl2eth         cmac_0/S_AXIS                  512               4.096             512               0                 0               
  networklayer_0/M_AXIS_nl2sk          PIPE                           0                 0.000             0                 0                 0               
  ccl_offload_0/m_axis_eth_tx_data     PIPE                           0                 0.000             0                 0                 0               
  sendrecv/cmd                         client_arbiter/cmd_clients_1   0                 0.000             0                 0                 0               
  ccl_offload_0/m_axis_call_ack        client_arbiter/ack_cclo        0                 0.016             9142780514        0                 9142780510      
  client_arbiter/ack_clients_0         hostctrl_0/sts                 0                 0.016             9142809039        0                 9142809035      
  client_arbiter/ack_clients_1         sendrecv/sts                   0                 0.000             0                 0                 0               
  client_arbiter/cmd_cclo              ccl_offload_0/s_axis_call_req  4                 0.240             76                16                0               
  hostctrl_0/cmd                       client_arbiter/cmd_clients_0   4                 0.240             94                34                0
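
To make the reading of the stream counters above easier to repeat on other runs, here is a minimal sketch that scans rows of the "AXI Stream Monitor Counters" table and reports streams that never carried a transaction and were never busy. It assumes each data row has exactly seven whitespace-separated fields, as in the listing above. The names `StreamCounters`, `parse_stream_row` and `find_idle_streams` are hypothetical helpers for illustration and are not part of ACCL or XRT.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// One data row of the "AXI Stream Monitor Counters" table (hypothetical helper type).
struct StreamCounters {
    std::string master;
    std::string slave;
    std::uint64_t num_trans = 0;
    double data_kbytes = 0.0;
    std::uint64_t busy_cycles = 0;
    std::uint64_t stall_cycles = 0;
    std::uint64_t starve_cycles = 0;
};

// Parse one whitespace-separated row. Returns false for lines that do not
// match the assumed seven-field layout (for example the column header).
bool parse_stream_row(const std::string &line, StreamCounters &out) {
    std::istringstream in(line);
    StreamCounters row;
    if (!(in >> row.master >> row.slave >> row.num_trans >> row.data_kbytes
             >> row.busy_cycles >> row.stall_cycles >> row.starve_cycles)) {
        return false;
    }
    out = row;
    return true;
}

// Collect every stream with zero transactions and zero busy cycles,
// i.e. streams on which the master never attempted to send anything.
std::vector<StreamCounters> find_idle_streams(const std::string &table) {
    std::vector<StreamCounters> idle;
    std::istringstream lines(table);
    std::string line;
    while (std::getline(lines, line)) {
        StreamCounters row;
        if (parse_stream_row(line, row) && row.num_trans == 0 && row.busy_cycles == 0) {
            idle.push_back(row);
        }
    }
    return idle;
}
```

Applied to the table above, this check should flag sendrecv/cmd and client_arbiter/ack_clients_1 (along with the two streams whose slave is PIPE), which matches the observation that the user kernel's commands never reach the arbiter. The ccl_offload_0/m_axis_call_ack stream is not flagged, because its busy and starve counters are non-zero.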