Xilinx / ACCL

Alveo Collective Communication Library: MPI-like communication operations for Xilinx Alveo accelerators
https://accl.readthedocs.io/
Apache License 2.0

Application stuck on receive #113

Closed by Mellich 1 year ago

Mellich commented 1 year ago

The b_eff application works as expected in emulation and simulation. The ACCL XRT tests pass with the bitstream (except streaming send/recv, which the application does not use). When executed on hardware, however, the application hangs on its first receive.

To reproduce, run run_mpi.sh in the following directory: /proj/xlabs_t3/users/mariusm/runs/2022-10-20-b_eff_accl_test_u55c

Application repo: https://gitenterprise.xilinx.com/mariusm/HPCC_FPGA.git
Bitstream: /proj/xlabs_t3/users/mariusm/synth/benchmarks/b_eff/u55c_pl/build

Mellich commented 1 year ago

Resolved: the IP addresses of some ACCL ranks were set incorrectly.
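
A wrong rank address of this kind can sometimes be caught before launching on hardware. The following is a minimal sketch and not part of ACCL: `check_rank_ips` is a hypothetical helper, and it assumes the rank addresses are available as a plain list of strings indexed by rank. It only flags malformed and duplicate addresses; it cannot tell whether a well-formed, unique address actually belongs to the intended FPGA.

```python
import ipaddress


def check_rank_ips(rank_ips):
    """Return a list of problems found in a rank -> IP address table.

    rank_ips is a hypothetical list of strings where index i holds the
    IP address configured for rank i.
    """
    problems = []
    seen = {}  # parsed address -> first rank that used it
    for rank, ip in enumerate(rank_ips):
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            problems.append(f"rank {rank}: invalid IP address {ip!r}")
            continue
        if addr in seen:
            problems.append(
                f"rank {rank}: {ip} is already used by rank {seen[addr]}"
            )
        else:
            seen[addr] = rank
    return problems


if __name__ == "__main__":
    # Example addresses are made up for illustration.
    for problem in check_rank_ips(["10.1.212.121", "10.1.212.121"]):
        print(problem)
```

Running a check like this on the host before starting the MPI job turns one class of silent hangs into an explicit error message.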