Xilinx / ACCL

Alveo Collective Communication Library: MPI-like communication operations for Xilinx Alveo accelerators
https://accl.readthedocs.io/
Apache License 2.0

A soft reset routine for the CCLO should be introduced and applied during initialization #176

Closed. mar-ven closed this issue 8 months ago

mar-ven commented 8 months ago

Currently, the ACCL initialization routine on the host checks that CCLO_ADDR::CFGRDY_OFFSET equals 0 before performing any other step, and throws an error otherwise. If the software crashes after the initialization stage but before an operation (e.g., a copy) executes, the value at CCLO_ADDR::CFGRDY_OFFSET remains nonzero, and every subsequent attempt to re-initialize the CCLO fails. A soft reset routine should therefore be introduced and invoked during initialization to set the value back to 0, so that the remaining configuration steps can proceed without throwing an exception.
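
To make the failure mode and the proposed fix concrete, here is a minimal sketch against a mock register file. It is not the ACCL driver code: `MockCCLO`, `initialize_strict`, `soft_reset`, and `initialize_with_soft_reset` are hypothetical names, the numeric value given to `CCLO_ADDR::CFGRDY_OFFSET` is a placeholder, and the sketch assumes that a completed initialization leaves the flag at 1.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>

namespace CCLO_ADDR {
// Placeholder value (assumption): the real offset is defined in the ACCL sources.
constexpr std::uint32_t CFGRDY_OFFSET = 0x1FF4;
}

// Hypothetical stand-in for the device register interface.
// Registers that were never written read back as 0.
class MockCCLO {
public:
    std::uint32_t read(std::uint32_t offset) const {
        auto it = regs_.find(offset);
        return it == regs_.end() ? 0u : it->second;
    }
    void write(std::uint32_t offset, std::uint32_t value) { regs_[offset] = value; }

private:
    std::map<std::uint32_t, std::uint32_t> regs_;
};

// Behaviour described in the issue: refuse to proceed if the flag is nonzero.
void initialize_strict(MockCCLO &dev) {
    if (dev.read(CCLO_ADDR::CFGRDY_OFFSET) != 0) {
        throw std::runtime_error("CCLO configuration flag is not 0");
    }
    // ... other configuration steps would go here ...
    // Assumption: initialization marks the configuration as ready with 1.
    dev.write(CCLO_ADDR::CFGRDY_OFFSET, 1);
}

// Proposed soft reset: put the flag back to 0 so configuration can restart.
// Assumption: a direct register write is enough in this mock.
void soft_reset(MockCCLO &dev) {
    dev.write(CCLO_ADDR::CFGRDY_OFFSET, 0);
}

// Initialization that tolerates a stale flag left behind by a crashed run.
void initialize_with_soft_reset(MockCCLO &dev) {
    if (dev.read(CCLO_ADDR::CFGRDY_OFFSET) != 0) {
        soft_reset(dev);
    }
    initialize_strict(dev);
}
```

Calling `initialize_strict` twice on the same `MockCCLO` throws on the second call, which mirrors the re-initialization failure after a crash. `initialize_with_soft_reset` succeeds in the same situation. On real hardware the reset may need to be requested from the CCLO itself rather than done with a plain register write.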