Xilinx / ACCL

Alveo Collective Communication Library: MPI-like communication operations for Xilinx Alveo accelerators
https://accl.readthedocs.io/
Apache License 2.0

Fix ZMQ subscription bug for ranks > 9 #174

Closed Mellich closed 10 months ago

Mellich commented 10 months ago

The ZMQ subscription for the emulator/simulator does not work correctly for more than 9 ranks, which causes the issue described in #172. ZMQ seems to compare the subscription string with the first characters of a message, so a subscription for rank 1 also matches messages for ranks 10, 11, ... I am not sure whether this is a zmqpp bug or installation-specific, since I had not noticed this issue before. This PR fixes the issue by zero-padding the destination rank in both the ZMQ messages and the subscription string.
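To illustrate the prefix matching described above, here is a minimal sketch in plain Python that models the filter without using ZMQ itself. The helper names `zmq_prefix_match` and `rank_topic` and the padding width of 4 are assumptions made for this example; they are not taken from the ACCL code or from this PR.

```python
def zmq_prefix_match(subscription: bytes, message: bytes) -> bool:
    """Model of a SUB-socket filter: a message passes if it starts with the subscription."""
    return message.startswith(subscription)


def rank_topic(rank: int, width: int = 0) -> bytes:
    """Hypothetical topic encoding: the destination rank as decimal text, zero-padded to `width`."""
    return str(rank).zfill(width).encode()


RANKS = range(12)
PAD_WIDTH = 4  # assumed width for illustration only

# Rank 1 subscribes with an unpadded topic: ranks 10 and 11 also match.
unpadded_hits = [r for r in RANKS if zmq_prefix_match(rank_topic(1), rank_topic(r))]

# Rank 1 subscribes with a fixed-width topic: only rank 1 matches.
padded_hits = [
    r for r in RANKS
    if zmq_prefix_match(rank_topic(1, PAD_WIDTH), rank_topic(r, PAD_WIDTH))
]

print(unpadded_hits)  # [1, 10, 11]
print(padded_hits)    # [1]
```

With a fixed-width rank field, no rank's topic can be a proper prefix of another's, which is why padding both the message and the subscription string removes the false matches.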

quetric commented 10 months ago

Thanks @Mellich for this fix. I believe this is our bug rather than a zmqpp issue, as explained in this answer: https://stackoverflow.com/questions/58396040/zeromq-cppzmq-subscriber-with-filters-which-start-with-the-same-string