Xilinx / ACCL

Alveo Collective Communication Library: MPI-like communication operations for Xilinx Alveo accelerators
https://accl.readthedocs.io/
Apache License 2.0

99 stream put uses inconsistent stream #111

Closed: quetric closed this 2 years ago

Mellich commented 2 years ago

Every test seems to pass, and it is good to have a zero-based stream ID range for the API now. But couldn't this lead to problems when the stream IDs are used in user kernels?

For example, in this test I changed the send/recv into a stream_put. The received stream ID is now different from the one passed to stream_put, so the test hangs.
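To illustrate the concern, here is a toy model in Python. It is not ACCL code: the offset value and all function names are hypothetical, chosen only to show how a zero-based API ID that gets shifted internally can fail to match what a user kernel waits for.

```python
# Toy model of the stream ID mismatch. NOT ACCL code.
# RESERVED_STREAMS is a hypothetical offset, not a value taken from the library.
RESERVED_STREAMS = 9


def api_to_wire(stream_id, offset=RESERVED_STREAMS):
    """Map a zero-based API stream ID to the ID carried with the data."""
    if stream_id < 0:
        raise ValueError("stream ID must be non-negative")
    return stream_id + offset


def wire_to_api(wire_id, offset=RESERVED_STREAMS):
    """Inverse mapping, applied before the ID reaches the user kernel."""
    return wire_id - offset


def kernel_accepts(expected_api_id, wire_id, translate):
    """Return True if a kernel waiting for expected_api_id accepts the data.

    With translate=False the kernel sees the raw shifted ID and never
    matches, which corresponds to the hang described above.
    """
    seen = wire_to_api(wire_id) if translate else wire_id
    return seen == expected_api_id


if __name__ == "__main__":
    wire_id = api_to_wire(0)
    print("without translation:", kernel_accepts(0, wire_id, translate=False))
    print("with translation:   ", kernel_accepts(0, wire_id, translate=True))
```

In this sketch the kernel only matches when the ID is translated back on the receive side. Whether the shift should be undone in the library or in the kernel is a design choice the sketch does not settle.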

Mellich commented 2 years ago

Works now!

Maybe we could also add the loopback test change to this PR, so that we have a test that proves the stream IDs work as expected? That would be commit 6e9ae6df5bd009c5bcaa282213bee9e325167e87.

Mellich commented 2 years ago

I rebased #94 on this PR and tried to run a stream_put where the source and destination rank are the same. The stream ID still does not match in this case. See test_loopback_local_res in the HLS tests, commit b5ee6615b909e020e3aeba02b697e2f0936d8220.