Closed: zhenhaohe closed this issue 1 year ago
@zhenhaohe how many ranks were used in this benchmark?
This benchmark was run on 8 ranks.
The latency charts in the paper are mostly on 4 ranks. Could you please post 4-rank charts as well?
Closing: this was found to be caused by an incorrect setting of the count parameter in the ACCL calls.
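For readers hitting a similar gap, here is a minimal sketch of how a mis-set count can skew a collective benchmark. The thread does not say which value was wrong, so this is only an illustration under an assumption: MPI-style semantics, where the count passed to allgather, gather and scatter is the number of elements per rank, not the total. Whether ACCL uses exactly this convention should be checked against its documentation. The function name and the constants are hypothetical.

```python
# Hypothetical illustration, not taken from the ACCL benchmark code.
# Assumes MPI-style semantics: `count` is the number of elements
# contributed by (or delivered to) EACH rank.

def allgather_payload_bytes(count_per_rank: int, elem_size: int, ranks: int) -> int:
    """Total payload assembled by an allgather with the given per-rank count."""
    return count_per_rank * elem_size * ranks

RANKS = 8                       # rank count reported in this thread
ELEM_SIZE = 4                   # assumed element size in bytes (e.g. 32-bit values)
intended_total_elems = 1 << 20  # total message size the benchmark means to measure

# Intended: split the total across ranks before passing it as the count.
correct = allgather_payload_bytes(intended_total_elems // RANKS, ELEM_SIZE, RANKS)

# Mistake: pass the total as the per-rank count.
mistaken = allgather_payload_bytes(intended_total_elems, ELEM_SIZE, RANKS)

ratio = mistaken // correct
print(f"correct: {correct} B, mistaken: {mistaken} B, ratio: {ratio}x")
```

Under these assumptions the mistaken call moves as many times more data as there are ranks, so comparing its latency against a baseline that used the intended size would make the collective look slower at large message sizes.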
The performance of allgather, gather and scatter is worse than software OpenMPI at large message sizes, which is not what the paper shows. See the attached performance plot below.