rkowalewski opened 6 years ago
An interesting use case for `dart_Ialltoallv` is `dash::sort`, where every unit has to send its local data to the corresponding target units. This exchange can be overlapped with local copy operations.
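Before the exchange in `dash::sort`, each unit must work out which slice of its locally sorted data goes to which target unit. Here is a minimal sketch in C. It assumes the splitters have already been chosen and the local data is sorted. The helper `sort_send_layout` is hypothetical and not part of DART or DASH. It fills in the `sendcounts`/`sdispls` arrays that an `MPI_Alltoallv`-style call expects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a DART/DASH function): for a locally sorted
 * array `data` of length n and nprocs - 1 ascending splitters, compute how
 * many elements go to each target unit (sendcounts) and where each block
 * starts in `data` (sdispls). Because the data is sorted, every block is
 * contiguous and sdispls is the exclusive prefix sum of sendcounts. */
void sort_send_layout(const int *data, size_t n,
                      const int *splitters, int nprocs,
                      int *sendcounts, int *sdispls)
{
    assert(nprocs >= 1);
    size_t pos = 0;
    for (int target = 0; target < nprocs; ++target) {
        size_t begin = pos;
        if (target == nprocs - 1) {
            pos = n; /* the last unit takes all remaining elements */
        } else {
            /* elements <= splitters[target] belong to this target */
            while (pos < n && data[pos] <= splitters[target])
                ++pos;
        }
        sendcounts[target] = (int)(pos - begin);
        sdispls[target] = (int)begin;
    }
}
```

The receiving side's `recvcounts` are commonly obtained first with a blocking `MPI_Alltoall` of the `sendcounts` (one `int` per pair of units). The data exchange itself can then be started with `MPI_Ialltoallv` and overlapped with local work such as copying the block that stays on the same unit.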
I think a student from TUM once wrote a `dart_alltoall` implementation. I will look into that.
DART should provide both blocking and nonblocking all-to-all operations, analogous to:
- blocking: `MPI_Alltoall` and `MPI_Alltoallv`
- nonblocking: `MPI_Ialltoall` and `MPI_Ialltoallv`
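The data movement of the vector variants can be modeled without MPI. Below is a hypothetical single-process simulation of `MPI_Alltoallv` semantics. The function `simulate_alltoallv` is an illustration only and is part of neither DART nor MPI. `MPI_Ialltoallv` moves the same data but returns an `MPI_Request` immediately, so local work can run before the matching `MPI_Wait`.

```c
#include <assert.h>
#include <string.h>

/* Illustrative model of MPI_Alltoallv for nprocs ranks in one process.
 * Count/displacement tables are flattened: entry [i * nprocs + j]
 * describes rank i's view of rank j.
 *   sendcounts[i*nprocs+j], sdispls[i*nprocs+j]: block rank i sends to j
 *   rdispls[j*nprocs+i]: offset where rank j stores the block from i
 * In MPI, recvcounts on rank j for source i must match sendcounts on
 * rank i for target j, so recvcounts are implied and omitted here. */
void simulate_alltoallv(int nprocs,
                        int *const sendbuf[], const int *sendcounts,
                        const int *sdispls,
                        int *const recvbuf[], const int *rdispls)
{
    assert(nprocs > 0);
    for (int i = 0; i < nprocs; ++i) {      /* sending rank */
        for (int j = 0; j < nprocs; ++j) {  /* receiving rank */
            int count = sendcounts[i * nprocs + j];
            if (count == 0)
                continue; /* nothing to transfer for this pair */
            memcpy(recvbuf[j] + rdispls[j * nprocs + i],
                   sendbuf[i] + sdispls[i * nprocs + j],
                   (size_t)count * sizeof(int));
        }
    }
}
```

For example, if rank 0 holds `{1, 2, 3}` and sends `{1}` to itself and `{2, 3}` to rank 1, and rank 1 holds `{4, 5}` and sends both elements to rank 0, then rank 0 ends up with `{1, 4, 5}` and rank 1 with `{2, 3}`.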
Currently, the only way to transpose an (N-)Array is with one-sided operations. This does not scale to large numbers of processes. At that scale, collective algorithms whose cost grows logarithmically with the number of processes are essential, whereas the "naive" one-sided transpose grows linearly.
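A transpose of a block-distributed matrix is exactly an all-to-all exchange, which is why a collective fits here. The following single-process sketch illustrates this without MPI. It assumes an `n x n` row-major matrix is row-block distributed over `p` units, with `p` dividing `n`. The function `blocked_transpose` is hypothetical and is not the DASH implementation. Each unit sends one `b x b` block to every other unit, and the receiver stores it transposed.

```c
#include <assert.h>

/* Illustrative model (no MPI): unit u owns global rows [u*b, (u+1)*b)
 * of an n x n matrix A, with b = n / p, stored row-major in local[u]
 * as a b x n array. After the call, out[u] holds rows [u*b, (u+1)*b)
 * of the transpose A^T in the same layout. The (src, dst) loop pair is
 * the all-to-all pattern: src sends its block of columns
 * [dst*b, (dst+1)*b) to dst. */
void blocked_transpose(int n, int p, int *const local[], int *const out[])
{
    assert(p > 0 && n % p == 0);
    int b = n / p;
    for (int src = 0; src < p; ++src)          /* sending unit */
        for (int dst = 0; dst < p; ++dst)      /* receiving unit */
            for (int r = 0; r < b; ++r)        /* local row on src */
                for (int c = 0; c < b; ++c) {  /* column within block */
                    /* A[src*b + r][dst*b + c] becomes
                     * A^T[dst*b + c][src*b + r]: local row c,
                     * global column src*b + r on unit dst */
                    out[dst][c * n + src * b + r] =
                        local[src][r * n + dst * b + c];
                }
}
```

Because the exchange pattern is a plain all-to-all with equal block sizes, a collective such as `MPI_Alltoall` is free to use algorithms like Bruck's, which needs ⌈log2 p⌉ communication rounds instead of the p - 1 pairwise steps of a naive exchange.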