MarcelKoch opened 8 months ago
Looks good to me, but there is an issue with OpenMPI that needs to be resolved. It seems that OpenMPI versions < 5.0.x still have this bug.
I'm not sure what you mean here. I already have a workaround that uses non-null pointers, which should fix the issue.
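For context, a minimal sketch of what such a non-null-pointer workaround could look like (the helper is illustrative, not the PR's actual code; it assumes the bug is triggered by null buffer pointers even when the corresponding counts are zero):

```cpp
template <typename T>
T* non_null_ptr(T* ptr)
{
    // Illustrative helper (not from the PR): substitute a valid dummy address
    // for nullptr. With a count of zero, MPI never dereferences the buffer,
    // but affected OpenMPI versions still mishandle null pointers.
    static T dummy{};
    return ptr ? ptr : &dummy;
}

// Usage sketch:
// MPI_Ineighbor_alltoallv(non_null_ptr(send_buf), send_counts, send_offsets,
//                         type, non_null_ptr(recv_buf), recv_counts,
//                         recv_offsets, type, comm, &req);
```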
The null test is still failing here. I thought that was because of the null arguments?
Thanks for bringing that up. I guess I haven't really looked at the tests in quite a while. Just a remark on the OpenMPI issue you linked: that one concerns the persistent all-to-all-v call, which I'm not using (yet), so it shouldn't be relevant to the test failures.
So right now, I'm leaning towards just deactivating the neighborhood communicator if the OpenMPI version isn't sufficient. The CollectiveCommunicator interface is meant to provide different implementations of the all-to-all call, so using a dense all-to-all inside it feels odd. I think I would rather have a derived class that only does the dense all-to-all and use that instead, as sketched below.
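A rough sketch of that suggestion, assuming a simplified shape of the interface (the method signature and member layout are guesses, not the PR's actual API):

```cpp
#include <mpi.h>
#include <vector>

// Assumed shape of the interface described in the PR; the exact
// signature is a guess.
class collective_communicator {
public:
    virtual ~collective_communicator() = default;
    // Non-blocking all-to-all; the per-rank sizes are fixed at construction.
    virtual MPI_Request i_all_to_all_v(const void* send_buf, MPI_Datatype type,
                                       void* recv_buf) const = 0;
};

// Derived variant that always communicates densely with every rank.
class dense_communicator : public collective_communicator {
public:
    MPI_Request i_all_to_all_v(const void* send_buf, MPI_Datatype type,
                               void* recv_buf) const override
    {
        MPI_Request req;
        MPI_Ialltoallv(send_buf, send_counts_.data(), send_offsets_.data(),
                       type, recv_buf, recv_counts_.data(),
                       recv_offsets_.data(), type, comm_, &req);
        return req;
    }

private:
    MPI_Comm comm_{MPI_COMM_NULL};
    // Fixed communication pattern, set up when the communicator is created.
    std::vector<int> send_counts_, send_offsets_, recv_counts_, recv_offsets_;
};
```

This keeps the neighborhood implementation free of fallback logic; the caller just picks the dense variant where needed.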
Yes, but the fix is missing in older versions of 4.0.x as well. I am not entirely sure which exact version of OpenMPI our container uses, but it is 4.0.x, and 4.0.7 is still missing the fix. I think the fix was added only in 4.1.x.
But in general, I agree. I think we can just disable the neighborhood communicator for older versions of OpenMPI.
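A minimal sketch of such a version gate, using the version macros that OpenMPI's mpi.h provides (the `GKO_HAVE_NEIGHBORHOOD_COMM` name is made up for this sketch):

```cpp
#include <mpi.h>

// OpenMPI's mpi.h defines OPEN_MPI and the OMPI_*_VERSION macros.
#if defined(OPEN_MPI) && OPEN_MPI && \
    (OMPI_MAJOR_VERSION < 4 ||       \
     (OMPI_MAJOR_VERSION == 4 && OMPI_MINOR_VERSION < 1))
// Affected versions: fall back to the dense all-to-all communicator.
#define GKO_HAVE_NEIGHBORHOOD_COMM 0
#else
#define GKO_HAVE_NEIGHBORHOOD_COMM 1
#endif
```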
@upsj I think the moved-from object should be using MPI_COMM_NULL as its communicator. Then trying to launch a communication on it will fail quite badly.
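A short sketch of that moved-from behavior (the class name is illustrative; only the move semantics are shown):

```cpp
#include <mpi.h>
#include <utility>

class neighborhood_communicator {
public:
    neighborhood_communicator(neighborhood_communicator&& other) noexcept
        // Leave the moved-from object holding MPI_COMM_NULL, so that any
        // later communication on it fails immediately and visibly.
        : comm_(std::exchange(other.comm_, MPI_COMM_NULL))
    {}

private:
    MPI_Comm comm_{MPI_COMM_NULL};
};
```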
This PR adds a communicator that only handles neighborhood all-to-all communication. It implements the new interface `collective_communicator`, which provides different implementations for a selected set of collective MPI routines. Currently, this only includes the non-blocking all-to-all.

The communication uses a fixed pattern, i.e. the send/recv sizes are fixed when the neighborhood communicator is constructed. I would have liked to decouple that, but this would require some knowledge of how the sizes are stored at the interface level. If someone has an idea for that, please let me know.
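To illustrate the fixed-pattern idea, a hedged sketch of how such a communicator could be set up with standard MPI calls (this is not the PR's actual code; names are illustrative):

```cpp
#include <mpi.h>
#include <vector>

// Create a communicator that carries the fixed communication graph.
MPI_Comm create_neighborhood_comm(MPI_Comm base,
                                  const std::vector<int>& sources,
                                  const std::vector<int>& destinations)
{
    MPI_Comm neighbor_comm;
    MPI_Dist_graph_create_adjacent(
        base, static_cast<int>(sources.size()), sources.data(),
        MPI_UNWEIGHTED, static_cast<int>(destinations.size()),
        destinations.data(), MPI_UNWEIGHTED, MPI_INFO_NULL,
        /* reorder = */ 0, &neighbor_comm);
    return neighbor_comm;
}

// With the pattern fixed at construction, each call only supplies buffers:
// MPI_Ineighbor_alltoallv(send_buf, send_counts.data(), send_offsets.data(),
//                         type, recv_buf, recv_counts.data(),
//                         recv_offsets.data(), type, neighbor_comm, &req);
```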
This is the first part of splitting up #1546.
The neighborhood all-to-all has a bug in OpenMPI < v4.1.0, which makes it necessary to disable the neighborhood communicator in this case. As a replacement, there is also a dense all-to-all communicator.
Todo:
PR Stack: