edwardsmith999 closed this issue 8 years ago
I don't get this. The purpose of split is to create a new communicator that contains only the CFD or MD processes, depending on the realm from which it is called. If you are connecting Python (2 calls to split) with OpenFOAM (1 call to split), the effect is that on the python side you use the communicator created at the python level, while in OpenFOAM you use the one returned by cpl. There will be one communicator created but not used (the one made by cpl on the python side), but I cannot see how this could hang the run. You can split your MPI_COMM_WORLD as many times as you want and only use some of the new comms created. Regarding that thread, I actually use that approach in the wrapper, but the problem is how to create a Communicator python object from a communicator ID returned by the library! As far as I know there is no way to do that in mpi4py.
MPI_COMM_SPLIT is blocking and is collective over all of MPI_COMM_WORLD (say 4 processes). The first call in the cpl library is matched in both python (2 procs) and OpenFOAM (2 procs). The next call is made on the 2 python procs only, but it will block until all 4 processes (OpenFOAM and python) have called it.
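A minimal sketch of the mismatch, assuming 4 ranks where ranks 0-1 stand in for the python side and ranks 2-3 for OpenFOAM (this deliberately hangs if run, e.g. with `mpiexec -n 4`):

```python
from mpi4py import MPI

world = MPI.COMM_WORLD
rank = world.Get_rank()
realm = 0 if rank < 2 else 1   # 0 = "python/MD" ranks, 1 = "OpenFOAM/CFD" ranks

# First split: called by all 4 ranks (the one inside the cpl library) -- completes.
realm_comm = world.Split(realm, rank)

# Second split: only the "python" ranks reach this line. Split is collective over
# MPI_COMM_WORLD, so these 2 ranks block forever waiting for ranks 2-3 to call it.
if realm == 0:
    extra_comm = world.Split(realm, rank)
```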
There are three solutions I can see:
1) Call another matching split in OpenFOAM.
2) Find a way to set the handle of an mpi4py comm when you instantiate it. You're right that the link doesn't solve this. There is an MPI.Comm() constructor which creates a comm but doesn't take arguments, and a range of ways to set attributes and data of comm types, none of which seem to let you specify the associated handle.
3) Use MPI_Comm_spawn to create the two runs as appropriate and avoid using split altogether.
I do understand now. There is an extra way of managing this that is maybe cleaner. As it is necessary to pass a reference to a communicator handle (int or MPI_Comm type) to md_init and cfd_init, it can be used as a flag for whether or not to split MPI_COMM_WORLD inside the library. So if we want the library to split it, we assign MPI_COMM_NULL to the passed handle; if it is anything else, it means we have already split it before calling the *_init functions (in the python wrapper). It would actually add just one conditional statement to the library code. It is a bit ugly, since one half of the processes block inside the library and the others outside. But hey, it is just a workaround for the python wrapper; it doesn't even need to be documented beyond saying "you must initialise the handle you pass to the _init function to MPI_COMM_NULL". What do you think?
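A rough sketch of that conditional, written in Python purely for illustration (the actual check would sit in the library's init routine; `cpl_init` and the argument names here are hypothetical):

```python
from mpi4py import MPI

def cpl_init(calling_realm, returned_comm):
    """Hypothetical init: split inside only if the caller passed MPI_COMM_NULL."""
    if returned_comm == MPI.COMM_NULL:
        # Caller (e.g. the C++/Fortran codes) has not split yet: do it in the library.
        returned_comm = MPI.COMM_WORLD.Split(calling_realm,
                                             MPI.COMM_WORLD.Get_rank())
    # else: the python wrapper already split MPI_COMM_WORLD and passes the result in.
    return returned_comm
```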
On the other hand, the Spawn solution would maybe be the way to go in the long term. I guess a top-level python script spawning processes is what you have in mind. But then the creation of the overlap region with processes of both realms, and also the creation of the intercommunicator, would need a rework. Am I right?
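For reference, a minimal mpi4py sketch of what such a top-level launcher could look like (script names and process counts are made up); each Spawn returns a parent-child intercommunicator, which is exactly why the current overlap-region and intercommunicator setup would need reworking:

```python
from mpi4py import MPI

# Hypothetical top-level launcher: spawn the two realms as separate child jobs.
cfd_inter = MPI.COMM_SELF.Spawn('python', args=['cfd_run.py'], maxprocs=2)
md_inter = MPI.COMM_SELF.Spawn('python', args=['md_run.py'], maxprocs=2)
# cfd_inter / md_inter are intercommunicators between this launcher and each child job;
# inside the children, MPI.Comm.Get_parent() returns the matching intercommunicator.
```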
I like the idea, although we need to change COMM from a return-only variable. This has the unexpected consequence that if the user passes an unspecified variable, it could be used as a comm or cause an error. I think the best solution would be to call some form of test to see if the comm actually exists already, for example something like MPI_Comm_test_inter.
Okay, thanks to a reply on the mpi4py forum, there is a way to return the handle of a communicator and build an mpi4py object from it. The key trick is to create a dummy `newcomm = MPI.Intracomm()`, get the pointer to the comm address with `newcomm_ptr = MPI._addressof(newcomm)`, get the current value with `comm_val = c_int.from_address(newcomm_ptr)` and override this value with the comm returned from cpl-library: `comm_val.value = returned_comm_handle.value`.
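Put together, and with the caveat that this relies on mpi4py internals (`MPI._addressof`) and assumes an MPI implementation where MPI_Comm is an integer handle (e.g. MPICH-based; under Open MPI the handle is a pointer, so `c_void_p` would be needed instead of `c_int`), the trick looks roughly like this:

```python
from ctypes import c_int
from mpi4py import MPI

# Stand-in for the handle cpl-library would return via ctypes; here we just grab
# MPI_COMM_WORLD's own handle so the snippet is self-contained.
returned_comm_handle = c_int.from_address(MPI._addressof(MPI.COMM_WORLD))

newcomm = MPI.Intracomm()                      # dummy (null) communicator object
newcomm_ptr = MPI._addressof(newcomm)          # address of its internal MPI_Comm handle
comm_val = c_int.from_address(newcomm_ptr)     # view that handle as a C int
comm_val.value = returned_comm_handle.value    # overwrite with the handle from the library
print(newcomm.Get_size())                      # newcomm now wraps the library's comm
```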
The extra split called in the cpl.py python bindings, in the wrapper for create_comm, `MPI.COMM_WORLD.Split(calling_realm, MPI.COMM_WORLD.Get_rank())`, causes the code to freeze because MPI_COMM_WORLD is shared by both coupled codes and there is no matching call. This will only work if the other coupled code (C++/Fortran) calls split twice!
Can't we return the split comm to python using the solution under: https://groups.google.com/forum/#!topic/mpi4py/jPqNrr_8UWY