lcebaman opened this issue 4 years ago
That message suggests MPI cannot find a region in the target window with offset==0 and size==(extent of target_datatype) * (target_count).
Is the offset correct? Has any memory been attached by the target process? Make sure the target process calls MPI_WIN_ATTACH with some allocated buffer space.
Is the size correct? How big is the piece of memory attached to the window by the target process? Print out the size used in the MPI_WIN_ATTACH call and check it against the size needed (use MPI_Type_get_extent or similar to find the extent of target_datatype and multiply by target_count; see the sketch after these checks).
If the target process attaches memory of a suitable size, then we need a deeper investigation!
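For reference, a minimal sketch of that size check (the helper name and variable names are illustrative, not from the actual code):

```c
#include <mpi.h>

/* Illustrative helper: how many bytes will the MPI_Get touch at the target?
 * Compare this against the size the target passed to MPI_Win_attach. */
static MPI_Aint bytes_needed_at_target(int target_count, MPI_Datatype target_datatype)
{
    MPI_Aint lb, extent;
    MPI_Type_get_extent(target_datatype, &lb, &extent);
    return extent * (MPI_Aint)target_count;
}
```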
It turns out that `handle->win->w_osc_module->osc_get(buf1, trygetargs.origin_count, trygetargs.origin_datatype, trygetargs.target, 0, trygetargs.target_count, trygetargs.target_datatype, handle->win)` passes a hard-coded 0 as the target displacement. That is wrong: a real displacement is required.
Ah, we're using a dynamic window, right? That means the "base" address is MPI_BOTTOM (typically address 0, in kernel space!) and all "displacements" are actually memory addresses - that is, displacements relative to MPI_BOTTOM.
The internal computation will be something like:
address_of_region = base_of_window + disp_given_by_user
For a non-dynamic window the "base" for the window is the memory address of the first allocated byte in the window, so "disp" is really a displacement within the window, i.e. disp=0 means "the first byte in the window".
For a dynamic window the "base" for the window is always 0, so a "disp" of 0 is always well outside the window region.
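To make that concrete, here is a sketch of addressing on a dynamic window, written against the public MPI API with illustrative names (not taken from the actual code):

```c
#include <mpi.h>

/* Target side: attach a buffer and look up its absolute address. */
void target_side(MPI_Win win, void *tmpsbuf, MPI_Aint nbytes, MPI_Aint *target_addr)
{
    MPI_Win_attach(win, tmpsbuf, nbytes);
    MPI_Get_address(tmpsbuf, target_addr);
    /* ... communicate *target_addr to the origin (e.g. as MPI_AINT) ... */
}

/* Origin side: on a dynamic window the "displacement" is that absolute
 * address, not an offset from the start of an allocation. */
void origin_side(MPI_Win win, void *buf1, int n, MPI_Datatype dtype,
                 int target_rank, MPI_Aint target_addr)
{
    MPI_Get(buf1, n, dtype, target_rank, target_addr /* not 0 */, n, dtype, win);
}
```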
--
Fix: you will need to know the remote address for the "base" of the window, i.e. the memory address of the chunk of memory that the remote process gave to MPI_WIN_ATTACH.
You could do an MPI_ALLGATHER to get all of the addresses of all chunks from all processes, but that is massively overkill. You only need the addresses from the processes that this process will MPI_GET data from. For our tree algorithm that means each process only needs at most 3 remote addresses (from the 2 children and the 1 parent).
Obtaining those base addresses should be done as part of the initialisation of the persistent collective operation. I would do it via point-to-point, I think. Use nonblocking point-to-point so the sends/recvs can be added to the schedule. Add those steps to the schedule before the restart point in the schedule, so they only get done once when the operation initialises, not every time the operation is (re)started.
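A sketch of that address exchange, using the public MPI API rather than the internal calls in pnbc_osc.c (the function name, neighbours array, and blocking wait are placeholders; in the real code these sends/recvs would become schedule steps placed before the restart point):

```c
#include <mpi.h>

/* Illustrative only: exchange attached-buffer addresses with the tree
 * neighbours (at most 3: parent plus up to 2 children), not with everyone. */
void exchange_neighbour_bases(MPI_Comm comm, void *tmpsbuf,
                              const int *neighbours, int nnbr,
                              MPI_Aint *remote_base /* nnbr entries */)
{
    MPI_Aint my_base;
    MPI_Get_address(tmpsbuf, &my_base);

    MPI_Request reqs[6];   /* 2 requests per neighbour, nnbr <= 3 */
    int nreq = 0;
    for (int i = 0; i < nnbr; i++) {
        MPI_Irecv(&remote_base[i], 1, MPI_AINT, neighbours[i], 0, comm, &reqs[nreq++]);
        MPI_Isend(&my_base,        1, MPI_AINT, neighbours[i], 0, comm, &reqs[nreq++]);
    }
    /* A blocking wait keeps the sketch short; in the persistent collective
     * these would be nonblocking steps in the schedule instead. */
    MPI_Waitall(nreq, reqs, MPI_STATUSES_IGNORE);
    /* remote_base[i] is now the value to use as the target displacement when
     * doing an MPI_Get from neighbours[i] on the dynamic window. */
}
```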
Currently, this is what we do:
```c
/* Create dynamic window */
res = ompi_win_create_dynamic(&info->super, comm, &win);
/* ... */

/* Attach window to tmp sendbuf */
res = win->w_osc_module->osc_win_attach(win, tmpsbuf, count * size);
/* ... */

/* Get the absolute address of the attached buffer */
MPI_Get_address(tmpsbuf, &disp);
/* ... */

/* Create an array of displacements, one entry per rank */
disp_a = (MPI_Aint *) malloc(ompi_comm_size(comm) * sizeof(MPI_Aint));
/* ... */

/* Gather ALL displacements -> pt2pt call */
res = comm->c_coll->coll_allgather(&disp, 1, MPI_AINT, disp_a, 1, MPI_AINT, comm,
                                   comm->c_coll->coll_allgather_module);
```
Which I think is right? But disp_a is never used later on!
The allgather is overkill (you don't need all those values, and the memory requirement for storing them scales linearly with the number of processes, rather than being fixed by the number of direct neighbours in the protocol pattern, i.e. 3 in our case). Also, this looks like a blocking allgather, which would ideally (eventually) be replaced with something nonblocking.
However, yes - this disp_a will contain the values you need to fix the "region not found" error.
I agree, we do not need all of the disp values, but which ones do we need?
The ones for the parent process and the two children processes. Deal with that later :)
First fix the "region not found" error by using disp_a[trygetargs.target_rank] (hopefully trygetargs.target_rank is already set correctly somewhere early on, e.g. during initialisation of the operation?).
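Concretely, a sketch of the corrected call from earlier in the thread, assuming disp_a is an array of MPI_Aint indexed by target rank (the quoted call names the rank trygetargs.target, the comment above calls it trygetargs.target_rank; they are assumed to be the same value):

```c
/* Use the address the target attached, instead of a hard-coded 0 displacement. */
res = handle->win->w_osc_module->osc_get(buf1,
          trygetargs.origin_count, trygetargs.origin_datatype,
          trygetargs.target,
          disp_a[trygetargs.target],        /* was: 0 */
          trygetargs.target_count, trygetargs.target_datatype,
          handle->win);
```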
Background information
When calling MPI_Get (in pnbc_osc.c), the reported error is that MPI cannot find a region (osc_rdma_dynamic.c).
Details of the problem
Error calling MPI_Get inside the Allreduce algorithm.