The Teams specification introduces logical grouping for PEs. The current draft uses a 'WORLD' constant to define the "all PEs" scope. This is an implicit definition, and it is useful as a way to refer to "all".
However, embedding this scope as a constant could be problematic for future OpenSHMEM capabilities, e.g., dynamics, fault tolerance, etc. For example, in MPI this implicit definition of "WORLD" raises challenges for resilience, and in general the latest MPI Sessions work is an attempt to move away from implicit scoping of processes.
It seems this could be avoided by adding another API call that lets the user create the "WORLD" group upon initialization. For example, something like the following (which is very much inspired by the MPI Sessions API):
shmem_team_t world = shmem_team_group_query("shm://WORLD");
shmem_team_t self = shmem_team_group_query("shm://SELF");
ORIGINAL Example from pg. 52-53 of draft main_spec-5.pdf
1: /* ORIGINAL Example from pg. 52-53 of draft 'main_spec-5.pdf' */
2: shmem_init();
3: int npes = shmem_n_pes();
4: ...
5: shmem_team_split_strided(SHMEM_TEAM_WORLD,
6: 0, 2, npes/2, &conf, cmask, &team2);
7: shmem_team_sync(SHMEM_TEAM_WORLD);
8: shmem_team_split_strided(SHMEM_TEAM_WORLD,
9: 0, 3, npes/3, &conf, cmask, &team3);
MODIFIED Example with new Line 4 and changes at Lines 6-9
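A possible reconstruction of the modified listing is sketched below, going only by the description "new Line 4 and changes at Lines 6-9": the proposed shmem_team_group_query() call is added, and SHMEM_TEAM_WORLD is replaced by the returned handle. The return type of shmem_team_group_query(), its failure value, and all of the stub definitions at the top are assumptions made so the sketch compiles without an OpenSHMEM library; they are not part of any implementation.

```c
#include <string.h>

/* ---- Stand-in stubs (assumptions, NOT the OpenSHMEM library) ---- */
typedef int shmem_team_t;                        /* placeholder handle type */
typedef struct { int num_contexts; } shmem_team_config_t;
#define SHMEM_TEAM_INVALID (-1)

static void shmem_init(void) { }
static int  shmem_n_pes(void) { return 12; }     /* pretend the job has 12 PEs */

/* Proposed call from the text; return type and failure value are guesses. */
static shmem_team_t shmem_team_group_query(const char *uri)
{
    if (strcmp(uri, "shm://WORLD") == 0) return 0;
    if (strcmp(uri, "shm://SELF") == 0)  return 1;
    return SHMEM_TEAM_INVALID;
}

/* Stub: hands out a fresh handle and ignores the stride arguments.
 * It does not model which PEs actually end up in the new team. */
static int shmem_team_split_strided(shmem_team_t parent, int start, int stride,
                                    int size, const shmem_team_config_t *config,
                                    long config_mask, shmem_team_t *new_team)
{
    static shmem_team_t next_handle = 2;
    (void)start; (void)stride; (void)size; (void)config; (void)config_mask;
    if (parent == SHMEM_TEAM_INVALID) {
        *new_team = SHMEM_TEAM_INVALID;
        return 1;
    }
    *new_team = next_handle++;
    return 0;
}

static int shmem_team_sync(shmem_team_t team)
{
    return team == SHMEM_TEAM_INVALID;           /* nonzero means failure */
}

/* ---- MODIFIED example (reconstruction) ---- */
int modified_example(shmem_team_t *team2, shmem_team_t *team3)
{
    shmem_team_config_t conf = { 0 };
    long cmask = 0;

    shmem_init();
    int npes = shmem_n_pes();

    /* new Line 4: ask for the "all PEs" team instead of using a constant */
    shmem_team_t world = shmem_team_group_query("shm://WORLD");
    if (world == SHMEM_TEAM_INVALID)
        return 1;

    /* Lines 6-9: same calls as the original, with `world` as the parent */
    if (shmem_team_split_strided(world, 0, 2, npes / 2, &conf, cmask, team2) != 0)
        return 1;
    if (shmem_team_sync(world) != 0)
        return 1;
    if (shmem_team_split_strided(world, 0, 3, npes / 3, &conf, cmask, team3) != 0)
        return 1;
    return 0;
}
```

The only change visible to the user is that the parent team is obtained by a call after shmem_init() rather than named as a compile-time constant; the split and sync calls are otherwise untouched.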
The "WORLD" set is defined by all the processes that call shmem_init(). In practice, the details of which PEs those are will likely be made available by the runtime (out-of-band information) that launches the PEs and already knows the set of processes in the parallel job. (Again, this is based on my understanding of how PMIx + MPI operates.)
For now we assume every PE is alive after it calls shmem_init(), i.e., we ignore the potential for failures (assume alive until finalize).
DISCUSSIONS:
Another suggestion was to avoid the use of 'WORLD' entirely, and use 'ROOT' for the base reference to "all" OpenSHMEM PEs. This would allow for more arbitrary stacking without the notion generally associated with 'WORLD'.
Also, the URI format ("shm://WORLD") in the above example was lifted directly from the current MPI Sessions work. The format could be changed to avoid the URI style, but the point is that the 'shm' namespace would be reserved. This might also allow for ways to expose platform/vendor-specific enhancements, e.g., topology-aware groupings. Those would be non-standard, but the form would be consistent. (Note: the discussion also included possible uses for hybrid scenarios, e.g., "mpi://WORLD" might give you the group of PEs defined as the set of PEs ordered by their MPI ranks.)
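To make the reserved-namespace idea concrete, here is a minimal sketch of how an implementation might separate the namespace from the group name and tell the reserved 'shm' prefix apart from extensions. The function classify_team_uri() and the team_uri_kind enum are hypothetical names invented for illustration; nothing like them exists in the draft, and the sketch treats the namespace as case-sensitive, which the discussion has not settled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of a team URI (illustrative names only). */
typedef enum {
    TEAM_URI_MALFORMED,   /* no "://" separator, or empty namespace/name */
    TEAM_URI_STANDARD,    /* reserved "shm" namespace */
    TEAM_URI_EXTENSION    /* anything else, e.g. "mpi" or a vendor prefix */
} team_uri_kind;

/* Split "namespace://name" and classify the namespace.
 * On success, *name points into uri at the first character after "://".
 * On TEAM_URI_MALFORMED, *name is left unchanged. */
team_uri_kind classify_team_uri(const char *uri, const char **name)
{
    const char *sep = strstr(uri, "://");

    /* Reject a missing separator, an empty namespace, or an empty name. */
    if (sep == NULL || sep == uri || sep[3] == '\0')
        return TEAM_URI_MALFORMED;

    *name = sep + 3;

    size_t ns_len = (size_t)(sep - uri);
    if (ns_len == 3 && strncmp(uri, "shm", 3) == 0)
        return TEAM_URI_STANDARD;

    return TEAM_URI_EXTENSION;
}
```

With this split, a query such as "shm://WORLD" would be handled by the standard, while "mpi://WORLD" or a topology-aware vendor name would be routed to a non-standard handler (or rejected) without changing the form of the call.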
Lastly, the definition of the "shm://world" (or "shm://root") PEs would be "all" PEs that exist after the call to shmem_init(). Two points would need further clarification, but are not specific to the Teams proposal and should not delay its discussion.
The set of PEs emerging from shmem_init() is pretty clear, but what is unclear is what PEs form the collective inside shmem_init() so the resulting "WORLD" (or "ROOT") set is clearly defined. That is a refinement for the initialization phases.
The shmem_init() semantics currently suffer from the same issue as MPI's initialization, i.e., it can only be called once. Again, that is not specific to Teams, nor is it impacted by the proposed API addition (shmem_team_group_query()).
Can these APIs be used for anything beyond querying a handle to the WORLD and SELF teams? In the absence of a "session" returned by calling init, I'm not sure the benefit vs. complexity tradeoff wins.