gmegan / specification

OpenSHMEM Application Programming Interface
http://www.openshmem.org

Topology Teams Break Internal Representation #131

Open jdinan opened 5 years ago

jdinan commented 5 years ago

Problem

The SHMEM_TEAM_SHARED team is defined to contain all of the PEs that will return a non-null pointer from shmem_ptr.

Many (most?) SHMEM implementations get the PE-ID-to-process mapping from the launcher, and many launchers allow the user to specify that mapping. Thus, the PE IDs on a node may not fit the desired linear translation function with <start, stride, size> parameters.

Implementations must therefore choose one of these options:

  1. Restrict shmem_ptr to PEs that can be captured in the SHMEM_TEAM_SHARED <start, stride, size> mapping.
  2. Support a general (nonlinear) mapping so SHMEM_TEAM_SHARED can capture all PEs accessible via shared memory. Note that SHMEM_TEAM_SHARED is valid for collectives, team constructors, and context constructors (i.e. in point-to-point operations). Thus, this team representation would need to be supported in many API routines.
  3. Abort if the shared-memory PEs don't fit a linear mapping.
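
To make the constraint concrete, here is a minimal sketch in C of the check an implementation would need before choosing among these options. The helper name pes_fit_strided is hypothetical and not part of the OpenSHMEM API; it assumes the node-local PE IDs are distinct and already sorted in ascending order.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not an OpenSHMEM routine.
 * Returns true if the ascending, distinct PE IDs in pes[0..n-1] can be
 * written as start + i * stride for i = 0 .. n-1. On success the
 * parameters are stored in *start and *stride; on failure they are
 * left untouched. */
bool pes_fit_strided(const int *pes, size_t n, int *start, int *stride)
{
    if (n == 0)
        return false;

    int s = 1;                      /* a single PE trivially fits with stride 1 */
    if (n > 1) {
        s = pes[1] - pes[0];
        if (s <= 0)
            return false;           /* input was not ascending and distinct */
        for (size_t i = 2; i < n; i++) {
            if (pes[i] - pes[i - 1] != s)
                return false;       /* gap differs: no single stride fits */
        }
    }

    *start = pes[0];
    *stride = s;
    return true;
}
```

As an illustration, a cyclic placement across two nodes that puts PEs {0, 2, 4, 6} on one node passes the check with start 0 and stride 2, while a user-specified placement that puts PEs {0, 1, 4, 5} on one node fails it.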

Proposed Change

This is a general problem for any topology-derived teams.

I suggest that we relax the specification to state that PEs in SHMEM_TEAM_SHARED must be accessible via shmem_ptr, but that it need not include all PEs that are accessible via shmem_ptr.
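
One possible way an implementation might use the relaxed wording is sketched below in C. This is an assumption for illustration, not something the proposal prescribes: the node-local PE list (ascending, distinct) is greedily split into runs that each fit <start, stride, size>, and each PE takes the run containing it as its shared team. The type strided_team_t and the function shared_subteam are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical <start, stride, size> descriptor, for illustration only. */
typedef struct {
    int start;
    int stride;
    int size;
} strided_team_t;

/* Hypothetical helper, not an OpenSHMEM routine.
 * Greedily partitions the ascending, distinct PE IDs in pes[0..n-1]
 * into arithmetic runs and stores in *out the run that contains my_pe.
 * Returns 0 on success, -1 if my_pe is not in the list.
 * Every PE on the node that runs this on the same list computes the
 * same partition, so the members of a run agree on its membership. */
int shared_subteam(const int *pes, size_t n, int my_pe, strided_team_t *out)
{
    size_t i = 0;
    while (i < n) {
        size_t j = i + 1;           /* current run is pes[i..j-1] */
        int stride = 1;             /* a one-PE run uses stride 1 */
        if (j < n) {
            stride = pes[j] - pes[i];
            j++;
            while (j < n && pes[j] - pes[j - 1] == stride)
                j++;                /* extend while the gap stays the same */
        }
        for (size_t k = i; k < j; k++) {
            if (pes[k] == my_pe) {
                out->start = pes[i];
                out->stride = stride;
                out->size = (int)(j - i);
                return 0;
            }
        }
        i = j;                      /* start the next run */
    }
    return -1;
}
```

With node-local PEs {0, 1, 4, 5}, PEs 0 and 1 would get <0, 1, 2> and PEs 4 and 5 would get <4, 1, 2>. Each resulting team contains only PEs reachable via shmem_ptr, but not all of them. The greedy split is not guaranteed to produce the largest possible runs.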

naveen-rn commented 5 years ago

My 2 cents.

The information about rank reordering (or job placement in general) is not passed to the OpenSHMEM implementation in a standard way. If I'm correct (please correct me if this understanding is wrong), even in MPI there is no standard way to pass this information. The implementations that I have used receive it through an implementation-specific environment variable. Even PMI doesn't have this standardized.

Hence, to me, this shouldn't be a reason to define SHMEM team semantics. The randomness argument between color split and team shared is also a bit out of context. To me, the randomness in color split is introduced by the specification, while the randomness in team shared in this case is introduced by the implementation. It should be perfectly valid for an OpenSHMEM implementation to never support rank placement.

Instead we should define team shared as it is and then just let the implementation handle it effectively.

jdinan commented 5 years ago

@naveen-rn PMIx is the first process manager I am aware of that provides this information through a standard interface. PMIx does assign ranks, but as you correctly noted, the middleware is free to renumber the processes. While it's great that there is an open/standard interface for this, it's new and not yet widely adopted. And, of course, most implementations already have a solution that's working for them.

I'm fine with leaving this to implementations to deal with. But I do think it's a subtle enough issue that it merits an advice-to-implementors note capturing what the authors of the specification intended and how to deal with the corner case.