Closed nspark closed 5 years ago
On yesterday's Teams WG call, we discussed adding a configuration structure and associated argument to specify the use of the team. This post is intended to continue the discussion about the fields in such a configuration structure; please pick the following apart and share your thoughts.
typedef struct {
    int disable_collectives;  // zero for default behavior (collectives supported);
                              // nonzero to disable collective support
    int return_local_limit;   // zero indicates library should return (to all PEs) the
                              // constraining value across all PEs (e.g., MIN-reduce);
                              // nonzero indicates library should return the constraining
                              // value of the calling PE
    int num_threads;          // # of threads that may create contexts
} shmem_team_config_t;
// The `config` argument is an input and output, and the function returns a
// status code indicating whether team creation was successful. Specifically:
//
// - On input, `config` specifies the resource and behavioral requirements of
//   the team that is to be created.
//
// - If the team is created successfully, the function returns zero and `config`
//   is not modified.
//
// - If the team is not created successfully, the function returns a nonzero value
//   and `config` is modified to return the locally or globally constraining
//   values, as determined by `config->return_local_limit`.
int shmem_team_split_strided(shmem_team_t parent_team,
                             int PE_start, int PE_stride, int PE_size,
                             shmem_team_config_t *config, shmem_team_t *new_team);
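To make the in/out contract above concrete, here is a minimal sketch that models only the `config` handling, not actual team creation. `mock_team_create`, `local_limit`, and `global_limit` are hypothetical names invented for illustration; they are not part of OpenSHMEM or of this proposal. The sketch assumes the global limit is the minimum across PEs and that creation succeeds only when the request fits within it.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the structure proposed in this thread. */
typedef struct {
    int disable_collectives;
    int return_local_limit;
    int num_threads;
} shmem_team_config_t;

/* HYPOTHETICAL stand-in for the library's resource check (not an OpenSHMEM
 * API). local_limit is what the calling PE can support; global_limit is the
 * assumed MIN-reduction of that value across all PEs. */
static int mock_team_create(shmem_team_config_t *config,
                            int local_limit, int global_limit)
{
    assert(config != NULL);
    assert(global_limit <= local_limit);

    if (config->num_threads <= global_limit)
        return 0;  /* success: config is left unmodified */

    /* Failure: report the constraining value the caller asked for. */
    config->num_threads = config->return_local_limit ? local_limit
                                                     : global_limit;
    return 1;
}
```

With `return_local_limit == 0`, a failed call rewrites `num_threads` to the global limit, so a retry with the same structure succeeds. With `return_local_limit != 0`, the caller sees only its own limit and may need a further reduction step before retrying.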
Toy example:
int max_threads = omp_get_max_threads();
shmem_team_config_t config = { .num_threads = max_threads };
shmem_team_t team;

while (shmem_team_split_strided(SHMEM_TEAM_WORLD, /* ... */,
                                &config, &team)) {
    // Requested too many threads; loop until that is not a constraint.
    if (config.num_threads == 0)
        shmem_global_exit(1);
}

#pragma omp parallel num_threads(config.num_threads)
{
    shmem_ctx_t ctx;
    shmem_team_create_ctx(team, 0, &ctx);
    // ...
}
For backward/forward compatibility reasons you want to add something like:
typedef enum {
    SHMEM_COLLECTIVE_MODE = 1 << 0,
    SHMEM_LOCAL_LIMIT     = 1 << 1,
    SHMEM_NUM_THREADS     = 1 << 2
} shmem_team_config_field_t;

int shmem_team_split_strided(shmem_team_t parent_team,
                             int PE_start, int PE_stride, int PE_size,
                             shmem_team_config_t *config,
                             shmem_team_config_field_t field,
                             shmem_team_t *new_team);
Related example: https://github.com/openucx/ucx/blob/b1d292498cbe2c08725fbd46692494620894cd8e/src/ucp/core/ucp_context.c#L950
@shamisp I think that's a fair feature to consider. I'd generally like to make sure that a static-initialized shmem_team_config_t structure provides the "default" settings for team creation. That way, an initializer like:
    shmem_team_config_t config = { .num_threads = max_threads };
would set num_threads as specified and imply the default for everything else. This would imply that a field mask as part of the structure itself would need to be interpreted with zero as a special value meaning "use all the fields".
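A rough sketch of the "zero means use all the fields" interpretation follows. It assumes the mask lives inside the structure as a `field_mask` member; that member, the `masked_team_config_t` type, `MOCK_ALL_FIELDS`, and both helper functions are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Field bits as suggested earlier in this thread. */
enum {
    SHMEM_COLLECTIVE_MODE = 1 << 0,
    SHMEM_LOCAL_LIMIT     = 1 << 1,
    SHMEM_NUM_THREADS     = 1 << 2,
    /* HYPOTHETICAL convenience value: every field selected. */
    MOCK_ALL_FIELDS       = (1 << 0) | (1 << 1) | (1 << 2)
};

/* HYPOTHETICAL variant of the config structure carrying its own mask. */
typedef struct {
    unsigned field_mask;  /* 0 is treated as "all fields are meaningful" */
    int disable_collectives;
    int return_local_limit;
    int num_threads;
} masked_team_config_t;

/* A zero mask (what a designated initializer leaves behind) selects all fields. */
static unsigned effective_mask(const masked_team_config_t *config)
{
    assert(config != NULL);
    return config->field_mask == 0 ? (unsigned)MOCK_ALL_FIELDS
                                   : config->field_mask;
}

/* Use the caller's num_threads only if that field is selected. */
static int effective_num_threads(const masked_team_config_t *config,
                                 int library_default)
{
    return (effective_mask(config) & SHMEM_NUM_THREADS) ? config->num_threads
                                                        : library_default;
}
```

Under this reading, `{ .num_threads = max_threads }` keeps working without the user ever touching the mask, while a nonzero mask lets a newer library ignore fields that an older caller did not know about.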
I would expect that such a default setting would include (1) collectives are enabled (i.e., disable_collectives == 0) and (2) the globally-constraining limit is returned on team-creation failure (i.e., return_local_limit == 0), though I'm not sure what the interpretation of num_threads == 0 should be. Ideally, I think the implementation would inspect the CPU mask and interpret num_threads as the current number of threads in the CPU mask. Unfortunately, functions like sched_getaffinity and CPU_COUNT are glibc extensions for Linux; not exactly portable, though BSD and OS X have similar capabilities.
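One possible way to resolve `num_threads == 0` from the CPU mask is sketched below. It uses `sched_getaffinity` and `CPU_COUNT` only where the C library exposes them and otherwise falls back to `sysconf(_SC_NPROCESSORS_ONLN)`, which is itself a widely available extension rather than strict POSIX. The helper names `default_num_threads` and `resolve_num_threads` are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* exposes sched_getaffinity and CPU_COUNT on glibc */
#endif
#include <assert.h>
#include <sched.h>
#include <unistd.h>

/* HYPOTHETICAL helper: number of CPUs the calling thread may run on. */
static int default_num_threads(void)
{
#ifdef CPU_COUNT
    cpu_set_t mask;
    CPU_ZERO(&mask);
    /* pid 0 means "the calling thread". */
    if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
        int n = CPU_COUNT(&mask);
        if (n > 0)
            return n;
    }
#endif
    /* Fallback where the affinity API is missing or the call failed. */
    long online = sysconf(_SC_NPROCESSORS_ONLN);
    return online > 0 ? (int)online : 1;
}

/* HYPOTHETICAL helper: treat zero as "derive from the CPU mask". */
static int resolve_num_threads(int requested)
{
    assert(requested >= 0);
    return requested == 0 ? default_num_threads() : requested;
}
```

The fallback counts online processors rather than the affinity mask, so on platforms without the affinity API the result can overestimate what a pinned process may actually use.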
Draft PDF as of 62feca0 to close this issue (PR #35)
When it comes to whether delete is collective or not, I have no strong opinion. That said, libpsm2's endpoint shutdown code runs a lot faster when everyone is shutting down their endpoints at the same time. This may be true of other interconnects as well.
Some TODO items (or, at least, suggestions) from today's discussion:
- Rename config.num_threads to config.max_contexts
- SHMEM_MAX_CONTEXTS env-var as helper for shmem_init and shmem_ctx_create
TODO, from my list:
- shmem_team_get_config: change the return type from void to int. A user may pass an invalid value, and the function needs a way to report that failure.
- split functions: provide an option to use "parent" resources instead of allocating new resources
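As a sketch of the first item, a status-returning query could look roughly like this. `mock_team_t`, its `valid` flag, `mock_team_get_config`, and `config_threads_or` are hypothetical stand-ins, since the real team handle is opaque and the final signature was still under discussion.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int disable_collectives;
    int return_local_limit;
    int num_threads;
} shmem_team_config_t;

/* HYPOTHETICAL stand-in for an opaque team handle. */
typedef struct {
    int valid;                   /* nonzero once the team has been created */
    shmem_team_config_t config;  /* settings the team was created with */
} mock_team_t;

/* Returns 0 on success and nonzero if either argument is unusable,
 * which a void return type could not express. */
static int mock_team_get_config(const mock_team_t *team,
                                shmem_team_config_t *config)
{
    if (team == NULL || !team->valid || config == NULL)
        return -1;
    *config = team->config;
    return 0;
}

/* Caller-side pattern: branch on the status before trusting the output. */
static int config_threads_or(const mock_team_t *team, int fallback)
{
    shmem_team_config_t cfg;
    if (mock_team_get_config(team, &cfg) != 0)
        return fallback;
    assert(cfg.num_threads >= 0);
    return cfg.num_threads;
}
```

On failure the output structure is left untouched, so callers cannot mistake stale contents for a valid answer as long as they check the return code.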
Current up-to-date PDF for this issue: main_spec.pdf
Closing this since the solution has stood for a while. We can reopen this or create a new issue if this solution is rejected.
The current draft of the teams proposal puts the PE team inside the communication context and allows for the reassignment of teams to contexts. This issue is meant to detail the proposed model and potential alternatives and capture feedback or issues with any of these models as well as other interactions between teams and contexts.
Potential models for interaction
- ... SHMEM_CTX_DEFAULT
Teams in Contexts
- ... SHMEM_TEAM_WORLD.
- ... (shmem_ctx_set_team).
Potential advantages
Disadvantages or concerns
- ... pSync or pWrk state is per-PE, not per-thread).
Contexts in Teams
Potential advantages
Disadvantages or concerns
- ... shmem_team_*-prefixed API or manual PE index translation.
Teams use SHMEM_CTX_DEFAULT ("Plan B")
- ... (SHMEM_CTX_DEFAULT).
Disadvantages or concerns
Context from Teams (updated 7/23)
- ... (shmem_team_create_ctx).
- ... (shmem_ctx_create) are associated with SHMEM_TEAM_WORLD.
Use Model
- ... shmem_team_split_* (collective operation)
- ... pSync and pWrk state lives with the team.
- ... shmem_team_create_ctx (collective operation)
- ... (shmem_ctx_create), or the default context for RMA, AMO, collective, and synchronization operations
- ... SHMEM_TEAM_WORLD is the associated team for all local-semantic contexts and the default context.)
- ... shmem_team_destroy_ctx (collective operation)
- ... shmem_team_destroy (collective operation)
Potential advantages
Disadvantages or concerns
Open questions + design points
If it is the team, then...
Using distinct context handles -- even with each created from shmem_team_create_ctx -- may not be safe for concurrent collectives. They would need to be distinct contexts created from distinct teams.
Not a concern; this is consistent with the requirement for global serialization of collectives using the same active set in 1.4.