Handling shmem_team_create_ctx failures

naveen-rn commented 5 years ago

shmem_team_create_ctx is a new API we created in this proposal to add team-based context creation. There are multiple reasons why this routine could fail:

1. Invalid team argument
2. Exceeding the maximum number of resources
3. Unavailability of the user requested context options

How should these failures be handled? Do all of them return a nonzero value, or do we return nonzero only for (2) and (3), while (1) aborts?
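To make the question concrete, here is a minimal caller-side sketch of what handling a nonzero return for cases (2) and (3) might look like. Every name in it (`toy_ctx_t`, `toy_team_create_ctx`, `toy_get_ctx_or_default`, the `TOY_*` constants) is a hypothetical stand-in invented for illustration, not the proposed API; the only thing it takes from the proposal is the "zero on success, nonzero on failure" convention.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the OpenSHMEM API: they only model the
 * "zero on success, nonzero on failure" convention discussed here. */
typedef struct { int id; long options; } toy_ctx_t;

#define TOY_OPT_PRIVATE   0x1L  /* an option this toy library supports   */
#define TOY_OPT_EXOTIC    0x2L  /* an option it cannot provide: case (3) */
#define TOY_MAX_CONTEXTS  2     /* resource limit: case (2)              */

static int toy_contexts_in_use = 0;

/* Models the create call: returns 0 on success, nonzero otherwise. */
static int toy_team_create_ctx(long options, toy_ctx_t *ctx)
{
    assert(ctx != NULL);
    if (options & TOY_OPT_EXOTIC)
        return 1;                    /* (3) requested option unavailable */
    if (toy_contexts_in_use >= TOY_MAX_CONTEXTS)
        return 1;                    /* (2) out of context resources */
    ctx->id = ++toy_contexts_in_use;
    ctx->options = options;
    return 0;
}

/* Caller-side policy: try the requested options, retry without options,
 * and finally fall back to a shared default context (id 0). */
static toy_ctx_t toy_get_ctx_or_default(long options)
{
    toy_ctx_t ctx;
    if (toy_team_create_ctx(options, &ctx) == 0)
        return ctx;
    if (options != 0 && toy_team_create_ctx(0L, &ctx) == 0)
        return ctx;
    ctx.id = 0;                      /* default context */
    ctx.options = 0;
    return ctx;
}
```

A fallback like this only works if the failure is reported through the return value; if the library aborts instead, the caller never gets the chance to retry.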

gmegan commented 5 years ago

As currently stated, it is up to the implementation whether to abort or return nonzero. The reasoning is that it may not be possible for some implementations to check for these conditions, or it may be a performance problem to do so.

It is also currently permitted to have different behavior in debug and release builds.

Perhaps the guidance needed here is to define nonfatal errors as those conditions where the routine call can exit and leave the library in a usable state, and these should return nonzero. However, the specification cannot define what those conditions are, due to performance and other real-world constraints. So, generally, I would expect (1) to cause an abort or other undefined behavior, because checking every team argument becomes expensive. But I might have a slower debug mode that tracks all of the team arguments and returns an error code for invalid teams.
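As a rough illustration of the debug-mode idea, the sketch below keeps a registry of live team handles and scans it on every context creation. It is a toy model, not an OpenSHMEM implementation: `toy_team_t`, `toy_register_team`, `toy_team_is_registered`, and `toy_create_ctx` are hypothetical names, and the `checked` flag stands in for what would more likely be a compile-time debug/release switch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, NOT an OpenSHMEM implementation. */
#define TOY_MAX_TEAMS 8

typedef struct { int unused; } toy_team_t;

/* Debug-only registry of every team handle the library has created. */
static toy_team_t *toy_live_teams[TOY_MAX_TEAMS];

/* Record a team handle so checked calls can recognise it later. */
static int toy_register_team(toy_team_t *team)
{
    for (size_t i = 0; i < TOY_MAX_TEAMS; i++) {
        if (toy_live_teams[i] == NULL) {
            toy_live_teams[i] = team;
            return 0;
        }
    }
    return 1;                        /* registry full */
}

/* Linear scan: this per-call lookup is the cost a release build avoids. */
static int toy_team_is_registered(const toy_team_t *team)
{
    if (team == NULL)
        return 0;
    for (size_t i = 0; i < TOY_MAX_TEAMS; i++)
        if (toy_live_teams[i] == team)
            return 1;
    return 0;
}

/* checked != 0 models a debug build, checked == 0 a release build. */
static int toy_create_ctx(const toy_team_t *team, int checked, int *ctx_id)
{
    static int next_id = 1;
    assert(ctx_id != NULL);
    if (checked && !toy_team_is_registered(team))
        return 1;                    /* nonfatal: library state untouched */
    /* Release path: the handle is trusted. In a real library an invalid
     * handle here could crash or corrupt state (undefined behavior). */
    *ctx_id = next_id++;
    return 0;
}
```

In the checked path an invalid team yields a nonzero return and leaves the library usable, which matches the nonfatal-error definition suggested above; the unchecked path skips the lookup entirely.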

naveen-rn commented 5 years ago

I'm confused: I don't understand how a DEBUG or PERFORMANCE mode can alter the fundamental semantics. In DEBUG mode, implementations can emit more logs and specify why a call failed. But the mode of operation can't determine the fundamental behavior of a routine.

That said, how are we handling the scenario of a team-based collective with an invalid team argument? I suppose it is a fatal error. Shouldn't it be the same here, and shouldn't the spec say so?

naveen-rn commented 5 years ago

"The reasoning is that it may not be possible for some implementations to check for these conditions, or it may be a performance problem to do so."

Can you please elaborate on the performance issue? Also, why can't implementations determine whether the passed team object is valid?

gmegan commented 5 years ago

Resolved with PR #78

gmegan / specification

Handling shmem_team_create_ctx failures #68