Open gmegan opened 5 years ago
Can we say that the root cause of this issue is mixing NO_COLLECTIVE and normal teams in the hierarchy of team creation?
I think yes, mixing these team types is the root of the problem. But we have cases where we want this, e.g. using multiple 2D splits to make a 3D split.
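To make the 3D-from-2D use case concrete, here is a minimal sketch that models it with plain index arithmetic rather than real OpenSHMEM calls. The helper names (`split_2d`, `split_3d`, `coords_3d`, `team_of`) are hypothetical. It assumes a PE at index i in the parent gets x = i mod xrange, so x varies fastest. The second split takes the "yz" team from the first split as its parent, which is exactly where the type of the intermediate team matters.

```python
def split_2d(members, xrange_):
    """Model a 2D split of an ordered list of PEs (illustrative only).

    Returns (x_teams, y_teams): x_teams group PEs that share a y coordinate,
    y_teams group PEs that share an x coordinate.
    """
    n = len(members)
    x_teams = [members[i:i + xrange_] for i in range(0, n, xrange_)]
    y_teams = [members[x::xrange_] for x in range(min(xrange_, n))]
    return x_teams, y_teams


def team_of(pe, teams):
    """Return the team (list of PEs) that contains pe."""
    return next(t for t in teams if pe in t)


def coords_3d(pe, xdim, ydim):
    """(x, y, z) coordinates of a PE under the assumed x-fastest layout."""
    return pe % xdim, (pe // xdim) % ydim, pe // (xdim * ydim)


def split_3d(npes, xdim, ydim):
    """Build per-PE x, y and z teams from two successive 2D splits."""
    world = list(range(npes))
    # First split: world -> x teams and intermediate "yz" teams.
    x_teams, yz_teams = split_2d(world, xdim)
    result = {}
    for pe in world:
        yz = team_of(pe, yz_teams)
        # Second split: the intermediate yz team is the parent here.
        y_teams, z_teams = split_2d(yz, ydim)
        result[pe] = (team_of(pe, x_teams),
                      team_of(pe, y_teams),
                      team_of(pe, z_teams))
    return result


if __name__ == "__main__":
    x_team, y_team, z_team = split_3d(12, 2, 3)[7]
    print(coords_3d(7, 2, 3), x_team, y_team, z_team)
```

In this model the intermediate yz team is only ever used as a parent for the second split, which is the kind of team one might want to create without collective resources.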
Right now a team created with NO_COLLECTIVE and 0 contexts does not require sync between team create. But when we add heaps, that will change this requirement.
This is maybe a small point, but even when we add heaps to teams, the heap operations (creation, allocation, deallocation, etc.) will be collective across the team. For that reason, I think a team with NO_COLLECTIVE would not be suitable for use with team-based heaps.
This is a good breakdown of the problem, @gmegan. I wonder now whether we should consider a "hybrid" approach with the alternative Sets and Groups proposal from ORNL and @manjugv. My main concern with Sets and Groups was the generality of set creation, and the storage/indexing overheads that might induce (along the lines of color/key-split). I did have a secondary concern of API verbosity, but this issue makes it seem like it might be necessary (or preferred to the complexities here).
As a "hybrid", I'd suggest only providing:
shmem_set_split_{strided, 2D}
SHMEM_{SET, TEAM}_{WORLD, SHARED}
Admittedly, there will be cases in which it will feel somewhat verbose to, for example, extract the set from a team, split the set, then create a child team, but I wonder whether this complexity is worth resolving this issue.
Yes, this would be quite a bit of churn in terms of LaTeX changes, but would it solve the issues raised here?
@nspark @gmegan It looks like there are two separate problems being discussed in this issue.
(1) Mixing NO_COLLECTIVE and COLLECTIVE teams in the hierarchy of team creation; and
Problem (1) can be resolved by restricting the mix of these two types of teams. To me, combining multiple 2D splits into a 3D split seems to be a one-off use case. If required, we should create a separate new API for 3D split.
Problem (2) is an implementation-based requirement, since we moved away from user-passed psync arrays and depend entirely on internal psync arrays. If implementations support 2-sided/active-set-based operations internally in their comms layer, there is no need for this synchronization expectation. We might(!) have difficulties when we introduce team-based SHEAP, but it still looks solvable. If implementations use RMA to implement team-creation operations, then it looks like there are no workarounds. I don't see how sets/groups-based PE-subset formation can resolve this problem.
As per ongoing discussion, team creation sync requirements currently look like this:
If all child teams are NO COLLECTIVE, the split operation is local only and no sync is required.
If any child team supports collectives, the operation is collective across the child team.
If a team is the parent argument to a split operation multiple times, and the child teams overlap and support collectives, a sync on the parent team is required between split operations.
Teams supporting collectives cannot be created from NO COLLECTIVE teams.
NO COLLECTIVE teams cannot create contexts (num_contexts = 0 is the only valid config).
A parent team can be destroyed before its child team, i.e. the team creation hierarchy does not impact team destruction ordering.
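The rules above can be sketched as a small checker. This is only a model for discussion, not an OpenSHMEM API: the `Team` type and the functions `validate_child`, `split_scope` and `needs_parent_sync` are hypothetical names. It also assumes one reading of the third rule, namely that a sync on the parent is needed only when two overlapping child teams both support collectives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Team:
    """Hypothetical model of a team; collective=False models NO COLLECTIVE."""
    members: frozenset
    collective: bool
    num_contexts: int = 0


def validate_child(parent, child):
    """Raise ValueError if the child config breaks the creation rules."""
    if child.collective and not parent.collective:
        raise ValueError("collective team cannot be created from a "
                         "NO COLLECTIVE parent")
    if not child.collective and child.num_contexts != 0:
        raise ValueError("NO COLLECTIVE team must have num_contexts = 0")


def split_scope(children):
    """'local' if every child is NO COLLECTIVE, otherwise 'collective'."""
    return "collective" if any(c.collective for c in children) else "local"


def needs_parent_sync(prev_children, next_children):
    """True if two successive splits of the same parent produce overlapping
    child teams that both support collectives (assumed reading of the rule)."""
    return any(a.collective and b.collective and (a.members & b.members)
               for a in prev_children for b in next_children)


if __name__ == "__main__":
    b = Team(frozenset({0, 1}), collective=True)
    c = Team(frozenset({1, 2}), collective=True)
    print(split_scope([b]), needs_parent_sync([b], [c]))
```

Destruction ordering is deliberately not modeled, since the last rule says the creation hierarchy places no constraint on it.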
NO COLLECTIVE teams cannot create contexts (num_contexts = 0 is only valid config)
This statement seems a bit misleading. Based on my understanding from the IB perspective,
Correct me if my understanding is incorrect.
Follow-up from the OpenSHMEM WG discussion: though this seems implementable, based on Manju's statement, it looks like there is a performance impact in maintaining the progress thread, either in the implementation or in some way through hardware.
NO COLLECTIVES teams are removed from the current draft for now. They can be revived later once we can address the issues around how they can be used for p2p under the current teams/contexts model. As the draft stands, it will be straightforward for implementations to add NO COLLECTIVES or other team types as experimental features by expanding the team creation config.
The section at the end of the teams intro about synchronization between team creation events is confusing and convoluted.
The requirements to avoid synchronization between team create events are too esoteric. Right now a team created with NO_COLLECTIVE and 0 contexts does not require sync between team create. But when we add heaps, that will change this requirement. Propose to put a config option NO_RESOURCE that makes a team that is a numeric mapping only and any config options for contexts, collectives, heaps, etc are ignored.
The requirement to sync on ancestor team when teams are created from a NO_COLLECTIVE team parent is problematic. It means there is an induced graph of team inheritance that the user has to care about and that the implementation has to enforce. Overall, the creation of new teams from arbitrary sets of PEs needs as much automatic work from the library as is feasible, otherwise removing pSync and pWrk will just punt the problem into these convoluted synchronization requirements.
The issue is when this sequence of operations is executed:
So teams X, B and C all need to do some operations to set up symmetric resources for collectives. This requires using some existing symmetric resources to bootstrap the new symmetric resources. Neither team A nor team Z has any symmetric resources to bootstrap team creation, so the only resources available to bootstrap new teams are the global symmetric resources.
So, as is, the user is going to end up putting sync_all between every split from a NO_COLLECTIVE team to make sure all global symmetric resources are protected.