Closed naveen-rn closed 6 years ago
On Team destroy operation:
- Irrespective of how the teams are created with or without SHMEM_TEAM_NOCOLLECTIVE - this call is non-collective
I think team_destroy is collective by default. It is only non-collective when the team has the SHMEM_TEAM_NOCOLLECTIVE option.
I am trying to find the best place to put this explanation in the spec document. It seems that it should go in the Teams section intro text since it mostly applies to team create/destroy in that section, rather than putting it in the collectives section.
The team creation call is collective only across participating PEs - every other (non-participating) PE will return SHMEM_TEAM_NULL
@nspark @shamisp Can you clarify whether these are the right semantics? I'm a bit confused about how to design these semantics efficiently.
Can we create two teams (TEAM:1 and TEAM:2) concurrently from TEAM_WORLD? Consider that the team creation operation is ordered correctly in PE-1.
TEAM_WORLD: PE={1, 2, 3, 4, 5, 6, 7, 8} TEAM:1 PE={1, 2, 3} TEAM:2 PE={1, 4, 8}
@naveen-rn In your example, TEAM:1 and TEAM:2 are not disjoint, so they cannot be created concurrently.
I'll tweak your specific PE assignments to simplify my code logic and so I can use shmem_team_split_strided(), but here are two instances of your example in code:
// Goal: team1 = {0, 1, 2}, team2 = {0, 3, 6}
shmem_team_t team1 = SHMEM_TEAM_NULL;
shmem_team_t team2 = SHMEM_TEAM_NULL;
{ // Method 1
    shmem_team_split_strided(SHMEM_TEAM_WORLD, 0, 1, 3, &team1);
    shmem_team_split_strided(SHMEM_TEAM_WORLD, 0, 3, 3, &team2);
}
{ // Method 2
    if (shmem_my_pe() < 3)
        shmem_team_split_strided(SHMEM_TEAM_WORLD, 0, 1, 3, &team1);
    if ((shmem_my_pe() % 3) == 0)
        shmem_team_split_strided(SHMEM_TEAM_WORLD, 0, 3, 3, &team2);
}
In both cases, PE 0 -- the only PE in both teams -- creates team1 before team2; however, PEs 1, 2, 3, and 6 are all trying to concurrently create a team with PE 0, which introduces the conflict or race between these concurrent operations. The simplest solution would be to call shmem_barrier_all() between the (conditional) creation of the two teams.
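To make the overlap concrete, here is a minimal sketch in plain C (no OpenSHMEM calls) that checks whether two strided PE sets share any member. The helpers in_strided_team() and count_shared_pes() are hypothetical names used only for illustration; they are not part of the OpenSHMEM API or of the teams proposal. The sketch assumes the (start, stride, size) arguments describe PEs relative to TEAM_WORLD, as in the example above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not an OpenSHMEM routine): true if `pe` belongs to
 * the strided set {start, start + stride, ..., start + (size - 1) * stride}. */
static bool in_strided_team(int pe, int start, int stride, int size)
{
    if (stride <= 0 || size <= 0 || pe < start)
        return false;
    int offset = pe - start;
    return (offset % stride == 0) && (offset / stride < size);
}

/* Hypothetical helper: counts the PEs in [0, npes) that are members of both
 * strided sets. A result of 0 means the two teams are disjoint. */
static int count_shared_pes(int npes,
                            int start1, int stride1, int size1,
                            int start2, int stride2, int size2)
{
    assert(npes >= 0);
    int shared = 0;
    for (int pe = 0; pe < npes; pe++) {
        if (in_strided_team(pe, start1, stride1, size1) &&
            in_strided_team(pe, start2, stride2, size2))
            shared++;
    }
    return shared;
}
```

For the example above, count_shared_pes(8, 0, 1, 3, 0, 3, 3) reports one shared PE (PE 0), so team1 and team2 are not disjoint and their creation has to be separated, for instance by the barrier suggested above. Two sets such as {0, 1, 2} and {3, 4, 5} would report zero shared PEs.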
Is the above a problem only when both team1 and team2 are intended for collectives?
@spophale No, I think it's still a problem. Originally, I think the intent was that shmem_team_split(NOCOLLECTIVE) would be a local operation, but, with the team-context interaction, I think there is a collective nature to team creation even when the team will not itself be used for collectives. The NOCOLLECTIVE option then serves the purpose of reducing use of internal resources (e.g., internal pSync and pWork structures).
Even with respect to the team-context interaction, we may potentially avoid global synchronization if the system is symmetric and/or datagram is used. @spophale there is no way around it with any solution unless you allocate contexts as a collective operation, which leads to all kinds of other constraints.
Technically, the mandate of the teams proposal was extended to address issues related to context creation with point-to-point transports. So it is a teams + contexts proposal :)
Just posted PR #41 for this issue.
PDF: main_spec.pdf
There is a lot of text to deal with the special case of teams that don't communicate. This happens when one creates a team with SHMEM_TEAM_NOCOLLECTIVE and 0 contexts. The main case where this will happen is in doing some number of 2D splits where there are intermediate teams that do nothing but get split again.
Would it make more sense to define some option like SHMEM_TEAM_STUB to encapsulate this behavior? As it is, the user would have to track down all of the conditions that trigger the need to synchronize. Currently that is just contexts and collective support, but when teams have heaps or other shared structures this will not be backward compatible.
Fixed typo in PR #41
Merged PR, closing this issue
Let me know if my understanding is correct.
On Teams created with SHMEM_TEAM_NOCOLLECTIVE:
On Teams created without SHMEM_TEAM_NOCOLLECTIVE:
On Team destroy operation: