Closed gmegan closed 5 years ago
A possible solution to the above problem is to have the library implementation generate a unique tag for each split operation, so that PE 6 could receive traffic for both split operations simultaneously and use the tag to determine whether the traffic relates to the split into t1 or the split into t2.
A way to conceptualize this would be to think of implementing the split operations in terms of
int MPI_Comm_create_group(MPI_Comm comm, MPI_Group group, int tag, MPI_Comm *newcomm)
where comm is the parent team, group describes the set of PEs to be added to the child team, tag is internally generated by the implementation, and newcomm is the child team.
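To make the tagging idea concrete, here is a minimal sketch in plain C of how a PE could sort traffic from two in-flight splits that share one receive buffer. The `split_msg_t` layout and the `count_msgs_for_split` helper are hypothetical names used for illustration only; they are not part of OpenSHMEM or MPI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wire format: every split-related message carries the
 * tag of the split operation it belongs to. */
typedef struct {
    int tag;     /* implementation-generated tag of the split */
    int src_pe;  /* PE that sent the message */
} split_msg_t;

/* Count the messages in a shared receive buffer that belong to the
 * split identified by `tag`. Messages for other splits are skipped,
 * so traffic for two concurrent splits can sit in the same buffer. */
size_t count_msgs_for_split(const split_msg_t *buf, size_t n, int tag)
{
    assert(buf != NULL || n == 0);
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (buf[i].tag == tag) {
            hits++;
        }
    }
    return hits;
}
```

With this kind of filtering, PE 6 could wait until it has seen the expected number of messages for the t1 tag without being confused by a message from PE 3 that carries the t2 tag.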
The question is: Can we generate a unique tag from the existing information passed through the split API?
The split operation provides the following information that could be used for tagging:
Is there a case where all of this information is identical between subsequent calls to split, such that any tag created from this information would be identical between subsequent calls to split?
If we assume team split is a synchronizing operation, then in any case where PE n is involved in a split into a child team with some (parent, start, stride, size), we know that all other PEs involved in that split must enter the split before any can exit. It seems like this would take care of the situation of subsequent splits into the exact same set of PEs.
For example:
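#pragma omp parallel
{
    /* Every thread runs the whole loop; iteration i does its split on thread i only.
       A barrier is not allowed inside an "omp for" region, so the loop is not workshared. */
    for (int i = 0; i < omp_get_num_threads(); i++) {
        if (omp_get_thread_num() == i) {
            shmem_team_split_strided(SHMEM_TEAM_WORLD, 0, 2, shmem_n_pes() / 2, NULL, 0, &td_team);
        }
        #pragma omp barrier
    }
}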
This code creates teams inside of threads that contain all of the threads with the same thread ID across SHMEM_TEAM_WORLD. There are subsequent splits into overlapping child teams from SHMEM_TEAM_WORLD, but the splits are synchronizing across all PEs in the child teams. So, for example, the team for thread ID 1 will not be entered by PE 0 until all PEs have entered the split to create the team for thread 0. With the synchronizing guarantee, it seems like the implementation can ensure that no even PE exits the split until all even PEs are ready for the next split.
If this is the case, then the synchronization problem only comes up for subsequent splits into overlapping teams with different membership, as in the example above about splitting into teams based on multiples of 2s and 3s.
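The distinction between identical and partially overlapping membership can be checked mechanically. The sketch below is plain C with no SHMEM calls; the helper names and the 12-PE layout used in the note after it are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* True if `pe` is a member of the strided team
 * {start, start + stride, ...} with `size` members. */
bool strided_team_contains(int start, int stride, int size, int pe)
{
    assert(stride > 0 && size > 0);
    if (pe < start || (pe - start) % stride != 0) {
        return false;
    }
    return (pe - start) / stride < size;
}

enum overlap { OVERLAP_NONE, OVERLAP_IDENTICAL, OVERLAP_PARTIAL };

/* Classify two child teams split from the same parent of `n_pes` PEs. */
enum overlap classify_overlap(int n_pes,
                              int start1, int stride1, int size1,
                              int start2, int stride2, int size2)
{
    int in_both = 0, in_one = 0;
    for (int pe = 0; pe < n_pes; pe++) {
        bool a = strided_team_contains(start1, stride1, size1, pe);
        bool b = strided_team_contains(start2, stride2, size2, pe);
        if (a && b) {
            in_both++;
        } else if (a || b) {
            in_one++;
        }
    }
    if (in_both == 0) {
        return OVERLAP_NONE;
    }
    return in_one == 0 ? OVERLAP_IDENTICAL : OVERLAP_PARTIAL;
}
```

Assuming 12 PEs, the multiples of 2 are (start 0, stride 2, size 6) and the multiples of 3 are (start 0, stride 3, size 4). PE 6 belongs to both teams while PE 3 belongs only to the second, so the pair is classified as partially overlapping, which is the case the synchronizing guarantee does not cover.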
What is the minimal amount of information needed to generate a unique tag? Does it fit into an int?
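One way to explore this question is to hash the split arguments into an int. The sketch below uses 32-bit FNV-1a, which is a swapped-in technique and not something proposed in this thread; `parent_id` is a hypothetical integer handle standing in for the parent team, and `split_tag` is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/* Fold one 32-bit value into a running FNV-1a hash, byte by byte. */
uint32_t fnv1a_mix(uint32_t h, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 16777619u;  /* 32-bit FNV prime */
    }
    return h;
}

/* Hypothetical tag: hash of (parent_id, start, stride, size),
 * with the top bit cleared so the result is a non-negative int. */
int split_tag(int parent_id, int start, int stride, int size)
{
    assert(size > 0);
    uint32_t h = 2166136261u;  /* 32-bit FNV offset basis */
    h = fnv1a_mix(h, (uint32_t)parent_id);
    h = fnv1a_mix(h, (uint32_t)start);
    h = fnv1a_mix(h, (uint32_t)stride);
    h = fnv1a_mix(h, (uint32_t)size);
    return (int)(h & 0x7fffffffu);
}
```

The result fits in an int, but a hash only makes collisions unlikely and does not guarantee uniqueness. Two calls with identical (parent, start, stride, size) also produce the same tag by construction, so this approach relies on the synchronizing behavior of split to separate repeated splits into the same set of PEs.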
If we keep the option of designing a unique tag generation scheme out of the discussion for a while, the main issue is that the tags are communicated via the same buffer. So, how do we keep messages from different split operations from overlapping each other? Isn't there a possibility of missing a message, resulting in deadlock? Are we supposed to perform an internal barrier with the participating child team PEs before performing a split operation? If so, what is the safe way to perform this barrier operation?
To me tag generation is a secondary issue. I don't follow the main flow of this algorithm.
On completely overlapping teams, maybe my concern doesn't matter. But with partially overlapping child teams, my previous concern seems to be an issue.
Will reopen new issue with better problem statement
Currently, it is the case that:
An example of a violation of the final condition is:
This code creates a case where PE 6 could still be involved in the split resulting in t1 and simultaneously receive traffic from PE 3 related to the split resulting in t2.
It would be preferable to remove the third split condition such that the above code would be legal.