Closed gmegan closed 5 years ago
Related to this... in the description of shmem_collect over an active set, there is a note that collectives over a non-power-of-2 strided active set will experience performance degradation. If it applies, the same note should also be given for teams created with split_strided and a non-power-of-2 stride.
At the WG today, it seemed as though the general consensus is that the NOCOLLECTIVE option creates a "team" which is really just a local arithmetic remapping of some set of PE numbers. In other words there is fundamentally no communication to be done, as these are just convenience objects.
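To make the "local arithmetic remapping" point concrete, here is a minimal sketch of what such a convenience object could reduce to. The `local_team_t` type and both function names are hypothetical, made up for illustration; they are not part of the OpenSHMEM API or of any proposal in this thread.

```c
#include <assert.h>

/* Hypothetical local-only team descriptor: a (start, stride, size) triplet
 * over parent PE numbers. This is not an OpenSHMEM type. */
typedef struct {
    int start;   /* parent PE number of team member 0 */
    int stride;  /* distance between consecutive members, must be >= 1 */
    int size;    /* number of members */
} local_team_t;

/* Team-relative index -> parent PE number, or -1 if idx is out of range. */
static int team_to_parent(const local_team_t *t, int idx)
{
    assert(t->stride >= 1 && t->size >= 0);
    if (idx < 0 || idx >= t->size)
        return -1;
    return t->start + idx * t->stride;
}

/* Parent PE number -> team-relative index, or -1 if pe is not a member. */
static int parent_to_team(const local_team_t *t, int pe)
{
    assert(t->stride >= 1 && t->size >= 0);
    int off = pe - t->start;
    if (off < 0 || off % t->stride != 0)
        return -1;
    int idx = off / t->stride;
    return idx < t->size ? idx : -1;
}
```

Both directions are pure integer arithmetic on values every PE already holds, so nothing in this sketch needs a pSync, a pWrk, or any message exchange.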
As convenience objects, these might be useful. For example, if the 3D split were replaced by two calls to the 2D split, each resulting team could have its own option specified, as in:
team_split_2D(team64, -noncollective option teamx-, -default option teamz-, 4, teamx, teamz)
team_split_2D(teamx, -default option teamx-, -default option teamy-, 4, teamx, teamy)
This way, the first teamx that gets created, which contains x*y, has no communication overhead to form it, and teamz gets created as normal. Then, the second time teamx gets created, it ends up as a normal team.
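The reason the intermediate team can be free is that a 2D split is itself just index arithmetic. A rough sketch follows, assuming a row-major layout (x = index % xrange, y = index / xrange); that layout is my assumption here, and `triplet_t` and `split_2d_local` are hypothetical names, not the API under discussion.

```c
#include <assert.h>

/* Hypothetical strided description of a team, in parent team indices. */
typedef struct { int start, stride, size; } triplet_t;

/* For the PE at index my_idx in a parent team of parent_size PEs, compute
 * the x-axis and y-axis teams it would belong to. Assumes a row-major
 * layout: x = my_idx % xrange, y = my_idx / xrange. Purely local. */
static void split_2d_local(int parent_size, int my_idx, int xrange,
                           triplet_t *xteam, triplet_t *yteam)
{
    assert(xrange >= 1 && my_idx >= 0 && my_idx < parent_size);
    int x = my_idx % xrange;
    int y = my_idx / xrange;

    /* x-axis team: the PEs in my row, contiguous; the last row may be short. */
    xteam->start = y * xrange;
    xteam->stride = 1;
    int remaining = parent_size - xteam->start;
    xteam->size = remaining < xrange ? remaining : xrange;

    /* y-axis team: the PEs in my column, xrange apart. */
    yteam->start = x;
    yteam->stride = xrange;
    yteam->size = (parent_size - x + xrange - 1) / xrange;
}
```

For a 16-PE teamx and an xrange of 4, the PE at index 6 gets the row {4, 5, 6, 7} and the column {2, 6, 10, 14} without talking to anyone. Only the teams that will actually run collectives need the heavier setup.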
Regarding teams that support barrier but not other collectives, this would mean making teams that have an associated pSync but not an associated pWrk. I think this would need a different option though, some kind of NOREDUCTION option to reflect the lack of pWrk.
This option has come up as it relates to synchronization requirements on team creation (issue #33).
Essentially, NOCOLLECTIVE was originally added to provide the ability to make teams with low to no overhead for point-to-point only.
However, due to the need to resolve context/team creation for two-sided transports, point-to-point communication within a team is no longer possible unless team creation specifies contexts > 0, which will be a higher-overhead operation on some networks. Further, to maintain portability across systems and implementations, barriers are introduced between team creation events for contexts > 0 if teams have overlapping membership. When team-based heaps are added, barriers between splits will be required even if contexts = 0.
So, if the goal is to provide some option that eliminates all synchronization requirements for team creation going forward, it makes more sense to define some option that says the team will only be used in local functions like translate, and as a parent in future split operations.
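As a rough illustration of the "parent in future split operations" use, a local-only team could serve as a parent simply by composing strides. This is only a sketch under that assumption; `pe_map_t` and `map_split_strided` are hypothetical names and not a proposed interface.

```c
#include <assert.h>

/* Hypothetical local-only team: a strided set of global PE numbers. */
typedef struct { int start, stride, size; } pe_map_t;

/* Build a child whose members are parent indices
 * start, start + stride, ..., start + (size - 1) * stride.
 * The result is again a strided set of global PEs, so no
 * communication is needed to describe it. */
static pe_map_t map_split_strided(pe_map_t parent, int start, int stride,
                                  int size)
{
    assert(start >= 0 && stride >= 1 && size >= 1);
    assert(start + (size - 1) * stride < parent.size);
    pe_map_t child;
    child.start = parent.start + start * parent.stride;
    child.stride = parent.stride * stride;
    child.size = size;
    return child;
}
```

A real split would still have to synchronize at the point where a child is created with collective or context support; the sketch only covers the translation side.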
This would not solve the problem of barriers between creation calls, which are required on some implementations (e.g. those that need to wire up) and not on others (e.g. datagram).
Closing this issue as the problem space has changed due to the way that team creation, team contexts, and collective operations have evolved. If this comes back up, a new issue can be opened with a better description of the problem in the new context.
As per Nick's comments on the library constants section, teams with the ability to barrier but not do collectives would be more useful than teams without the ability to do either.
We will discuss at the next WG and potentially change the definition of this option.