Sandia-OpenSHMEM / SOS

Sandia OpenSHMEM is an implementation of the OpenSHMEM specification over multiple networking APIs, including Portals 4, the OpenFabrics Interfaces (OFI), and UCX. Please click on the Wiki tab for help with building and using SOS.

Teams: allow split stride of zero iff size is 1 #1136

Closed davidozog closed 2 months ago

davidozog commented 2 months ago

This PR allows team split operations to have a stride of zero if and only if the new team size is 1.

davidozog commented 2 months ago

@wrrobin @philipmarshall21 - I reverted the zero-stride checks across collectives (https://github.com/Sandia-OpenSHMEM/SOS/pull/1136/commits/998f7c9b5d9159a20efd24418438af1704e83577) by forcing stride=1 internally, which I think is a little cleaner. Can you please re-review?
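A minimal sketch of the "force stride=1 internally" idea described above. This is not the SOS code: the helper name `normalize_split_stride` and its signature are hypothetical, and rejecting `size < 1` here is an assumption made only to keep the example small.

```c
#include <stdbool.h>

/* Hypothetical helper (not an SOS function): validate a team split request
 * and report the stride the library would carry internally.
 * Returns false when the request is invalid. */
bool normalize_split_stride(int stride, int size, int *internal_stride)
{
    if (size < 1)
        return false;            /* assumption: empty teams are rejected here */

    if (stride == 0) {
        if (size != 1)
            return false;        /* zero stride is only allowed for size 1 */
        *internal_stride = 1;    /* stride never selects a second PE, so 1 is safe */
        return true;
    }

    *internal_stride = stride;   /* nonzero strides pass through unchanged */
    return true;
}
```

With the stride normalized once at split time, downstream code such as collectives would never see a zero stride and would need no zero-stride checks of its own.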

(This also updates the SOS tests-sos submodule to point to the latest commit.)

davidozog commented 2 months ago

> LGTM. So, internally, we are not allowing stride to be 0. Perhaps we should discuss whether we need to revisit this in OpenSHMEM.

Yes, agreed. Fortunately, I think the spec is arguably well-defined as long as the following holds: shmem_team_split_strided allows stride to be zero only if size is 1 (and when size is 1, stride can be any value). It's then up to library implementers to handle the special zero-stride case, because otherwise it's likely to lead to divide-by-zero exceptions.
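To illustrate where a divide-by-zero could come from, here is a hedged sketch of a membership test for a strided PE set. The function `pe_in_strided_set` is hypothetical and not taken from SOS or the specification. It assumes PE indices are non-negative, and it handles the zero-stride case explicitly before any `%` or `/` by the stride.

```c
#include <stdbool.h>

/* Hypothetical helper (not an SOS function): is parent-team PE index `pe`
 * a member of {start, start + stride, ..., start + (size - 1) * stride}?
 * Assumes pe and start are non-negative PE indices. */
bool pe_in_strided_set(int pe, int start, int stride, int size)
{
    int offset = pe - start;

    /* Special case first: `offset % 0` would be undefined behavior,
     * which typically traps as a divide-by-zero. */
    if (stride == 0)
        return size == 1 && offset == 0;

    if (offset % stride != 0)
        return false;            /* pe does not fall on the stride */

    int idx = offset / stride;   /* position of pe within the new team */
    return idx >= 0 && idx < size;
}
```

For example, `pe_in_strided_set(3, 3, 0, 1)` is true (a single-PE team starting at PE 3), while `pe_in_strided_set(3, 3, 0, 2)` is false because a zero stride with size 2 is not a valid set.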

I've also drafted a change here that should help clarify this in the spec: https://github.com/davidozog/openshmem-specification/commit/010e42489d89e8088dea84c447d283e04651afd3

I will propose it for v1.6 if there's enough time...