Sandia-OpenSHMEM / SOS

Sandia OpenSHMEM is an implementation of the OpenSHMEM specification over multiple Networking APIs, including Portals 4, the Open Fabric Interface (OFI), and UCX. Please click on the Wiki tab for help with building and using SOS.

SHMEM_TEAM_SPLIT_STRIDED expects global stride instead of relative stride? #1151

Open zhongchen530 opened 5 days ago

zhongchen530 commented 5 days ago

The function SHMEM_TEAM_SPLIT_STRIDED takes a parent team plus start, stride, and size arguments and produces a new team. I would expect the stride to be relative to the parent team. However, what I observed is that the stride is treated as relative to SHMEM_TEAM_WORLD instead.

For instance, if the number of PEs is 6, then SHMEM_TEAM_WORLD : {0,1,2,3,4,5}.

shmem_team_split_strided(SHMEM_TEAM_WORLD, 0, 2, 3, NULL, 0, &new_team);

would result in

new_team : {0,2,4} where numbering is relative to SHMEM_TEAM_WORLD

If I again call

shmem_team_split_strided(new_team, 0, 2, 2, NULL, 0, &another_team);

this would result in

another_team : {0, 2} where numbering is relative to SHMEM_TEAM_WORLD

The stride is still relative to SHMEM_TEAM_WORLD, not to the parent team new_team passed into the function. If the stride were relative to new_team, we would expect {0,4} instead.

Is this intended behavior, or is it a bug?

All numbering used to denote a PE is relative to SHMEM_TEAM_WORLD.

davidozog commented 5 days ago

What a great question @zhongchen530 - thank you.

I believe the SOS implementation is correct, but I understand the confusion. Here is how I think of the team split semantic in terms of your example above:

Yes, the split of SHMEM_TEAM_WORLD into new_team results in the following global PEs making up new_team:

new_team : {0,2,4}

However, the OpenSHMEM specification says the following:

PEs in a newly created team are consecutively numbered starting with PE number 0. PEs are ordered by their PE number in the parent team. Team-relative PE numbers can be used for point-to-point operations through team-based contexts...

So within new_team, the team-relative PE numbering is actually:

new_team : {0,1,2}

The specification example for shmem_team_create_ctx illustrates how this works, and kinda how it can be useful.

As a demonstration, suppose you did the following:

shmem_team_create_ctx(new_team, 0, &new_ctx);  // assume this is successful
if ( shmem_team_my_pe(new_team) == 1 ) {
    shmem_put(new_ctx, dest, source, nelems, 2);     // This means global PE 2 puts to PE 4 on the world team!
}

So with respect to the world team indexing, PE 2 does a put to PE 4. But with respect to the team-relative indexing, PE 1 in new_team is doing a put to PE 2 in new_team.

This means that in your example, the (start, stride, size) split of (0, 2, 2) will result in:

another_team = {0,4} with respect to the world team, but actually within team-relative numbering it's still:

another_team = {0,1}

Does that help?

Please let me know if you know of any other OpenSHMEM implementations that do not work like this... I believe this is the correct behavior, but there may be room for improvement in how the specification explains this.

I'm adding some more eyes who might be interested and can ensure I'm not mistaken: @wrrobin @lstewart @wokuno

davidozog commented 5 days ago

@zhongchen530 - Oops! I had to write all that to see that maybe SOS does not behave how I described, and you found an issue. Will investigate..

zhongchen530 commented 4 days ago

> @zhongchen530 - Oops! I had to write all that to see that maybe SOS does not behave how I described, and you found an issue. Will investigate..

Yes, your explanation above is what I initially expected, but SOS doesn't behave that way: another_team is observed to be {0,2}.