gmegan / specification

OpenSHMEM Application Programming Interface
http://www.openshmem.org

Teams creation inside parallel region #32

Closed naveen-rn closed 5 years ago

naveen-rn commented 6 years ago

Let's assume we create a team that will be used for collectives.

Are we looking to create this team inside a thread parallel region? Is this feature currently supported in the draft write-up?

naveen-rn commented 6 years ago

@manjugv ?

nspark commented 6 years ago

Are we looking to create this team inside a thread parallel region?

I think it should be allowed to create a team from the non-main thread.

Is this feature currently supported in the draft write-up?

I don't think there's anything in the current draft that precludes this.

nspark commented 6 years ago

Here's a detailed version of the example I raised on today's call discussing this issue. Each PE creates two threads. On all PEs congruent to 0 mod 2, Thread 0 creates a team containing those PEs. Similarly, on all PEs congruent to 0 mod 3, Thread 1 creates a team containing those PEs. Each PE then has 0, 1, or 2 teams.

One example of multithreaded team creation that I think should be erroneous:

#pragma omp parallel num_threads(2)
{
  shmem_team_t team_mod2 = SHMEM_TEAM_NULL;
  shmem_team_t team_mod3 = SHMEM_TEAM_NULL;

  switch (omp_get_thread_num()) {
  case 0:
    if (0 == (shmem_my_pe() % 2))
      shmem_team_create_strided(SHMEM_TEAM_WORLD, 0, 2, 2, /* size */, &team_mod2);
    break;
  case 1:
    if (0 == (shmem_my_pe() % 3))
      shmem_team_create_strided(SHMEM_TEAM_WORLD, 0, 3, 3, /* size */, &team_mod3);
    break;
  }

  /* ...use the teams... */
}

This is erroneous because, on PEs where 0 == (shmem_my_pe() % 6), nothing orders Threads 0 and 1, so either one may call shmem_team_create_strided first. Thus, the internal pSync-like structure may not be allocated symmetrically across the whole of team_mod2 and team_mod3.

One example of a correct implementation would be:

#pragma omp parallel num_threads(2)
{
  shmem_team_t team_mod2 = SHMEM_TEAM_NULL;
  shmem_team_t team_mod3 = SHMEM_TEAM_NULL;

  if ((omp_get_thread_num() == 0) && (0 == (shmem_my_pe() % 2)))
    shmem_team_create_strided(SHMEM_TEAM_WORLD, 0, 2, 2, /* size */, &team_mod2);

#pragma omp barrier

  if ((omp_get_thread_num() == 1) && (0 == (shmem_my_pe() % 3)))
    shmem_team_create_strided(SHMEM_TEAM_WORLD, 0, 3, 3, /* size */, &team_mod3);

  /* ...use the teams... */
}

In this case, the threads on each PE such that 0 != (shmem_my_pe() % 6) will together call shmem_team_create_strided at most once, and on the PEs such that 0 == (shmem_my_pe() % 6), Thread 0 will always create team_mod2 before Thread 1 creates team_mod3.

gmegan commented 6 years ago

Added PR #41 for this issue, which also resolves issue #33. See issue #33 for the PDF attachment.

gmegan commented 5 years ago

Merged PR, closing this issue.