Can layout qualifiers applied to shared scalars be eliminated?

GoogleCodeExporter commented 9 years ago

This is for UPC language issue 26.

Can layout qualifiers applied to shared scalars be eliminated? (not backward 
compatible, but generally not used.)

Original issue reported on code.google.com by yzh...@lbl.gov on 22 May 2012 at 11:38

GoogleCodeExporter commented 9 years ago

I am not sure why they should be disallowed.  Is it a problem allowing them?
Since

  shared [3] int A[12];

is an array[12] of shared [3] int, it seems arbitrary to disallow declaring

  shared [3] int B;

That way, too, shared [3] int *p can point to an element of A or to B without a 
type cast.

Original comment by brian.wibecan on 23 May 2012 at 2:38

GoogleCodeExporter commented 9 years ago

Would it be possible to use a layout qualifier applied on a scalar to specify 
which thread should have affinity to the data (e.g. X % THREADS)?  A common 
issue is that thread 0 gets hammered with communication because scalars are all 
placed on that thread.  AFAIK, UPC currently provides no way to change this 
behavior.

Original comment by james.di...@gmail.com on 31 May 2012 at 7:17

GoogleCodeExporter commented 9 years ago

I wouldn't want the meaning of X in shared [X] datatype foo to change so 
drastically just based on whether foo is a scalar or an array.  However, you 
raise a good point.  The only way to distribute shared scalars is to use 
pointers.  I could see this being an opportunity for a new allocation function, 
allocating memory with affinity to exactly one (specified?  random?) thread, 
and distributing the pointer to all threads.  Perhaps that would serve your 
needs, even if more cumbersome.

Or perhaps we can devise more appropriate syntax as a proposed extension?  The 
scalar case is a degenerate version of an array where the first element has 
affinity to some thread other than 0.  Again, easily achieved using pointers.

At any rate, while there are some interesting ways this concept could go, I 
don't support overloading the block size syntax to work this way.

Original comment by brian.wibecan on 31 May 2012 at 9:15

GoogleCodeExporter commented 9 years ago

Brian wrote in comment #3:
> I don't support overloading the block size syntax to work this way.

I agree strongly on that point.
I am also open to hearing of proposed syntax for declaring shared scalars 
residing on threads other than zero, but have no proposal to offer at the 
moment.
I am also open to a upc_alloc_on_thread(size, thread) proposal if there is an 
actual demand for it (not just "it would be nice if...").

Earlier Brian had observed that ALLOWING the layout qualifier allows a 
pointer-to-shared to reference either an element of a shared array or a shared 
scalar, without casts.  While I agree that is true, I don't see the immediate 
benefit of this.  Brian, can you elaborate on WHY this is useful?

Original comment by phhargr...@lbl.gov on 31 May 2012 at 9:30

GoogleCodeExporter commented 9 years ago

> Brian, can you elaborate on WHY this is useful?

The application that comes to mind most readily is a sentinel, terminal 
element, or temporary element in a linked list, where the list itself consists 
of elements of an array.  A single scalar with the same type may serve as a 
sentinel, or may be temporarily added to the list within a routine, with the 
intention of removing it before routine exit.  There are, of course, many ways 
to accomplish similar goals, but this is one way that is convenient.

Original comment by brian.wibecan on 1 Jun 2012 at 6:18

GoogleCodeExporter commented 9 years ago

> I could see this being an opportunity for a new allocation function,
> allocating memory with affinity to exactly one (specified?  random?) thread,
> and distributing the pointer to all threads. .... I am also open to a
> upc_alloc_on_thread(size, thread) proposal if there is an actual demand for
> it (not just "it would be nice if...").

If we're talking about dynamic allocation, this functionality already exists - 
it's called upc_alloc(). If you are requesting the ability for one thread to 
allocate space with affinity to a DIFFERENT thread, I would first question the 
utility of such behavior, and then point you at upc_global_alloc or 
upc_all_alloc (which wastes space but accomplishes the request). As an 
implementer seeing the global performance implications of upc_global_alloc, I 
would strongly oppose an allocation routine to allocate memory with affinity to 
a remote thread.

If we're talking about static allocation, I can see the utility of statically 
declaring scalars with non-zero affinity.  I think the challenges in 
introducing such a feature would be:

1) devising appropriate syntax (and NOT overloading the blocking factor syntax 
which is already a sufficient source of confusion)
2) The current use of thread zero is convenient because it is already 
guaranteed to exist. We would need to think about how to handle programs in the 
dynamic thread environment that specify static allocation for threads which do 
not exist at runtime.

Anything we come up with would need to be balanced against the fact that this 
can already be easily accomplished using a statically-declared cyclic array and 
a pointer, although it wastes space, e.g.:

  shared [1] int data[THREADS];
  shared [1] int *B = &(data[4]);  /* a pointer to a scalar on thread 4 (requires THREADS > 4) */
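
The workaround relies on the UPC layout rule that element i of an array declared with block size B has affinity to thread floor(i / B) mod THREADS. As a rough sketch, that rule can be modeled in plain C (not UPC). The function names `affinity_of` and `phase_of` and the `threads` parameter standing in for THREADS are assumptions made for illustration, and the indefinite block size `[]` is not modeled:

```c
#include <assert.h>
#include <stddef.h>

/* Plain-C model, not UPC: returns the thread with affinity to element
   `index` of an array declared `shared [block_size] T a[n]`.
   `threads` stands in for THREADS. The names are hypothetical. */
size_t affinity_of(size_t index, size_t block_size, size_t threads)
{
    /* Preconditions: a definite, nonzero block size and at least one thread. */
    assert(block_size > 0 && threads > 0);
    /* Blocks are dealt out round-robin, starting at thread 0. */
    return (index / block_size) % threads;
}

/* Position of the element within its block (its phase). */
size_t phase_of(size_t index, size_t block_size)
{
    assert(block_size > 0);
    return index % block_size;
}
```

For instance, `affinity_of(4, 1, 8)` is 4, matching data[4] in the cyclic array above when run with 8 threads, and `affinity_of(7, 3, 4)` is 2, which is where A[7] of `shared [3] int A[12]` lands with 4 threads.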

Original comment by danbonachea on 15 Jun 2012 at 3:55

GoogleCodeExporter commented 9 years ago

Dan, not that I'm proposing this function, but what I thought was being 
requested was more akin to upc_all_alloc, except that it allocates memory in 
the shared space of exactly one thread, rather than a distributed array.  A 
collective upc_alloc, in other words.  Roughly:

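#include <upc.h>

/* Collective: every thread must call this with the same arguments. */
shared void * upc_alloc_on_thread(int requested_thread, size_t size) {
  /* One shared slot, visible to all threads, that holds the result. */
  static shared void * shared p;
  upc_barrier;  /* all threads are done reading p from any earlier call */
  if (MYTHREAD == requested_thread) {
    p = upc_alloc(size);  /* memory has affinity to the calling thread */
  }
  upc_barrier;  /* p is now set; every thread can read it */
  return p;
}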

I could see this possibly being useful, and more useful than mucking with the 
block size syntax for scalars.  Again, this is not being proposed here.

Regarding static allocation of entities with non-zero affinity, I think this is 
a good question that should be broadened beyond scalars.

  shared int A;
  shared [] int B[12];

Both of these are allocated with affinity to thread zero.  If a means is 
devised for specifying A has affinity to some other thread, it should also be 
able to apply that to B.

  shared int C[10*THREADS];

And why not allow such a mechanism to state that C[0] has affinity to some 
thread other than zero?

So, food for thought for a future proposal.  I don't like the idea of singling 
out scalars for special treatment here.

Original comment by brian.wibecan on 15 Jun 2012 at 6:20

GoogleCodeExporter commented 9 years ago

> what I thought was being requested was more akin to upc_all_alloc, except
> that it allocates memory in the shared space of exactly one thread, rather
> than a distributed array.

The upc_alloc_on_thread code you wrote in comment #7 seems to provide that 
functionality in 6 easy lines of code, so I don't see a motivation for adding 
it to the standard library. Implementations might be able to make it a little 
faster, but if it's ever really a bottleneck the hand-rolled function can be 
easily refactored to perform many such allocations at once (or just 
upc_all_alloc a distributed array and carve it up).

Regarding static shared allocation that is based somewhere other than thread 0, 
that seems like an idea worth investigating for a future (2.0+) version of the 
language. Let's open a new ticket for that discussion.

Returning to the original issue, nobody seems to be strongly arguing to 
disallow layout qualifiers on shared scalars. The consensus seems to be that 
they are a minor corner case of the syntax: ugly when used, occasionally 
useful, and not harmful enough to be worth prohibiting.

Closing this issue.

Original comment by danbonachea on 16 Jun 2012 at 7:41

Changed state: Rejected

GoogleCodeExporter commented 9 years ago

Original comment by gary.funck on 6 Jul 2012 at 9:04

Added labels: Milestone-Spec-1.3

Intrepid / upc-specification

Can layout qualifiers applied to shared scalars be eliminated? #39