Closed GoogleCodeExporter closed 9 years ago
I am not sure why they should be disallowed. Is it a problem allowing them?
Since
shared [3] int A[12];
is an array[12] of shared [3] int, it seems arbitrary to disallow declaring
shared [3] int B;
That way, too, shared [3] int *p can point to an element of A or to B without a
type cast.
Original comment by brian.wibecan
on 23 May 2012 at 2:38
Would it be possible to use a layout qualifier applied on a scalar to specify
which thread should have affinity to the data (e.g. X % THREADS)? A common
issue is that thread 0 gets hammered with communication because scalars are all
placed on that thread. AFAIK, UPC currently provides no way to change this
behavior.
Original comment by james.di...@gmail.com
on 31 May 2012 at 7:17
I wouldn't want the meaning of X in shared [X] datatype foo to change so
drastically just based on whether foo is a scalar or an array. However, you
raise a good point. The only way to distribute shared scalars is to use
pointers. I could see this being an opportunity for a new allocation function,
allocating memory with affinity to exactly one (specified? random?) thread,
and distributing the pointer to all threads. Perhaps that would serve your
needs, even if more cumbersome.
Or perhaps we can devise more appropriate syntax as a proposed extension? The
scalar case is a degenerate version of an array where the first element has
affinity to some thread other than 0. Again, easily achieved using pointers.
At any rate, while there are some interesting ways this concept could go, I
don't support overloading the block size syntax to work this way.
Original comment by brian.wibecan
on 31 May 2012 at 9:15
Brian wrote in comment #3:
> I don't support overloading the block size syntax to work this way.
I agree strongly on that point.
I am also open to hearing of proposed syntax for declaring shared scalars
residing on threads other than zero, but have no proposal to offer at the
moment.
I am also open to a upc_alloc_on_thread(size, thread) proposal if there is an
actual demand for it (not just "it would be nice if...").
Earlier Brian had observed that ALLOWING the layout qualifier allows a
pointer-to-shared to reference either an element of a shared array or a shared
scalar, without casts. While I agree that is true, I don't see the immediate
benefit of this. Brian, can you elaborate on WHY this is useful?
Original comment by phhargr...@lbl.gov
on 31 May 2012 at 9:30
> Brian, can you elaborate on WHY this is useful?
The application that comes to mind most readily is a sentinel, terminal
element, or temporary element in a linked list, where the list itself consists
of elements of an array. A single scalar with the same type may serve as a
sentinel, or may be temporarily added to the list within a routine, with the
intention of removing it before routine exit. There are, of course, many ways
to accomplish similar goals, but this is one way that is convenient.
Original comment by brian.wibecan
on 1 Jun 2012 at 6:18
> I could see this being an opportunity for a new allocation function,
> allocating memory with affinity to exactly one (specified? random?) thread,
> and distributing the pointer to all threads. .... I am also open to a
> upc_alloc_on_thread(size, thread) proposal if there is an actual demand for
> it (not just "it would be nice if...").
If we're talking about dynamic allocation, this functionality already exists -
it's called upc_alloc(). If you are requesting the ability for one thread to
allocate space with affinity to a DIFFERENT thread, I would first question the
utility of such behavior, and then point you at upc_global_alloc or
upc_all_alloc (which wastes space but accomplishes the request). As an
implementer seeing the global performance implications of upc_global_alloc, I
would strongly oppose an allocation routine to allocate memory with affinity to
a remote thread.
If we're talking about static allocation, I can see the utility of statically
declaring scalars with non-zero affinity. I think the challenges in
introducing such a feature would be:
1) Devising appropriate syntax (and NOT overloading the blocking factor syntax,
which is already a sufficient source of confusion).
2) The current use of thread zero is convenient because it is already
guaranteed to exist. We would need to think about how to handle programs in the
dynamic thread environment that specify static allocation for threads which do
not exist at runtime.
Anything we come up with would need to be balanced against the fact that this
can already be easily accomplished using a statically-declared cyclic array and
a pointer, although it wastes space, e.g.:
shared [1] int data[THREADS];
shared [1] int *B = &(data[4]); /* a pointer to a scalar on thread 4 (requires THREADS > 4) */
Original comment by danbonachea
on 15 Jun 2012 at 3:55
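Editorial aside, not part of the original thread: the layout rule behind the examples above can be modeled in plain C. In UPC, element i of an array declared with block size B has affinity to thread (i / B) % THREADS, and a shared scalar is placed like element 0, which is why it lands on thread 0. The helper name upc_model_affinity and its parameters are hypothetical and exist only for illustration; this is a sketch of the arithmetic, not a UPC API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: models which thread element `index` of
 *     shared [block_size] T a[...];
 * has affinity to when the program runs with `threads` UPC threads.
 * It only reproduces the layout arithmetic; it is not part of UPC. */
static size_t upc_model_affinity(size_t index, size_t block_size, size_t threads)
{
    assert(block_size > 0 && threads > 0);
    /* Blocks of `block_size` elements are dealt out round-robin. */
    return (index / block_size) % threads;
}
```

Under this model, upc_model_affinity(4, 1, 8) is 4, which matches the data[4] example when the program runs with 8 threads. With 4 threads, elements 9 through 11 of shared [3] int A[12] map to thread 3, while index 0 (the scalar case) always maps to thread 0.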
Dan, not that I'm proposing this function, but what I thought was being
requested was more akin to upc_all_alloc, except that it allocates memory in
the shared space of exactly one thread, rather than a distributed array. A
collective upc_alloc, in other words. Roughly:
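shared void * upc_alloc_on_thread(int requested_thread, size_t size) {
    /* Shared cell that carries the resulting pointer to every thread. */
    static shared void * shared p;
    upc_barrier;  /* all threads have finished reading p from any earlier call */
    if (MYTHREAD == requested_thread) {
        p = upc_alloc(size);  /* memory has affinity to the calling thread */
    }
    upc_barrier;  /* p is written before any thread reads it */
    return p;
}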
I could see this possibly being useful, and more useful than mucking with the
block size syntax for scalars. Again, this is not being proposed here.
Regarding static allocation of entities with non-zero affinity, I think this is
a good question that should be broadened beyond scalars.
shared int A;
shared [] int B[12];
Both of these are allocated with affinity to thread zero. If a means is
devised for specifying A has affinity to some other thread, it should also be
able to apply that to B.
shared int C[10*THREADS];
And why not allow such a mechanism to state that C[0] has affinity to some
thread other than zero?
So, food for thought for a future proposal. I don't like the idea of singling
out scalars for special treatment here.
Original comment by brian.wibecan
on 15 Jun 2012 at 6:20
> what I thought was being requested was more akin to upc_all_alloc, except
> that it allocates memory in the shared space of exactly one thread, rather
> than a distributed array.
The upc_alloc_on_thread code you wrote in comment #7 seems to provide that
functionality in 6 easy lines of code, so I don't see a motivation for adding
it to the standard library. Implementations might be able to make it a little
faster, but if it's ever really a bottleneck the hand-rolled function can be
easily refactored to perform many such allocations at once (or just
upc_all_alloc a distributed array and carve it up).
Regarding static shared allocation that is based somewhere other than thread 0,
that seems like an idea worth investigating for a future (2.0+) version of the
language. Let's open a new ticket for that discussion.
Returning to the original issue, nobody seems to be strongly arguing to
disallow layout qualifiers on shared scalars. The consensus seems to be that
they are a minor corner case of the syntax: ugly when used, but occasionally
useful and not harmful enough to be worth prohibiting.
Closing this issue.
Original comment by danbonachea
on 16 Jun 2012 at 7:41
Original comment by gary.funck
on 6 Jul 2012 at 9:04
Original issue reported on code.google.com by
yzh...@lbl.gov
on 22 May 2012 at 11:38