Intrepid / upc-specification

Automatically exported from code.google.com/p/upc-specification
0 stars 1 forks source link

Clarifying pointers to shared arrays, and multi-D shared arrays #3

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Evidence for pointer-to-local interpretation
--------------------------------------------

The following parts of the UPC spec and C99 spec indicate that a pointer to an
array-of-shared is NOT a pointer-to-shared:

UPC 1.2 3.2.3 - "shared array: an array with elements that have shared qualified
type"

UPC 1.2 3.4 - "pointer-to-shared: a pointer whose referenced type is
shared-qualified"

UPC 1.2 6.5.2.1.4 - "In an array declaration, the type qualifier applies to the
elements."

C99 6.7.3.8 - "If the specification of an array type includes any type
qualifiers, the element type is so qualified, not the array type."

Because the pointer's target is an array and the array itself cannot be shared 
qualified, such a pointer must be a pointer-to-local.  This conclusion is very
non-intuitive to most users.

Evidence for pointer-to-shared interpretation
---------------------------------------------

Other parts of the spec appear to (very indirectly) imply that such a pointer is
indeed a pointer-to-shared:

UPC 1.2 6.5.2.1.5 - "For any shared array, a, upc_phaseof( &a ) is zero."

UPC 1.2 7.2.3.2 - "size_t upc_phaseof( shared void *ptr );"

C99 6.5.3.2.3 - "if the operand is the result of a [] operator, neither the &
operator nor the unary * that is implied by the [] is evaluated and the result
is as if the & operator were removed and the [] operator were changed to a +
operator."

If the call to upc_phaseof( &a ) is legal, then &a must be a pointer-to-shared. 

However, &a is distinct from &a[0] and simply "a" given the quoted section of 
C99, which establishes the equivalence between &a[0] and a.  &a is a pointer to
an array.

Recommendation
--------------

The spec should be clarified.  The pointer-to-local interpretation is the most 
consistent with the rest of the spec, but users seem to expect a 
pointer-to-shared.  If it IS a pointer-to-shared, then there are other issues to
resolve with regards to pointer-to-shared arithmetic on such a pointer.

Original issue reported on code.google.com by johnson....@gmail.com on 13 Mar 2012 at 3:49

GoogleCodeExporter commented 9 years ago
Below are the proposed additions and modifications (against v1.2) to clarify 
how multi-dimensional shared arrays and pointers-to-shared whose referenced 
type is an array type work.  Once I manage to get latex working correctly on 
our development machines, I'll post the formal proposal.

[ADDED]
{3.*}
ultimate element
of a scalar is the scalar itself.  For arrays, the ultimate element is the 
ultimate element of an element of the array.

{3.*}
ultimate element type
for non-array types, the type itself.  For array types, the ultimate element 
type of an element of the array.

{3.*}
shared type
a type whose ultimate element type is shared-qualified.

{3.*}
local type
a type whose ultimate element type is not shared-qualified.

{6.4.2 *}
The affinity of a non-null pointer-to-shared whose reference type is a shared 
array type is that of its first ultimate element, as a shared array type may 
have ultimate elements on more than one thread.

{6.4.3 *}
If a non-null pointer-to-shared with definite blocksize is cast to a 
pointer-to-local whose referenced type is not the ultimate element type of the 
referenced type of the pointer-to-shared, the result is undefined.

{6.5.1.1 *}
EXAMPLE 4: shared array

    shared TYPE array[THREADS]; /* NOTE: 'upc_affinityof( &array[i] )' could be something other than 'i' if TYPE is an array type */

[MODIFIED]
{3.2.3}
shared array
an array whose ultimate element type is shared-qualified.

{3.4}
pointer-to-shared
a pointer whose referenced type is a shared type.

{3.5}
pointer-to-local
a pointer whose referenced type is a local type.

{3.6.1}
shared access
an access using an expression whose type is a shared type.

{3.6.2}
local access
an access using an expression whose type is a local type.

{6.4.1.2 1}
The upc_localsizeof operator shall only apply to shared types or expressions 
whose type is a shared type.  All constraints on the sizeof operator [ISO/IEC00 
Sec. 6.5.3.4] also apply to this operator.

{6.4.1.3 1}
The upc_blocksizeof operator shall only apply to shared types or expressions 
whose type is a shared type.  All constraints on the sizeof operator [ISO/IEC00 
Sec. 6.5.3.4] also apply to this operator.

{6.4.1.4 1}
The upc_elemsizeof operator shall only apply to shared types or expressions 
whose type is a shared type.  All constraints on the sizeof operator [ISO/IEC00 
Sec. 6.5.3.4] also apply to this operator.

{6.4.1.4 2}
The upc_elemsizeof operator returns the size, in bytes, of the ultimate element 
type of a shared type or expression whose type is a shared type.  The result of 
upc_elemsizeof is an integer constant.

{6.4.2 2}
.... If the shared array has a definite block size, then the following example 
describes pointer arithmetic:

    shared [B] TYPE (*p), (*p1); /* B a positive integer, TYPE is a local type */
    int i;

    p1 = p + i;

{6.4.2 3}
...:

    size_t elem_delta = i * (sizeof(TYPE) / upc_elemsizeof(TYPE))

    upc_phaseof(p1) == (upc_phaseof(p) + elem_delta) mod B
    upc_threadof(p1) == (upc_threadof(p) + (upc_phaseof(p) + elem_delta) div B) mod THREADS

{6.4.2 4}
...:

    LT *p1, *p2;
    shared ST (*s1), (*s2); /* LT is the non-shared-qualified version of the ultimate element type of ST */

    p1 = (LT*) s1; /* Allowed if s1 has affinity to MYTHREAD */
    p2 = (LT*) s2; /* Allowed if s2 has affinity to MYTHREAD */

{6.4.2 5}
...:
    ...
    * The expression (((ptrdiff_t) upc_addrfield(s2)) - ((ptrdiff_t) upc_addrfield(s1))) shall evaluate to the same value as ((p2 - p1) * sizeof(LT)).

{6.5.1.1 10}
....  This factor is the nonnegative number of consecutive ultimate elements 
(when evaluating pointer-to-shared arithmetic and array declarations) which 
have affinity to the same thread....

{6.5.2 8}
...and no array object with automatic storage duration shall have an ultimate 
element type that is shared-qualified.

{6.5.2.1 3}
Ultimate elements of shared arrays are distributed in a round-robin fashion, by 
chunks of block-size ultimate elements, such that the i-th ultimate element has 
affinity with thread (floor(i/block_size) mod THREADS).

{6.5.2.1 4}
In an array declaration, the type qualifier applies to the ultimate elements.

{6.5.2.1 9}
...:
   ...

shared [3] applies to the ultimate element type of T, which is int, regardless 
of the typedef. ...

   ...

{7.2.3.5 4}
In the case of a statically allocated shared object with declaration:

    shared [b] t d[s];

the totalsize argument shall be s * sizeof(t) and the nbytes argument shall be 
b * upc_elemsizeof(t). ...

Original comment by sdvor...@cray.com on 5 Feb 2013 at 5:58

GoogleCodeExporter commented 9 years ago
Thanks Steve - I think we're headed in the right direction here. I know I'll 
have more comments on the proposed changes once I see the full diff against the 
current draft and have some more time to study it, but here are some initial 
thoughts..

>ultimate element type
>ultimate element 

In the interests of simplicity, can we get away with just the term "ultimate 
element type", and eliminate the second term (the instantiated version)? The 
former is a very useful concept, but the latter seems only tangentially useful 
and the definition is somewhat ambiguous:

> ultimate element of a scalar is the scalar itself.  For arrays, the ultimate 
element is the ultimate element of an element of the array.

This wording is confusing because "the ultimate element" of an array is not a 
unique entity - ie it should probably read "an ultimate element" not "the 
ultimate element", but even then the definition seems shaky. Also, the term 
"scalar" is not what we want here, because it excludes struct and union types 
(probably wanted "non-array object" instead). Overall seems better to ditch 
this term entirely and always specify things in terms of "object of ultimate 
element type".

> shared type
> local type

Similarly, since these two are mutually exclusive sets, it seems sufficient to 
only define "shared type", and use the phrase "not a shared type" instead of 
"local type". This wording is also nice because it seems self-evident that any 
vanilla C99 type is "not a shared type", but a novice might wonder if their C99 
type constitutes a "local type".

> The affinity of a non-null pointer-to-shared whose reference type is a shared 
array type is that of its first ultimate element, as a shared array type may 
have ultimate elements on more than one thread.

This paragraph conflates types and instantiated objects (eg a "shared array 
type" does not have any instantiated "ultimate elements", "on" more than one 
thread or anywhere else). Also, the phrase "affinity of a pointer" generally 
means the affinity of the pointer object itself, not to be confused with the 
affinity of the referenced object which is what we mean here.
I'm not sure I fully grasp what this statement is trying to say, but perhaps we 
could use a more direct statement? Eg:
   "A pointer to a shared array is a pointer-to-shared that points to the initial element of the array object."

> If a non-null pointer-to-shared with definite blocksize is cast to a 
pointer-to-local whose referenced type is not the ultimate element type of the 
referenced type of the pointer-to-shared, the result is undefined.

This is way too strong as currently written. 
At least need to insert: "is not compatible with a qualified or unqualified 
version of the ultimate element type" 
Also, casts to (void *) need to be permitted.
It might be easier/safer to rephrase this as a prohibition against casting to 
pointer-to-local-array, since I think that's what you're trying to prohibit 
here?

> EXAMPLE 4: shared array
>    shared TYPE array[THREADS]; /* NOTE: 'upc_affinityof( &array[i] )' could 
be something other than 'i' if TYPE is an array type */

This example is trying to illustrate a property of multi-dimensional shared 
arrays, so it might be clearer to directly show one?

> The upc_*sizeof operator shall only apply to shared types or expressions 
whose type is a shared type.

Suggested rewording: "expressions with shared type." (4 occurrences)

>     size_t elem_delta = i * (sizeof(TYPE) / upc_elemsizeof(TYPE))

This part of the specification is ill-defined because TYPE is not a "shared 
type", and therefore an invalid argument to upc_elemsizeof(). Suggest replacing 
it with: upc_elemsizeof(array)

> {6.5.2 8}...and no array object with automatic storage duration shall have an 
ultimate element type that is shared-qualified.

Why not simply replace this whole paragraph with something like:
"No object with automatic storage duration shall have shared type."

> {7.2.3.5 4}
> In the case of a statically allocated shared object with declaration:
>    shared [b] t d[s];
> the totalsize argument shall be s * sizeof(t) and the nbytes argument shall 
be b * upc_elemsizeof(t). ...

Another invalid use of upc_elemsizeof(). Should probably be: "b * 
upc_elemsizeof(d)"

> {6.5.2.1 4}In an array declaration, the type qualifier applies to the 
ultimate elements.

Both the old and new text are a bit too "loose" here. Qualifiers are applied to 
types, not objects (or elements). It also notably doesn't cover typedefs or 
types appearing in casts.
Further, this is the paragraph discussed in earlier comments, which I proposed 
to rewrite to something like:

"As specified in [C99 6.7.3.8], if the specification of an array type includes 
any type qualifiers (possibly via one or more typedefs), the element type is so 
qualified, not the array type. Additionally, a pointer to an array of shared 
elements is a pointer-to-shared.\footnote{This also applies to 
multi-dimensional arrays, so a pointer to an array of array of shared elements 
is a pointer-to-shared.}" 

Related to this, the proposal doesn't seem to currently address the central 
question in issue 3 - ie what do you get when you apply the address-of operator 
to a shared array? This was discussed at length in earlier comments, but 
doesn't seem to be covered by any of the proposed changes, and seems important 
enough to be directly specified. I thought we had agreed to specify the 
pointer-to-shared interpretation, with something like comment #42.

Here's some concrete examples for discussion:

1:  shared int x[5];
2:  shared int y[5][5];
3:  shared int *p1;
4:  shared int (*p2)[5];
...
5:  p1 = x;     // should be legal..
6:  p1 = &x[0]; // equivalent to this, also legal
7:  p1 = &y[0][0]; // also also legal
8:  upc_threadof(&x[0]); // must be permitted
9:  upc_threadof(&y[0][0]); // must be permitted
10: upc_threadof(p1);    // must be permitted
11: upc_threadof(x);     // silly but should probably also be permitted
12: p2 = &x;             // should be legal?
13: upc_threadof(&x);    // legal for PTS interpretation, constraint violation 
if the expression type is pointer-to-local
14: p2 = y;     // another problematic case
15: upc_threadof(p2); // more questionable cases...
16: upc_threadof(*p2);
17: upc_threadof(&y[2]);

We could potentially eliminate half of the problem here by prohibiting the 
address-of operator on expressions of shared array type. However, we still need 
to clarify whether p2 is a pointer-to-shared or a pointer-to-local, and which 
of the lines above are legal or prohibited. 

Crazy Idea: A stronger and simpler approach would be to prohibit the type 
pointer-to-shared-array entirely, since objects of that type have very limited 
utility anyhow. This approach would force users to always use a pointer to 
ultimate element type (in which case p2 and lines 12-17 would all be rejected 
by the typechecker). I'm guessing this idea is controversial, and I'm not sure 
what the practical impact would be on users. Do we have important use cases 
that motivate continuing to allow this problematic type?

Some relevant text from C99:
C99 6.3.2.1 - Implicit conversion for array operands
"Except when it is the operand of the sizeof operator, or the unary & operator, 
or is a
string literal used to initialize an array, an expression that has type 
‘‘array of type’’ is
converted to an expression with type ‘‘pointer to type’’ that points to 
the initial element of
the array object and is not an lvalue."

C99 6.5.3.2-3 - Semantics of &
"The unary & operator yields the address of its operand. If the operand has 
type ‘‘type’’,
the result has type ‘‘pointer to type’’. If the operand is the result 
of a unary * operator,
neither that operator nor the & operator is evaluated and the result is as if 
both were
omitted, except that the constraints on the operators still apply and the 
result is not an
lvalue. Similarly, if the operand is the result of a [] operator, neither the & 
operator nor
the unary * that is implied by the [] is evaluated and the result is as if the 
& operator
were removed and the [] operator were changed to a + operator. Otherwise, the 
result is
a pointer to the object or function designated by its operand."

A very helpful page discussing the nitty-gritty details of C arrays and 
pointers:
http://c-faq.com/~scs/cgi-bin/faqcat.cgi?sec=aryptr

A C99-compliant example showing some of the relevant things C99 permits:
int x[5];
int y[5][5];
int main() {
  int *p1 = x;
  int (*p2)[5];
  p1 = &x[0];
  p2 = &x;
  (*p2)[2] = 7;
  p2 = y;
  p2 = &y[1];
  return 0;
}

Original comment by danbonachea on 11 Feb 2013 at 9:45

GoogleCodeExporter commented 9 years ago
"Related to this, the proposal doesn't seem to currently address the central 
question in issue 3 - ie what do you get when you apply the address-of operator 
to a shared array? This was discussed at length in earlier comments, but 
doesn't seem to be covered by any of the proposed changes, and seems important 
enough to be directly specified. I thought we had agreed to specify the 
pointer-to-shared interpretation, with something like comment #42."

This naturally falls out of the new definition of pointer-to-shared.  Since a 
shared array is a shared-type (its ultimate element type is shared-qualified), 
any pointer to it (due to taking its address for instance...) is a 
pointer-to-shared.

Original comment by sdvor...@cray.com on 11 Feb 2013 at 6:47

GoogleCodeExporter commented 9 years ago
"In the interests of simplicity, can we get away with just the term "ultimate 
element type", and eliminate the second term (the instantiated version)? The 
former is a very useful concept, but the latter seems only tangentially useful 
and the definition is somewhat ambiguous:"

Sure.  I just added it to save some typing on my part.

"This wording is confusing because "the ultimate element" of an array is not a 
unique entity - ie it should probably read "an ultimate element" not "the 
ultimate element", but even then the definition seems shaky. Also, the term 
"scalar" is not what we want here, because it excludes struct and union types 
(probably wanted "non-array object" instead). Overall seems better to ditch 
this term entirely and always specify things in terms of "object of ultimate 
element type"."

Sounds good to me.

"This is way too strong as currently written. 
At least need to insert: "is not compatible with a qualified or unqualified 
version of the ultimate element type" 
Also, casts to (void *) need to be permitted."

Yes, I forgot to permit casting to compatible local types--that should 
definitely be allowed.

Original comment by sdvor...@cray.com on 11 Feb 2013 at 6:53

GoogleCodeExporter commented 9 years ago
> This naturally falls out of the new definition of pointer-to-shared.  Since a 
shared 
> array is a shared-type (its ultimate element type is shared-qualified), any 
pointer 
> to it (due to taking its address for instance...) is a pointer-to-shared.

I agree the framework is compatible with this interpretation, but it still 
seems worth explicit clarification. Also, simply saying it is a shared type 
doesn't really specify the semantics of an example like:

shared [5] int a[1000][THREADS];
upc_blocksizeof(&a);

Is the blocksize of the ultimate element type propagated to pointers to the 
array? If so where does it say that?

Original comment by danbonachea on 11 Feb 2013 at 6:58

GoogleCodeExporter commented 9 years ago
"Is the blocksize of the ultimate element type propagated to pointers to the 
array? If so where does it say that?"

Ah, I forgot to add a sentence to upc_blocksizeof() stating that it returned 
the blocksize of the the ultimate element type, similar to the language for 
upc_elemsizeof().

Original comment by sdvor...@cray.com on 11 Feb 2013 at 7:03

GoogleCodeExporter commented 9 years ago
Ignore my example from comment 55 - as written it's a constraint violation of 
upc_blocksizeof()

Original comment by danbonachea on 11 Feb 2013 at 7:42

GoogleCodeExporter commented 9 years ago
Issue 85 has been merged into this issue.

Original comment by sdvor...@cray.com on 20 Feb 2013 at 12:00

GoogleCodeExporter commented 9 years ago

Original comment by sdvor...@cray.com on 20 Feb 2013 at 12:00

GoogleCodeExporter commented 9 years ago
Attached is a first pass at the proposed changes in the spec.  The changes 
mostly follow those in comment 51, with some corrections from comments 52-56.

Original comment by sdvor...@cray.com on 20 Feb 2013 at 12:07

Attachments:

GoogleCodeExporter commented 9 years ago

Original comment by sdvor...@cray.com on 20 Feb 2013 at 12:07

GoogleCodeExporter commented 9 years ago
> Attached is a first pass at the proposed changes in the spec.  

Could you please attach a Latex diff?

Original comment by danbonachea on 20 Feb 2013 at 4:34

GoogleCodeExporter commented 9 years ago
> Could you please attach a Latex diff?

I made all the changes with the \xadded and \xreplaced macros, so they should 
be explicitly spelled out in the PDF, along with the old text where applicable. 
 Since there's a lot of small changes all over the place, it's probably easier 
to review by checking the change bars than to go over the latex diff.  Still, 
I'll make diffs and attach them tomorrow.

Original comment by sdvor...@cray.com on 20 Feb 2013 at 6:01

GoogleCodeExporter commented 9 years ago
Attached are svn diffs against the current trunk (r205) for the three changed 
files (upc-language.tex, upc-lib-core.tex, upc-terms-and-defs.tex) used to 
produce the PDF in comment 60.

Original comment by sdvor...@cray.com on 20 Feb 2013 at 3:34

Attachments:

GoogleCodeExporter commented 9 years ago
As promised at our last telcom, attached is a test-case Cray used for testing 
while updating our compiler for the changes proposed here.  It tests various 
ways of indexing a multidimensional shared array.  Thread 0 prints the array 
out for each method of indexing as the checking occurs.

As an example of incorrect output, consider the output when compiled with CCE 
8.1 and run with 2 threads.

array2d:
    0@0:0    1@0:1    2@0:2    3@0:3    4@0:4    5@1:0    6@1:1
    7@1:2    8@1:3    9@1:4   10@0:0   11@0:1   12@0:2   13@0:3
   14@0:4   15@1:0   16@1:1   17@1:2   18@1:3   19@1:4   20@0:0
   21@0:1   22@0:2   23@0:3   24@0:4   25@1:0   26@1:1   27@1:2
   28@1:3   29@1:4   30@0:0   31@0:1   32@0:2   33@0:3   34@0:4
   35@1:0   36@1:1   37@1:2   38@1:3   39@1:4   40@0:0   41@0:1

array2d (derefi):
    0@0:0    1@0:1    2@0:2    3@0:3    4@0:4    5@1:0    6@1:1
    0@0:0    1@0:1    2@0:2    3@0:3    4@0:4    5@1:0    6@1:1
    0@0:0    1@0:1    2@0:2    3@0:3    4@0:4    5@1:0    6@1:1
    0@0:0    1@0:1    2@0:2    3@0:3    4@0:4    5@1:0    6@1:1
    0@0:0    1@0:1    2@0:2    3@0:3    4@0:4    5@1:0    6@1:1
    0@0:0    1@0:1    2@0:2    3@0:3    4@0:4    5@1:0    6@1:1

array2d (derefj):
    0@0:0    1@0:1    2@0:2    3@0:3    4@0:4    5@1:0    6@1:1
    7@1:2    8@1:3    9@1:4   10@0:0   11@0:1   12@0:2   13@0:3
   14@0:4   15@1:0   16@1:1   17@1:2   18@1:3   19@1:4   20@0:0
   21@0:1   22@0:2   23@0:3   24@0:4   25@1:0   26@1:1   27@1:2
   28@1:3   29@1:4   30@0:0   31@0:1   32@0:2   33@0:3   34@0:4
   35@1:0   36@1:1   37@1:2   38@1:3   39@1:4   40@0:0   41@0:1

array2d (derefij):
    0@0:0    1@0:1    2@0:2    3@0:3    4@0:4    5@1:0    6@1:1
    0@0:0    1@0:1    2@0:2    3@0:3    4@0:4    5@1:0    6@1:1
    0@0:0    1@0:1    2@0:2    3@0:3    4@0:4    5@1:0    6@1:1
    0@0:0    1@0:1    2@0:2    3@0:3    4@0:4    5@1:0    6@1:1
    0@0:0    1@0:1    2@0:2    3@0:3    4@0:4    5@1:0    6@1:1
    0@0:0    1@0:1    2@0:2    3@0:3    4@0:4    5@1:0    6@1:1

array2dptr:
    0@0:0    1@0:1    2@0:2    3@0:3    4@0:4    5@1:0    6@1:1
   12@0:0   13@0:1   14@0:2   20@0:3   21@0:4   17@1:0   18@1:1
   24@0:0   30@0:1   31@0:2   32@0:3   33@0:4   29@1:0   35@1:1
   41@0:0    0@0:1    0@0:2    0@0:3    0@0:4    0@1:0    0@1:1
    0@0:0    0@0:1    0@0:2    0@0:3    0@0:4    0@1:0    0@1:1
    0@0:0    0@0:1    0@0:2    0@0:3    0@0:4    0@1:0    0@1:1

Thread 0 failed 3 tests!
Thread 1 failed 3 tests!

For each indexing method, thread 0 prints each element in the form 
{value}@{thread}:{phase}.  If things work correctly, each method should 
generate the same output.  That is clearly not the case here.  With the latest 
development build of CCE however, we do get the expected results.

array2d:
    0@0:0    1@0:1    2@0:2    3@0:3    4@0:4    5@1:0    6@1:1
    7@1:2    8@1:3    9@1:4   10@0:0   11@0:1   12@0:2   13@0:3
   14@0:4   15@1:0   16@1:1   17@1:2   18@1:3   19@1:4   20@0:0
   21@0:1   22@0:2   23@0:3   24@0:4   25@1:0   26@1:1   27@1:2
   28@1:3   29@1:4   30@0:0   31@0:1   32@0:2   33@0:3   34@0:4
   35@1:0   36@1:1   37@1:2   38@1:3   39@1:4   40@0:0   41@0:1

array2d (derefi):
    0@0:0    1@0:1    2@0:2    3@0:3    4@0:4    5@1:0    6@1:1
    7@1:2    8@1:3    9@1:4   10@0:0   11@0:1   12@0:2   13@0:3
   14@0:4   15@1:0   16@1:1   17@1:2   18@1:3   19@1:4   20@0:0
   21@0:1   22@0:2   23@0:3   24@0:4   25@1:0   26@1:1   27@1:2
   28@1:3   29@1:4   30@0:0   31@0:1   32@0:2   33@0:3   34@0:4
   35@1:0   36@1:1   37@1:2   38@1:3   39@1:4   40@0:0   41@0:1

array2d (derefj):
    0@0:0    1@0:1    2@0:2    3@0:3    4@0:4    5@1:0    6@1:1
    7@1:2    8@1:3    9@1:4   10@0:0   11@0:1   12@0:2   13@0:3
   14@0:4   15@1:0   16@1:1   17@1:2   18@1:3   19@1:4   20@0:0
   21@0:1   22@0:2   23@0:3   24@0:4   25@1:0   26@1:1   27@1:2
   28@1:3   29@1:4   30@0:0   31@0:1   32@0:2   33@0:3   34@0:4
   35@1:0   36@1:1   37@1:2   38@1:3   39@1:4   40@0:0   41@0:1

array2d (derefij):
    0@0:0    1@0:1    2@0:2    3@0:3    4@0:4    5@1:0    6@1:1
    7@1:2    8@1:3    9@1:4   10@0:0   11@0:1   12@0:2   13@0:3
   14@0:4   15@1:0   16@1:1   17@1:2   18@1:3   19@1:4   20@0:0
   21@0:1   22@0:2   23@0:3   24@0:4   25@1:0   26@1:1   27@1:2
   28@1:3   29@1:4   30@0:0   31@0:1   32@0:2   33@0:3   34@0:4
   35@1:0   36@1:1   37@1:2   38@1:3   39@1:4   40@0:0   41@0:1

array2dptr:
    0@0:0    1@0:1    2@0:2    3@0:3    4@0:4    5@1:0    6@1:1
    7@1:2    8@1:3    9@1:4   10@0:0   11@0:1   12@0:2   13@0:3
   14@0:4   15@1:0   16@1:1   17@1:2   18@1:3   19@1:4   20@0:0
   21@0:1   22@0:2   23@0:3   24@0:4   25@1:0   26@1:1   27@1:2
   28@1:3   29@1:4   30@0:0   31@0:1   32@0:2   33@0:3   34@0:4
   35@1:0   36@1:1   37@1:2   38@1:3   39@1:4   40@0:0   41@0:1

Original comment by sdvor...@cray.com on 21 Feb 2013 at 6:05

Attachments:

GoogleCodeExporter commented 9 years ago
Steve,

Thanks for the test case.  I made the following small mod. to highlight the 
cells where the actual value is not equal to the expected value.

$ diff -u t_2darr.c cray-2d-array-test.c 
--- t_2darr.c   2013-02-21 13:48:05.156551400 -0800
+++ cray-2d-array-test.c        2013-02-21 13:57:46.630987352 -0800
@@ -60,7 +60,8 @@
     shared [BLKS] VTYPE *ptr = &(expr); \
     VTYPE actual = *ptr; \
     if ( actual != EXPECTED(i,j) ) failed = 1; \
-    t0_print( "  " VFMT "@" TFMT ":" PFMT, \
+    t0_print( "  %s" VFMT "@" TFMT ":" PFMT, \
+              actual != EXPECTED(i,j) ? "*" : " ", \
               actual, \
               (TTYPE)upc_threadof( ptr ), \
               (PTYPE)upc_phaseof( ptr ) )

Currently, GUPC doesn't pass this test.  Here is the example output.

array2d:
     0@0:0     1@0:1     2@0:2     3@0:3     4@0:4     5@1:0     6@1:1
     7@1:0     8@1:1     9@1:2    10@1:3    11@1:4    12@0:0    13@0:1
    14@0:0    15@0:1  * 21@0:2  * 22@0:3  * 23@0:4    19@1:0    20@1:1
    21@0:0    22@0:1    23@0:2    24@0:3    25@0:4    26@1:0    27@1:1
    28@1:0    29@1:1  * 35@1:2  * 36@1:3  * 37@1:4    33@0:0    34@0:1
    35@1:0    36@1:1    37@1:2    38@1:3    39@1:4    40@0:0    41@0:1

Original comment by gary.funck on 21 Feb 2013 at 10:00

GoogleCodeExporter commented 9 years ago
> Currently, GUPC doesn't pass this test.  Here is the example output.

Attached is an updated version of the test that puts asterisks by invalid 
elements, and additionally checks that the addrfield, phase, and thread are 
valid as well.  I see in GUPC's output, the thread and phase (and likely 
addrfield) are not correct, even where it gets the correct value.

Original comment by sdvor...@cray.com on 21 Feb 2013 at 10:45

Attachments:

GoogleCodeExporter commented 9 years ago
Steve,

Thanks for the improved checking.

Unfortunately, the new code runs into GUPC's current errant shared address 
calculation.

array2d:
t_2darr: at t_2darr_v2.c:93, UPC error: Invalid conversion of shared address to 
local pointer;
thread does not have affinity to shared address
thread 1 terminated with signal: 'Aborted'
Terminated

The offending code is here:

    91      shared [BLKS] VTYPE *p_first = &array2d[p_first_i][p_first_j];
    92
    93      size_t local_diff = (((VTYPE *)ptr) - ((VTYPE *)p_first)) *
    94                          sizeof( VTYPE );

Perhaps something like the following is good enough?

size_t local_diff = (upc_threadof(ptr) == MYTHREAE && upc_threadof(pfirst) == 
MYTHREAD)
            ? (((VTYPE *)ptr) - ((VTYPE *)p_first))+ sizeof( VTYPE )
            : (ROWS * COLS) ;

The idea above is to set local_diff to something outside of the normal 
computation.
Alternatively, local_diff might be made an 'int' and the calculation above 
could then return -1.

Original comment by gary.funck on 21 Feb 2013 at 11:19

GoogleCodeExporter commented 9 years ago
I was just going to change the if checks to only compute and check local_diff 
if it is on the same thread.  Thus:

    if ( addr_diff != exp_diff ) {
        failed = 1;
    }
    else if ( upc_threadof( ptr ) == MYTHREAD ) {
        size_t local_diff = (((VTYPE *)ptr) - ((VTYPE *)p_first)) *
                            sizeof( VTYPE );
        if ( local_diff != exp_diff ) {
            failed = 1;
        }
    }

Original comment by sdvor...@cray.com on 21 Feb 2013 at 11:24

GoogleCodeExporter commented 9 years ago
Removed illegal pointer-to-local cast from check_addrfield() using change from 
comment 69.

Original comment by sdvor...@cray.com on 21 Feb 2013 at 11:31

Attachments:

GoogleCodeExporter commented 9 years ago
OK, the fix in Comment 70 avoids the runtime check, thanks.

$ t_2darr -n2
array2d:
    0@0:0    1@0:1    2@0:2    3@0:3    4@0:4    5@1:0    6@1:1
**  7@1:0**  8@1:1**  9@1:2** 10@1:3** 11@1:4** 12@0:0** 13@0:1
** 14@0:0** 15@0:1** 21@0:2** 22@0:3** 23@0:4** 19@1:0** 20@1:1
** 21@0:0** 22@0:1** 23@0:2** 24@0:3** 25@0:4** 26@1:0** 27@1:1
** 28@1:0** 29@1:1** 35@1:2** 36@1:3** 37@1:4** 33@0:0** 34@0:1
   35@1:0   36@1:1   37@1:2   38@1:3   39@1:4   40@0:0   41@0:1
[...]

Original comment by gary.funck on 21 Feb 2013 at 11:56

GoogleCodeExporter commented 9 years ago
In order to make this test fit better with the conventions used within the 
Berkeley test harness, I'd suggest something along the lines of the following.

diff -u t_2darr_v{3,4}.c
--- t_2darr_v3.c        2013-02-21 15:52:19.900938400 -0800
+++ t_2darr_v4.c        2013-02-21 16:15:58.442793237 -0800
@@ -233,6 +233,7 @@
 main()
 {
     int failed = 0;
+    static shared int fail_count[THREADS];

     if ( THREADS > MAX_THREADS ) {
         t0_print( "Please run with %d or fewer threads.\n",
@@ -247,9 +248,21 @@
     failed += print_array2d_derefj();
     failed += print_array2d_derefij();
     failed += print_array2dptr();
+    fail_count[MYTHREAD] = failed;

-    if ( failed != 0 ) {
-        printf("Thread %d failed %d tests!\n",MYTHREAD,failed);
+    upc_barrier;
+
+    if (MYTHREAD == 0) {
+      int passed = 1;
+      for (int t = 0; t < THREADS; ++t) {
+        if (fail_count[t] > 0) {
+          passed = 0;
+          printf("Thread %d failed %d tests!\n",t,fail_count[t]);
+        }
+      }
+      printf("2D array test %s\n",
+             passed ? "passed" : "failed");
+      if (!passed) upc_global_exit(1);
     }

     return 0;

This changes causes a single pass/fail line to be printed if any of the threads 
failed and issues a non-zero exit if a failure is detected.  (I realize that 
exit codes have been deemed unreliable.  That is one reason that many of the 
harness tests issue an affirmative diagnostic.)

Original comment by gary.funck on 22 Feb 2013 at 12:21

GoogleCodeExporter commented 9 years ago
Steve - Can you verify the 17-line program in comment #52 should be fully 
permitted by the rules we're specifying? Does the updated Cray compiler accept 
that program?

Here are my comments on the latest proposal:

+ For array types, recursively, the
+ ultimate element type of the element type of the array type.

Possible rewording:
"For an array type ``array of T'', the ultimate element type of T."

 \ssterm{shared array}%
- an array with elements that have shared qualified type.
+ an array \xreplaced[id=DB]{3}{whose ultimate element type is
+ shared-qualified}{with elements that have shared qualified type}.

Given the new definitions above, wouldn't it be equivalent and cleaner to 
replace this entire definition with "an array with shared type"?

 \np The {\tt upc\_elemsizeof} operator returns the size, in bytes,
- of the highest-level (leftmost) type that is not an array. For
+ of the \xreplaced[id=DB]{3}{ultimate element type of an expression
+ with shared type}{highest-level (leftmost)
+ type that is not an array. For
non-array objects, {\tt upc\_elemsizeof} returns the same value as
- sizeof. The result of {\tt upc\_elemsizeof} is an integer constant.
+ sizeof}. The result of {\tt upc\_elemsizeof} is an integer constant.

The proposed semantic description is currently biased toward a 
/unary-expression/ operand, and leaves the /type-name/ operand case somewhat 
underspecified. It also removes the (redundant but handy) clarification that 
despite its misleading name, this operator also accepts non-array operands.
How about this replacement text:

  The {\tt upc\_elemsizeof} operator returns the size, in bytes, of the ultimate element type of its operand.
  For non-array operands, {\tt upc\_elemsizeof} returns the same value as sizeof.

 \npf A shared type qualifier shall not appear in a type cast
where the corresponding pointer component of the type of the expression
- being cast is not shared-qualified.\footnote{i.e., pointers-to-local
+ being cast is not \xreplaced[id=DB]{3}{a shared type}{shared-qualified}.
+ \footnote{i.e., pointers-to-local
cannot be cast to pointers-to-shared.} 

Might also want to remove "qualifier" from the first sentence. This has the 
beneficial effect of clarifying the restriction still applies when the shared 
qualifier is introduced via typedefs.

+\xadded[id=DB]{3}{
+\np A non-null pointer-to-shared whose reference type is an array type points
+ to the first object of the ultimate element type of the referenced array
+ type in the pointed-to shared array object.
+}

I believe this description is correct, but it is difficult to parse and doesn't 
really convey the restriction we're trying to impose. It also seems redundant 
with this later paragraph:

+\xadded[id=DB]{3}{
+\np When the unary {\tt \&} is applied to an expression with a shared
+ array type, the result is a pointer-to-shared that points to the
+ beginning of the pointed-to shared array, whose referenced type
+ matches that of the expression the unary {\tt \&} was applied to.
+}
+

We also have a very closely-related existing paragraph in 6.5.2.1: "For any 
shared array, a, upc_phaseof (&a) is zero."

Could we perhaps merge all three into a single, clear statement? Something like:

  \np When the unary {\tt \&} operator is applied to an expression with shared
      array type, the result is a pointer-to-shared with zero phase that points to the
      address of the first byte of the initial array element. The referenced type of the 
      pointer matches the operand type.

I think that captures everything we are trying to specify? Ie:
  - &a is a pointer-to-shared
  - upc_phaseof(&a) is zero
  - upc_threadof(&a) is equal to upc_threadof(&a[0]) (and also upc_threadof(a))
  - &a actually points to the beginning of the initial *element* (and not eg to an array descriptor, which don't exist in UPC). This specifically implies you can cast &a directly to an element type and use it to access an element
The last sentence is completely redundant with C99 6.5.3.2-3 and could 
optionally be removed, but might be worth keeping for clarity.

- T *P1, *P2;
- shared T *S1, *S2;
+ LT *P1, *P2;
+ shared ST *S1, *S2; /* LT is the ultimate element type of ST */

- P1 = (T*) S1; /* allowed if S1 has affinity to MYTHREAD */
- P2 = (T*) S2; /* allowed if S2 has affinity to MYTHREAD */
+ P1 = (LT*) S1; /* allowed if S1 has affinity to MYTHREAD */
+ P2 = (LT*) S2; /* allowed if S2 has affinity to MYTHREAD */
...
\item S1 and P1 shall point to the same object.
\item S2 and P2 shall point to the same object. 
 \item The expression (({\tt (ptrdiff\_t) upc\_addrfield} (S2) - {\tt (ptrdiff\_t) upc\_addrfield(S1))} shall
- evaluate to the same value as ((P2 - P1) * sizeof(T)).
+ evaluate to the same value as ((P2 - P1) * 
sizeof(\xreplaced[id=DB]{3}{LT}{T})). 

The replacement text is a bit problematic when ST and LT are permitted to 
differ (eg consider ST=(int[10]) and LT=(int))
In that case the expression: (LT*) S1  is at least "fishy", since it casts 
(shared int (*)[10]) directly to (int *), which technically has an incompatible 
referenced type. I believe this is still permitted by C99 (although the 
alignment restrictions in 6.3.2.3 might technically be problematic?), but 
utilizing that corner case for one of the axioms that defines our pointer 
arithmetic seems a bit "sketchy". More importantly, *P1 and *S1 no longer have 
the same unqualified type, so it is incorrect to say that "S1 and P1 shall 
point to the same object", as one might actually point to a sub-object of the 
other.

Is it necessary to change the variable definitions at all? Couldn't we just 
leave them unchanged and simply change the final expression to:
((P2 - P1) * upc_elemsizeof(*S1))

+\np \xreplaced[id=DB]{3}
+ {Objects of the ultimate element type of the array are distributed
+ in a round robin fashion, by chunks of block-size objects, such that
+ the i-th object}

This is the first paragraph in a new section, and refers to "the array", a 
seemingly prior entity which is not defined. Further, the proposed change 
removes the qualification that this paragraph applies to (and only to) shared 
arrays. Perhaps add an introductory sentence: "The elements of a shared array 
are laid out as follows."

Also while we're changing that paragraph, I suggest we change the equation:
   (floor (i/block_size) mod THREADS)
to use the mathematical \lfloor\rfloor operator defined by C99 3.19, ie:
   ($\lfloor i/block\_size \rfloor$ {\tt mod THREADS})
The latter is a bit more precise since the floor() function(s) in math.h return 
a floating-point value, which is not relevant here.

Editorial stuff:
----------------
> \xreplaced[id=DB]
We're not currently displaying change authors, but please use your own initials 
in the annotation id ;-)
You might need to add your name to common/upc-spec-preamble.tex.

- {\tt nbytes} argument shall be {\tt b * sizeof (t)}. If the block
- size is indefinite, {\tt nbytes} shall be 0.
+ {\tt nbytes} argument shall be {\tt b * \xreplaced[id=DB]{3}
+ {upc\_elemsizeof (d)}{sizeof (t)}. If the block size is indefinite,
+ {\tt nbytes} shall be 0.

Missing a closing } for the \tt that breaks the subsequent text font.

+    number of consecutive \xreplaced[id=DB]{3}{objects of the ulitmate

Spelling error.

- THREADS} environment and an array with shared-qualified elements
+ THREADS} environment and \xreplaced[id=DB]{3}{a shared array}{an array
+ with shared-qualified elements}

This change touches wording already being changed by PendingApproval issue 
94/95, and will need to be merged "carefully".

Original comment by danbonachea on 25 Feb 2013 at 5:23

GoogleCodeExporter commented 9 years ago
> Steve - Can you verify the 17-line program in comment #52 should be fully 
permitted by the rules we're specifying? Does the updated Cray compiler accept 
that program?

It should be.  I'll verify that our compiler accepts it tomorrow.

> Possible rewording:
> "For an array type ``array of T'', the ultimate element type of T."

That's much better!  I'll change it tomorrow.

> Given the new definitions above, wouldn't it be equivalent and cleaner to 
replace this entire definition with "an array with shared type"?

Probably.

> How about this replacement text:
>  The {\tt upc\_elemsizeof} operator returns the size, in bytes, of the 
ultimate element type of its operand.
>  For non-array operands, {\tt upc\_elemsizeof} returns the same value as 
sizeof.

I assume that should be {\tt sizeof} at the end, but yes, that would work.

> ...
> Could we perhaps merge all three into a single, clear statement? Something 
like:
>   
>  \np When the unary {\tt \&} operator is applied to an expression with shared
>      array type, the result is a pointer-to-shared with zero phase that 
points to the
>      address of the first byte of the initial array element. The referenced 
type of the 
>      pointer matches the operand type.

The result is NOT guaranteed to have zero phase.  6.5.2.1 applies to taking the 
address of a variable whose type is a shared array, not expressions with whose 
type is a shared array type in general.  If we always force the resulting 
pointer-to-shared to have zero phase, then pointer arithmetic (and thus array 
indexing!) does not work for multi-dimensional shared arrays.  See the 
(updated) test program in comment 70 for an example of why this is necessary.

> The replacement text is a bit problematic when ST and LT are permitted to 
differ (eg consider ST=(int[10]) and LT=(int))
> In that case the expression: (LT*) S1  is at least "fishy", since it casts 
(shared int (*)[10]) directly to (int *), which technically has an incompatible 
referenced type. I believe this is still permitted by C99 (although the 
alignment restrictions in 6.3.2.3 might technically be problematic?), but 
utilizing that corner case for one of the axioms that defines our pointer 
arithmetic seems a bit "sketchy". More importantly, *P1 and *S1 no longer have 
the same unqualified type, so it is incorrect to say that "S1 and P1 shall 
point to the same object", as one might actually point to a sub-object of the 
other.

No, casting (shared int (*)[10]) to (int (*)[10]) is exactly what we MUST 
forbid!  Because of the block cyclic distribution, there may not be 10 ints on 
the local thread to point at!  The ONLY thing that you can safely locally cast 
to is a pointer to the ultimate element type, and therefore ST and LT MUST 
differ if ST is an array type.  Keep in mind that *S1 could span multiple 
threads, while *P1 is limited to the (initial) portion of *S1 that is local to 
the casting thread.

> Is it necessary to change the variable definitions at all? Couldn't we just 
leave them unchanged and simply change the final expression to:
> ((P2 - P1) * upc_elemsizeof(*S1))

I think your replacement equation works, but, as explained above, we do need to 
change the variable definitions.

> This is the first paragraph in a new section, and refers to "the array", a 
seemingly prior entity which is not defined. Further, the proposed change 
removes the qualification that this paragraph applies to (and only to) shared 
arrays. Perhaps add an introductory sentence: "The elements of a shared array 
are laid out as follows."

Yes, I see I cut out too much from the existing wording.  I'll put the 
introductory stuff back in. 

> Also while we're changing that paragraph, I suggest we change the equation:
> ...

Sounds good to me.

> We're not currently displaying change authors, but please use your own 
initials in > the annotation id ;-)
> You might need to add your name to common/upc-spec-preamble.tex.

So that's what that field is for.  I just blindly copied it from the first 
other use of \xreplaced I saw.

Original comment by sdvor...@cray.com on 25 Feb 2013 at 7:23

GoogleCodeExporter commented 9 years ago
> No, casting (shared int (*)[10]) to (int (*)[10]) is exactly what we MUST 
forbid! 

I'm aware of that, and I agree the cast you mention needs to be prohibited, but 
that restriction is already specified elsewhere (6.4.3-1). I think you misread 
the second type in my statement "it casts (shared int (*)[10]) directly to (int 
*)". 

I still believe changing the final expression is sufficient to define this 
aspect of pointer arithmetic, without the need to change the variable 
definitions, as that introduces the other problems I mentioned. In particular, 
when the C99 spec says a pointer "points to an object", the type of the object 
in question always matches the referenced type of the pointer. Therefore with 
the proposed definition changes, the statement "S1 and P1 shall point to the 
same object" is a type error when *P1 is a non-array type and *S1 is an array 
type (they point to the "same address in memory", not to be confused with "the 
same object").

> The result is NOT guaranteed to have zero phase.  6.5.2.1 applies to taking 
the address of a variable whose 
> type is a shared array, not expressions with whose type is a shared array 
type in general.

OK, I now see your reasoning for the distinction between the two 
closely-related paragraphs in your proposal, and I agree that a PTS to a 
sub-array in the middle of an object might have non-zero phase. However I'd 
still like to see if we can somehow improve the wording to make it more 
approachable. 

In particular, I dislike the statement "A non-null PTS of type X always points 
to Y", because this may be false in a program that uses prior casts to point 
their PTS at garbage. I think it's generally preferable to describe the 
semantics of a language construct directly (ie the result of &-operator and PTS 
arithmetic on shared array types), instead of indirectly as a property of all 
programs using the construct in the way we expect. Simply changing "A non-null 
PTS.." to "A valid, non-null PTS" would be a step in the right direction.

As written the first paragraph also suffers from the subtle type error 
mentioned above, ie a "pointer-to-shared whose reference type is an array type 
points.." to an ARRAY, not to "an object of the ultimate element type". The 
second proposed paragraph avoids this problem by stating it points to the 
BEGINNING of an array object, so perhaps we can apply that wording here as 
well? This would leave us with something like:

  A valid, non-null pointer-to-shared whose reference type is an array type points to
  the beginning of the pointed-to shared array object.

However this is starting to sound suspiciously like "a pointer points to what 
it points to". 
What restriction are we really trying to specify here? Can you describe an 
implementation scenario we are trying to prohibit with this paragraph?

Original comment by danbonachea on 25 Feb 2013 at 1:10

GoogleCodeExporter commented 9 years ago
> I still believe changing the final expression is sufficient to define this 
aspect of pointer arithmetic, without the need to change the variable 
definitions, as that introduces the other problems I mentioned. In particular, 
when the C99 spec says a pointer "points to an object", the type of the object 
in question always matches the referenced type of the pointer. Therefore with 
the proposed definition changes, the statement "S1 and P1 shall point to the 
same object" is a type error when *P1 is a non-array type and *S1 is an array 
type (they point to the "same address in memory", not to be confused with "the 
same object").

Changing the final expression should be fine, if you think that's more clear.  
As far as the "points to the same object" question, what if we instead clearly 
distinguish the array and non-array cases:

  If ST is not an array type, then LT is equivalent to ST, and S1 and P1 shall point to the same object.  IF ST is an array type, then P1 shall point to the first object (with ultimate element type of ST) in the shared array object that S1 points to.

> In that case the expression: (LT*) S1  is at least "fishy", since it casts 
(shared int (*)[10]) directly to (int *), which technically has an incompatible 
referenced type.

Yes, but there does not exist a pointer-to-local with a compatible referenced 
type if ST is an array type, so (int *) is the best we can do.  The fundamental 
problem here is that, unlike objects of all other complete types in UPC, 
objects of shared array types are permitted to span threads.

> I believe this is still permitted by C99 (although the alignment restrictions 
in 6.3.2.3 might technically be problematic?), but utilizing that corner case 
for one of the axioms that defines our pointer arithmetic seems a bit "sketchy".

Since a pointer to an array has the same address as the first element of that 
array, explicitly casting down from a pointer to the array type to a pointer to 
the element type is always safe and well-defined.

As to using a "corner case" of C99 to define pointer-to-shared arithmetic being 
"sketchy", feel free to offer a replacement.

Original comment by sdvor...@cray.com on 25 Feb 2013 at 3:08

GoogleCodeExporter commented 9 years ago
I suppose another way to deal with this situation would be to make casting to a 
pointer-to-local have defined behavior only if the ENTIRE object that the 
pointer-to-shared points to has affinity to the local thread.  Then we can 
simply use the existing wording for paragraphs 4 and 5.  Would that be better?

Original comment by sdvor...@cray.com on 25 Feb 2013 at 3:59

GoogleCodeExporter commented 9 years ago
Attached is an updated draft that takes Dan's recent comments into account.  
Notable changes:

1. Fixed "Editorial stuff" from comment 73.

2. Reworded definitions of "ultimate element type" and "shared array" based on  
comment 73.

3. Reworded semantics of upc_elemsizeof based on comment 73.

4. As proposed in comment 77, made casting a pointer-to-shared that points to a 
shared array object that spans threads to a pointer-to-local result in 
undefined behavior.  This greatly simplifies a lot of the contentious language 
discussed in comments 73-76.

Original comment by sdvor...@cray.com on 25 Feb 2013 at 4:55

Attachments:

GoogleCodeExporter commented 9 years ago
> I suppose another way to deal with this situation would be to make casting to 
a 
> pointer-to-local have defined behavior only if the ENTIRE object that the 
pointer-
> to-shared points to has affinity to the local thread.  Then we can simply use 
the 
> existing wording for paragraphs 4 and 5.  Would that be better?

At first glance this seems like a reasonable resolution to me - it certainly 
simplifies the spec. Would it prohibit any important usage cases?

One potential drawback is this type of user error probably can't be statically 
diagnosed (although that's OK as far as compliance since it falls under 
undefined behavior). It's also "harder" for a high-quality implementation to 
generate a runtime warning for this case, in contrast to the "cast a pointer 
with remote affinity to local" which can be easily caught at runtime, with no 
type information. However since it only applies to pointer-to-shared-array type 
(an "advanced" user feature) then perhaps that's not a concern.

 \np The {\tt upc\_threadof} function returns the index of the
- thread that has affinity to the shared object pointed to by {\tt 
ptr}.\footnote{%
+ thread that has affinity to the shared object pointed to by {\tt ptr}.
+ \xadded[id=SV]{3}{If ptr points to a shared array object, then the
+ index of the thread that has affinity to the first element of data
+ storage that comprises the shared object is returned.}\footnote{%

I understand the purpose of this new wording is to handle "oddball" cases where 
the referenced object may have multiple affinity, like:
  shared [5] int A[10][10];
  upc_threadof(&A);
  upc_threadof(&A[1]);
However I'm concerned that a casual reader might misread the new sentence to 
imply the function always returns the affinity of the start of the array when 
passed a pointer to a shared array element:
  upc_threadof(&A[1][0])
Can we somehow clarify this new sentence is only concerned with the "oddball" 
case?

Minor nitpick: 
In the constraint sections for upc_*sizeof(), can we use the singular tense 
instead of the plural when discussing the operand?
Ie change:
   The upc_elemsizeof operator shall apply only to shared types or expressions with shared type.
to:
   The upc_elemsizeof operator shall apply only to a shared type or an expression with shared type.
I know this style was inherited from 1.2, but after reading it ten times I've 
realized that referring to a unary operand using a plural tense is 
unnecessarily confusing. The singular tense is also more stylistically 
consistent with similar phrases in C99.

Similarly in upc_elemsizeof semantics:
"For non-array operands" -> "For a non-array operand"

Original comment by danbonachea on 25 Feb 2013 at 7:53

GoogleCodeExporter commented 9 years ago
> Does the updated Cray compiler accept that program?

Yes, it does.

Original comment by sdvor...@cray.com on 26 Feb 2013 at 6:34

GoogleCodeExporter commented 9 years ago
> At first glance this seems like a reasonable resolution to me - it certainly 
simplifies the spec. Would it prohibit any important usage cases?

Not that I can think of.  Any case that it prohibits can be trivially worked 
around by using a pointer to the ultimate element type rather than a pointer to 
the array type.

> One potential drawback is this type of user error probably can't be 
statically diagnosed (although that's OK as far as compliance since it falls 
under undefined behavior). It's also "harder" for a high-quality implementation 
to generate a runtime warning for this case, in contrast to the "cast a pointer 
with remote affinity to local" which can be easily caught at runtime, with no 
type information. However since it only applies to pointer-to-shared-array type 
(an "advanced" user feature) then perhaps that's not a concern.

True, but generating a run-time check that covers most cases is really easy:

Given a pointer-to-local 'LPTR' with type 'T1 *' and a pointer-to-shared 'SPTR' 
with type 'shared [BS] T2 *', the run-time check merely needs to confirm that 
'sizeof(T1) <= (BS - upc_phaseof(SPTR)) * sizeof(T2)'.

The only case this doesn't catch is where SPTR points to a partially full final 
block of a shared array object, and the cast would require a "more full" block.

Original comment by sdvor...@cray.com on 26 Feb 2013 at 7:32

GoogleCodeExporter commented 9 years ago
Attached is another update of the proposed draft.  I think we're really close 
with this one.  Changes from that in comment 78:

1. Removed inserted text from upc_threadof().  Instead, modified the definition 
of the term 'affinity' (section 3.5) to define that shared objects have 
affinity to exactly one thread, that of the element of data storage containing 
the beginning of the shared object.  A footnote clarifies that for shared array 
objects, the affinity of the object itself may not be the same as that of all 
the elements of data storage containing it.

2. Small language changes based on Dan's "nitpicks" in comment 79.

Original comment by sdvor...@cray.com on 26 Feb 2013 at 8:14

Attachments:

GoogleCodeExporter commented 9 years ago
New diff is looking very good! Minor comments:

+ {Objects with ultimate element type making up shared arrays are
+ distributed in a round robin fashion, by chunks of block-size objects,
+ such that the i-th object}

Suggested rewording:
The objects with ultimate element type that comprise a shared array are ...

 \sterm{affinity}%
logical association between shared
objects and threads. Each element of data storage that contains
shared objects has affinity to exactly one thread.
+ \xadded[id=SV]{3}{All shared objects also have affinity to exactly one
+ thread, which is the same as that of the element of data storage
+ containing the beginning of the shared object.\footnote{For non-array
+ shared objects, all elements of data storage containing the object
+ have the same affinity as the object itself. This is not necessarily
+ true for shared array objects, which may span multiple threads.}}

I like the new text. I'm curious if it's also clear to readers who've spent 
fewer hours thinking about these issues?

Latex issue: you'll need to change \footnote to \truefootnote to ensure it's 
correctly rendered by the changes package (yes it's a filthy hack -- don't ask).

I also noticed there are two "spurious" changes:

- shared T *S1, *S2;
+ shared T *S1, *S2;

-\item The expression (({\tt (ptrdiff\_t) upc\_addrfield} (S2) - {\tt 
(ptrdiff\_t) upc\_addrfield(S1))} shall
+\item The expression ({\tt (ptrdiff\_t) upc\_addrfield} (S2) - {\tt 
(ptrdiff\_t) upc\_addrfield} (S1)) shall 

The first is just whitespace. The second one breaks the Latex margins and fonts 
for the equation.

Original comment by danbonachea on 26 Feb 2013 at 8:52

GoogleCodeExporter commented 9 years ago
> The first is just whitespace. The second one breaks the Latex margins and 
fonts for the equation.

I changed the second one for consistency--'(S2)' was not in the {\tt ...} block 
while '(S1)' was.  Either both should be, or both shouldn't be.

Original comment by sdvor...@cray.com on 26 Feb 2013 at 9:12

GoogleCodeExporter commented 9 years ago
> I changed the second one for consistency--'(S2)' was not in the {\tt ...} 
block while '(S1)' was.  Either both should be, or both shouldn't be.

If we're changing it for cosmetic reasons, please make sure the typeset result 
looks reasonable :)  

Given the length of that equation it might be simpler to place it on a line by 
itself and put the whole thing in {\tt}. The local equation should probably 
also be in {\tt}

While you're at it, this changed line:

+                         + (upc_phaseof(p) + elem_delta) div B) mod THREADS 

now also overflows into the right margin when typeset, and should probably be 
"scooted" left.

Original comment by danbonachea on 26 Feb 2013 at 9:26

GoogleCodeExporter commented 9 years ago
Attached is another update of the proposed draft.  Changes from that in comment 
82:

1. Reworded description of data distribution for definitely types shared arrays 
based on comment 83.

2. Removed spurious whitespace changes from upc-language.tex.

3. Fixed (?) formatting issues noted in comment 85.

Original comment by sdvor...@cray.com on 26 Feb 2013 at 9:59

Attachments:

GoogleCodeExporter commented 9 years ago
s/definitely types/definitely blocked/

Same typo in two bugs now today.  I need more caffeine! 

Original comment by sdvor...@cray.com on 26 Feb 2013 at 10:00

GoogleCodeExporter commented 9 years ago
Attached is another update of the proposed draft with minor typographical 
changes from that in comment 86:

1. Changed "objects of the ultimate element type" to "objects with ultimate 
element type" in 6.5.1.1 11 to match similar phrasing elsewhere in the document.

2. Corrected type "bjects" to "objects" in 6.5.2.1 5.

Original comment by sdvor...@cray.com on 26 Feb 2013 at 10:39

Attachments:

GoogleCodeExporter commented 9 years ago
+\xchangenote[id=SV]{3}{Changed formatting of last item.}

We've been following the convention of only annotating changes that affect 
actual wording or technical content. Typesetting fixes need not be annotated.

Your latest diff retracts this change:

- P1 = (T*) S1; /* allowed if S1 has affinity to MYTHREAD */
- P2 = (T*) S2; /* allowed if S2 has affinity to MYTHREAD */
+ P1 = (T*) S1; /* allowed if *S1 has affinity only to MYTHREAD */
+ P2 = (T*) S2; /* allowed if *S2 has affinity only to MYTHREAD */ 

The new definition of affinity obviates the need for the word "only".
However I believe we still want the S1->*S1 and S2->*S2 change (the old text is 
"loose" and technically incorrect - we're referring to the affinity of the 
array element, not the pointer object).

Original comment by danbonachea on 26 Feb 2013 at 10:46

GoogleCodeExporter commented 9 years ago
Changes from comment 88:

1. Updated 6.4.3 6 to reflect that a shared object that span multiple threads 
cannot be completely accessed by a local pointer on the thread which it has 
affinity to.

2. Removed change note for typesetting changes as noted in comment 89.

3. Updated comments in 6.4.2 4 to be a bit more precise.

Original comment by sdvor...@cray.com on 27 Feb 2013 at 12:18

Attachments:

GoogleCodeExporter commented 9 years ago
Overall looks very good.  One general question occurred to me:

We repeatedly use the phrase "element of data storage" - mostly in the sections 
changed by this proposal. This phrase is somewhat cumbersome, and unfortunately 
overloads the word "element", which we also use to discuss array elements (a 
closely related but distinct concept). 

C99 does not use this phrase at all, but it does use "unit of data storage", eg:

3.6 byte
addressable unit of data storage large enough to hold any member of the basic 
character set of the execution environment

The phrase "data storage" appears in the exactly 3 places in C99 - the 
definitions of bit, byte and object. The entire subsequent document then uses 
one of those three defined terms, as appropriate.

I believe all five occurrences of the phrase "element of data storage" in the 
current proposal actually mean "byte". I think it might improve clarity to 
simply say "byte". Thoughts?

Original comment by danbonachea on 27 Feb 2013 at 1:07

GoogleCodeExporter commented 9 years ago
> I believe all five occurrences of the phrase "element of data storage" in the 
current proposal actually mean "byte". I think it might improve clarity to 
simply say "byte". Thoughts?

I don't have any objection to that.  I tried to follow the existing practice 
from the 1.2 spec, but "element of data storage" is definitely a bit obtuse.

Original comment by sdvor...@cray.com on 27 Feb 2013 at 1:59

GoogleCodeExporter commented 9 years ago
Do I need to be concerned with the following errors:

$ make
...
------------------------------------
latex-mk:   LaTeX references have changed, another latex run is needed
------------------------------------
------------------------------------
latex-mk:   Failed to get LaTeX to converge after 5 tries
latex-mk:   Please fix the document or try again if you think it
latex-mk:   should be ok
------------------------------------
make[1]: *** [upc-lang-spec-draft.pdf] Error 1
make[1]: Leaving directory `/ptmp/sdvormwa/upc-specification-read-only/lang'
make: *** [make-default] Error 2

They seem to go away if I simply run make a second time:

$ make
...
LaTeX-Mk: Checking for ./upc-lib-nb-mem-ops-spec-draft.toc ...
make[3]: Leaving directory 
`/ptmp/sdvormwa/upc-specification-read-only/lib/proposed/nb-mem-ops'
make[2]: Leaving directory 
`/ptmp/sdvormwa/upc-specification-read-only/lib/proposed'
make[1]: Leaving directory `/ptmp/sdvormwa/upc-specification-read-only/lib'
$

Original comment by sdvor...@cray.com on 27 Feb 2013 at 5:01

GoogleCodeExporter commented 9 years ago
> Do I need to be concerned with the following errors:

Probably harmless. The changes package output takes a long time to converge, 
especially now that it's getting so large. I usually 'make clean' when I see 
this error to ensure I'm starting with a clean slate.

Original comment by danbonachea on 27 Feb 2013 at 5:06

GoogleCodeExporter commented 9 years ago
Changes from comment 90:

1. Replace uses of "element[s] of data storage" with "byte[s]".

2. Slightly reword definition of "affinity" to (hopefully) be a bit less 
awkward.

Original comment by sdvor...@cray.com on 27 Feb 2013 at 5:18

Attachments:

GoogleCodeExporter commented 9 years ago
Dan, does the following change (included in attachments in comments 90 and 95) 
address your concern in comment 89, or would you prefer to revert to the 
'allowed if *S1 has affinity to MYTHREAD' wording?

- P1 = (T*) S1; /* allowed if S1 has affinity to MYTHREAD */
- P2 = (T*) S2; /* allowed if S2 has affinity to MYTHREAD */
+ P1 = (T*) S1; /* allowed if upc_threadof(S1) == MYTHREAD */
+ P2 = (T*) S2; /* allowed if upc_threadof(S2) == MYTHREAD */ 

Original comment by sdvor...@cray.com on 27 Feb 2013 at 5:24

GoogleCodeExporter commented 9 years ago
Attached is a version of the test from comment 70 that has been updated to fix 
incorrect assumptions about the behavior of upc_addrfield() (see issues 106 and 
107).

Original comment by sdvor...@cray.com on 27 Feb 2013 at 6:00

Attachments:

GoogleCodeExporter commented 9 years ago
Revamped test to remove upc_addrfield() altogether, and rely solely on results 
of pointer arithmetic.

Original comment by sdvor...@cray.com on 27 Feb 2013 at 6:43

Attachments:

GoogleCodeExporter commented 9 years ago
> does the following change (included in attachments in comments 90 and 95) 
address your concern in comment 89

Yes, the new threadof() comment looks like a good improvement.

I'm satisfied with the latest draft in comment #95. I think we should mail the 
official PendingApproval change announcement and start the 4-week comment 
period. We can of course continue to tweak based on feedback from the committee 
as it arises.

Original comment by danbonachea on 27 Feb 2013 at 7:15

GoogleCodeExporter commented 9 years ago
Official proposal mailed 2013-02-27.

Original comment by sdvor...@cray.com on 27 Feb 2013 at 7:24

Attachments: