Extend PTS arithmetic to allow pointing to "one past" the end of a shared array

GoogleCodeExporter commented 9 years ago

Steve's comments moved from issue 106:

1. Is pointing to one element past the end of a shared array object valid (as 
it is for local objects by ISO/IEC 9899 6.5.6 8-9)?  If so, we should be sure 
that we get the expected behavior for those as well.  Note that this is a much 
larger change, as a lot of the spec assumes that any valid non-null 
pointer-to-shared points to an object.

This is trivial to express.  The existing equations in 6.4.2 3 define the exact 
behavior of upc_threadof() and upc_phaseof().  My proposal in comment 13 
suffices to define the behavior of upc_addrfield(), and can be trivially 
tweaked to define the local address as well.  Since you can't do 
pointer-to-shared arithmetic on generic pointers-to-shared, nor on 
pointers-to-shared whose referenced type is incomplete, we don't need to worry 
about what "one past" means in those cases, and it is well-defined for all 
others.
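
A minimal plain-C sketch of that claim, assuming the 6.4.2 3 equations read phase' = (phase + i) % B and thread' = (thread + (phase + i) / B) % THREADS. This is a model, not UPC: `pts_model`, `pts_add`, and the explicit `threads` parameter are hypothetical names introduced only for illustration.

```c
/* Plain-C model of the 6.4.2 3 equations; not UPC. All names are
   hypothetical. i is assumed non-negative and B > 0 (definite blocking). */
struct pts_model {
    unsigned thread;  /* value upc_threadof() would report */
    unsigned phase;   /* value upc_phaseof() would report */
};

/* Model p1 = p + i for a pointer-to-shared with block size B. */
static struct pts_model pts_add(struct pts_model p, unsigned i,
                                unsigned B, unsigned threads)
{
    struct pts_model p1;
    p1.phase  = (p.phase + i) % B;
    p1.thread = (p.thread + (p.phase + i) / B) % threads;
    return p1;
}
```

With B = 2 and 4 threads, the last element of an 8-element array sits at thread 3, phase 1. Adding 1 in this model gives thread 0, phase 0, which is the value the equations alone would assign to the "one past" pointer.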

Original issue reported on code.google.com by danbonachea on 1 Mar 2013 at 6:25

GoogleCodeExporter commented 9 years ago

> The existing equations in 6.4.2 3 define the exact behavior of upc_threadof() 
and upc_phaseof().

Not for this "outlier" case they do not.

The first semantic paragraph of 6.4.2 reads (emphasis added):

  When an expression that has integer type is added to or subtracted from a
  pointer-to-shared, the result has the type of the pointer-to-shared operand. If
  the pointer-to-shared operand points to an element of a shared array object,
  AND THE SHARED ARRAY IS LARGE ENOUGH, THE RESULT POINTS TO AN ELEMENT OF THE 
  SHARED ARRAY.

The case of a pointer incremented past the end of a shared array object does 
not meet the condition of this paragraph, which is a precondition on all the 
semantics that follow. Such an action therefore strays into undefined behavior, 
by C99 Sec 4-2:

  Undefined behavior is otherwise indicated in this International Standard by
  the words "undefined behavior" or by the omission of any explicit definition
  of behavior. There is no difference in emphasis among these three; they all
  describe "behavior that is undefined".

Now we can possibly discuss EXTENDING the semantics of PTS arithmetic to allow 
for this case, but this is clearly undefined at the moment, so this would 
represent a CHANGE (and in my opinion a questionable one, with significant 
potential for implementation impact).

Original comment by danbonachea on 1 Mar 2013 at 6:37

GoogleCodeExporter commented 9 years ago

> Not for this "outlier" case they do not.
> ...
> AND THE SHARED ARRAY IS LARGE ENOUGH, THE RESULT POINTS TO AN ELEMENT OF THE 
SHARED ARRAY.

I specifically said, "The existing equations in 6.4.2 3", not "UPC 6.4.2 2-3" 
specifically for that reason.  I recognize that this is an extension.  ;)

Original comment by sdvor...@cray.com on 1 Mar 2013 at 6:40

GoogleCodeExporter commented 9 years ago

> Is pointing to one element past the end of a shared array object valid (as it 
is for local objects by ISO/IEC 9899 6.5.6 8-9)? 
> I recognize that this is an extension.  ;)

Sounds like we agree on the answer to your first question: this is clearly 
undefined in UPC 1.2. 

As such, proposals to add such semantics are orthogonal to a clarification to 
existing 1.2 semantics.

Original comment by danbonachea on 1 Mar 2013 at 6:51

GoogleCodeExporter commented 9 years ago

Original comment by danbonachea on 1 Mar 2013 at 6:51

Added labels: Type-Enhancement
Removed labels: Type-Defect

GoogleCodeExporter commented 9 years ago

> Sounds like we agree on the answer to your first question: this is clearly 
undefined in UPC 1.2.

As I mentioned in issue 106, only for definitely blocked shared arrays.  For 
indefinitely blocked shared arrays, such a pointer is valid, and doesn't 
currently work with a lot of language features due to the use of "pointed-to 
shared object".

Original comment by sdvor...@cray.com on 1 Mar 2013 at 6:55

GoogleCodeExporter commented 9 years ago

> As I mentioned in issue 106, only for definitely blocked shared arrays.  For 
indefinitely blocked shared arrays, such a pointer is valid, and doesn't 
currently work with a lot of language features due to the use of "pointed-to 
shared object".

To clarify, I believe that supporting such pointers with definitely blocked 
shared objects is a new feature, which should NOT go in 1.3.  However, I 
believe that they are permitted in 1.2 for indefinitely blocked shared objects, 
but don't work correctly, and thus we should fix those in 1.3.

Original comment by sdvor...@cray.com on 1 Mar 2013 at 7:04

GoogleCodeExporter commented 9 years ago

>  I believe that they are permitted in 1.2 for indefinitely blocked shared 
objects, 
> but don't work correctly, and thus we should fix those in 1.3.

Please provide a concrete example of code and spec text from 1.2 to support 
this assertion.

Original comment by danbonachea on 1 Mar 2013 at 7:22

GoogleCodeExporter commented 9 years ago

UPC 1.2 6.4.2 2:

... If the shared array is declared with indefinite block size, the result of 
the pointer-to-shared arithmetic is identical to that described for normal C 
pointers in [ISO/IEC00 Sec. 6.5.6], except that the thread of the new pointer 
shall be the same as that of the original pointer and the phase component is 
defined to always be zero. ...

shared [] int A[NELEMS];
shared [] int *P = &A[NELEMS]; // Points to just past end of A, permitted by C99

if ( MYTHREAD == 0 ) {
    int *lp = (int *)P; // Undefined, because P does not point to any object, and
                        // 6.4.3 does not define the results of such a cast

    if ( upc_threadof( P ) == 0 ) {
        // Implied by 6.4.2 2, but upc_threadof() only has a defined value for
        // non-null pointers that point to an actual object, which P does not
    }
}

Original comment by sdvor...@cray.com on 1 Mar 2013 at 8:00

GoogleCodeExporter commented 9 years ago

> shared [] int A[NELEMS];
> shared [] int *P = &A[NELEMS]; // Points to just past end of A, permitted by 
C99
>     int *lp = (int *)P; // Undefined, because P does not point to any object, 
and
>                         // 6.4.3 does not define the results of such a cast

Agreed, but I don't really see this as a problem. Casting a PTS to unallocated 
space to a PTL does not have any behavior guaranteed by the spec, and therefore 
has undefined behavior. It might be nice if this was guaranteed, but since it's 
not I don't see how this constitutes something that "don't work correctly". 
Also, if you really want to construct such a pointer, it is already very easy 
to do so without straying into undefined behavior. Namely:

 int *lp = ((int *)A) + NELEMS;

> but upc_threadof() only has a defined value for
> non-null pointers that point to an actual object

I agree that upc_threadof(pts) where pts points to unallocated space has 
undefined behavior. I don't really see this as a major problem either - it's 
perhaps less than ideal, but currently it's just a corner case the library 
semantics render undefined. The only case where pts could be a well-defined 
pointer value under 1.2 is if its referenced type is indefinitely-blocked, in 
which case the LOGICAL thread affinity specified by 6.4.2-2 is trivially 
identical to the pointer used in the expression used to create pts. The library 
function isn't guaranteed to give you a correct answer for that corner case, 
but I'm having trouble seeing why a real code would care to execute that query 
in the first place. Even if it did, there's an obvious workaround when you know 
you're in this case, which is to simply call upc_threadof(pts-1) -- or 
alternatively allocate the original object with a trailing "fence" element so 
that pts meets the requirement of pointing to a shared object.

Original comment by danbonachea on 1 Mar 2013 at 8:31

GoogleCodeExporter commented 9 years ago

I'm just commenting on the rationale for extending the arithmetic to make 
computing the one-past address valid, not the details of how we would modify 
UPC to allow for it.  It's the same rationale as in C, really, to allow the 
one-past address to act as the terminating value of a pointer loop...

You could have a function like this:

    void copy_to_me( int* local, shared [B] int* p, int count )
    {
        while ( count-- > 0 ) *local++ = *p++;
    }

Or you might want this form:

    void copy_to_me( int* local, shared [B] int* start, shared [B] int* stop )
    {
        while ( start < stop ) *local++ = *start++;
    }

The pure C99 local pointer analogues of these two functions are legal, but 
calling the second version in UPC:

    shared int A[THREADS];
    ...
    copy_to_me( my_buffer, &A[0], &A[THREADS] );

Is currently questionable for arbitrary block size B != 0 because &A[THREADS] 
is a one-past address.
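
For comparison, a sketch of the local-pointer analogue of the start/stop form in plain C99, where the one-past address is a legal terminating value under 6.5.6. The name `copy_range` is hypothetical and used only to keep it distinct from the UPC version above.

```c
/* Local-pointer analogue of the start/stop form of copy_to_me.
   stop may be the one-past address of the source array; it is compared
   against but never dereferenced. */
static void copy_range(int *local, const int *start, const int *stop)
{
    while (start < stop)
        *local++ = *start++;
}
```

For a local `int a[4]`, a call such as `copy_range(out, &a[0], &a[4])` copies all four elements and is well-defined in C99, because `&a[4]` is only used in the comparison.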

Original comment by johnson....@gmail.com on 1 Mar 2013 at 9:32

GoogleCodeExporter commented 9 years ago

> Is currently questionable for arbitrary block size B != 0 because &A[THREADS] 
is a one-past address.

Agreed - that code currently has undefined behavior in 1.2.

One of my main concerns with modifying the spec to make this code well-defined 
for arbitrarily blocked arrays is outlined below. I'm not positive this is a 
"show-stopper" for this potential new feature, but it's at least food for 
thought.

In C99, arrays are strictly linear in memory, and given pointers to any two 
disjoint elements, there is a natural total order on those elements defined by 
the linear memory address. It is therefore easy to unambiguously discuss the 
"last" element in the array object, because it is the one that is totally 
ordered after all the others. It doesn't matter if the array was declared 
statically or allocated dynamically, or the details of the pointer types used 
to access it, the "last byte" of the object is always a well-defined and unique 
location, and therefore so is "one past the end".

This basic property of heap memory is not true in UPC. Given pointers to any 
two distinct elements in a blocked shared array with affinity to *different* 
threads, there may be no unique order between those elements (see comment #1 in 
issue 104 for details). The ordering of the two elements depends upon the 
blocksize of the pointers used to ask the question. There are even obscure 
cases where the question returns an undefined answer (comparison of 
indefinitely-blocked PTS or dephased blocked PTS). As a consequence, I'm not 
convinced that the "last" element in a UPC array is always unique and always 
well-defined, let alone "one-past" that location. 
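
A small plain-C model of the ordering problem, under stated assumptions: a location is a (thread, local element offset) pair, and a pointer with block size B is assumed to be based at element 0 on thread 0 with phase 0. The names `logical_index` and `orders_before` are hypothetical; this is a sketch of the argument, not of any implementation.

```c
/* Plain-C model, not UPC. Computes the logical array index at which a
   pointer with block size B > 0 would reach the element stored at
   (thread, offset), assuming the pointer's base is thread 0, phase 0. */
static unsigned logical_index(unsigned thread, unsigned offset,
                              unsigned B, unsigned threads)
{
    unsigned block = offset / B;  /* which of this thread's blocks */
    unsigned phase = offset % B;  /* position inside that block */
    return block * B * threads + thread * B + phase;
}

/* Nonzero if location x orders before location y when both are viewed
   through pointers with block size B. */
static int orders_before(unsigned xt, unsigned xo,
                         unsigned yt, unsigned yo,
                         unsigned B, unsigned threads)
{
    return logical_index(xt, xo, B, threads)
         < logical_index(yt, yo, B, threads);
}
```

With 2 threads, take x = (thread 0, offset 1) and y = (thread 1, offset 0). Under B = 1 the model orders y before x, and under B = 2 it orders x before y, so the same two memory locations have no blocksize-independent order.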

For a statically-declared shared array, one could potentially fall back upon 
the blocksize in the declaration that creates the object and use that to 
uniquely (and somewhat arbitrarily) define the "last" element (although note 
that due to blocking, "one-past" that last element might actually have affinity 
to a different thread and have a lower "local address"). However I don't think 
this works for dynamically-allocated arrays at all, because there is no *array 
type* in the code to provide a well-typed blocking factor for use in defining a 
canonical "last element" and "one past it". There are only pointers to shared 
data, which may alias that data using different blocking factors. If some of 
the pointers used to access slices of that shared array are 
indefinitely-blocked, one could make an argument that there is a "last element" 
on several threads, one corresponding to each such pointer. Thus "one past the 
last element" is no longer a unique location. I'm worried about the semantic 
implications of that, and the complexity of the spec-speak that would be needed 
to unambiguously define this feature.
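
To make the statically-declared case concrete, here is a plain-C sketch that maps a logical index to (thread, local offset) using the block size from the declaration, assuming the usual round-robin block layout. `location` and `locate` are hypothetical names for this model only.

```c
/* Plain-C model, not UPC. Maps a logical index in an array declared with
   block size B > 0 to the owning thread and the element offset within
   that thread's local portion. */
struct location {
    unsigned thread;
    unsigned offset;
};

static struct location locate(unsigned index, unsigned B, unsigned threads)
{
    struct location loc;
    loc.thread = (index / B) % threads;
    loc.offset = (index / (B * threads)) * B + index % B;
    return loc;
}
```

For a 6-element array with B = 2 on 2 threads, the last element (index 5) lands at thread 0, offset 3, while "one past" (index 6) lands at thread 1, offset 2: a different thread and a lower local offset, as noted in the parenthetical above.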

Original comment by danbonachea on 1 Mar 2013 at 10:51

GoogleCodeExporter commented 9 years ago

In the 3/15/13 telecon, we discussed this issue and decided that it should be 
deferred to a future spec revision, to allow further time for study.

Original comment by danbonachea on 16 Mar 2013 at 1:16

Intrepid / upc-specification

Extend PTS arithmetic to allow pointing to "one past" the end of a shared array #109