Intrepid / upc-specification

Automatically exported from code.google.com/p/upc-specification

Can pointer-to-shared arithmetic be specified in terms of the / and % operators of the C language? #43

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
This is for issue 48.

For self-consistency, can pointer-to-shared arithmetic be specified in terms of 
the / and % operators of the C language?  (If such a definition appears too 
complex, then perhaps pointer-to-shared arithmetic is too complicated?)

Original issue reported on code.google.com by yzh...@lbl.gov on 22 May 2012 at 11:47

GoogleCodeExporter commented 9 years ago
Issue 44 has been merged into this issue.

Original comment by yzh...@lbl.gov on 22 May 2012 at 11:57

GoogleCodeExporter commented 9 years ago
I'm game. I don't think it's overcomplicated, but it takes a while to define 
the terms. Also, it's murder on architectures where integer divisions are not 
pipelined (grrr).

Consider a shared array "A" declared with blocking factor "BF" and element size 
"ES". BF=0 is a trivial special case that I'm not going to bother with. ES=0 is 
an error.

The array is allocated on threads T = { 0..THREADS-1 } at memory addresses 
base[t], t \in T.

Given index "i" we want to find out

* The UPC thread "thrd" that A[i] is affine to
* The memory address "addr" on T that A[i] is located at on thread "thrd"

Step 1: we calculate the "block index" and "block offset" of index i. Consider 
"A" laid out as a contiguous set of blocks - I want to know which block "i" is 
in, and where in the block we are.

blockIDX    = i / BFES;   /* "block index" */
blockOFF    = i % BFES;   /* byte offset in block */

Step 2: Given the block index we can calculate "thrd" and "addr" directly:

thrd        = blockIDX % THREADS;
addr        = base[thrd] + blockIDX / THREADS + blockOFF;
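
As a hedged illustration of the two steps above, here is a minimal C sketch. It departs from the posted formulas in two ways that are assumptions on my part: the index "i" is counted in elements (so the block index is i / BF rather than i / BFES), and the per-thread block number is scaled by the block size before the in-block offset is added, so the result is a byte offset from base[thrd]. The names elem_loc and locate are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t thrd;    /* thread the element has affinity to */
    size_t offset;  /* byte offset from base[thrd] */
} elem_loc;

/* Map element index i of an array with blocking factor bf and element
   size es (bytes) onto a thread and a byte offset on that thread. */
static elem_loc locate(size_t i, size_t bf, size_t es, size_t threads)
{
    assert(bf > 0 && es > 0 && threads > 0);
    size_t block_idx = i / bf;   /* block number, counted across all threads */
    size_t block_off = i % bf;   /* element position inside that block */
    elem_loc loc;
    loc.thrd   = block_idx % threads;
    /* blocks that precede this one on the same thread, then the in-block part */
    loc.offset = ((block_idx / threads) * bf + block_off) * es;
    return loc;
}
```

With bf = 3, es = 4 and 4 threads, element 13 sits in block 4, which wraps around to thread 0 as that thread's second block.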

Original comment by ga10...@gmail.com on 24 May 2012 at 11:33

GoogleCodeExporter commented 9 years ago
I'm afraid that it isn't that simple.  The array case is not bad because array 
subscripts are generally assumed to be non-negative, but the same is not true 
for pointer addition.  You can have p + i where p is a pointer-to-shared and i 
has an unknown value.  For the case where i < 0, the C division and modulus 
operators differ from the ones that the UPC spec uses.
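
To make the difference concrete, here is a small C sketch of division and remainder that round toward negative infinity, built from the C99 operators. The helper names floor_div and floor_mod are hypothetical, and a positive divisor is assumed.

```c
#include <assert.h>

/* Division rounding toward negative infinity. Assumes b > 0. */
static long floor_div(long a, long b)
{
    assert(b > 0);
    long q = a / b;        /* C99 '/' rounds toward zero */
    if (a % b < 0)         /* C99 '%' takes the sign of the dividend */
        q -= 1;
    return q;
}

/* Remainder that always lies in [0, b). Assumes b > 0. */
static long floor_mod(long a, long b)
{
    assert(b > 0);
    long r = a % b;
    return (r < 0) ? r + b : r;
}
```

For example, -7 / 3 is -2 and -7 % 3 is -1 in C99, while the flooring versions give -3 and 2.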

Original comment by johnson....@gmail.com on 24 May 2012 at 3:57

GoogleCodeExporter commented 9 years ago
Troy, good point. Pointers are indeed a bit more difficult, but only because - 
I predict - we will not completely agree on the representation of a 
pointer-to-shared either. Here is an algorithm for a generic pointer increment, 
starting with a generic representation for a pointer-to-shared. Please comment.

Pointer representation
======================

Let us consider the portable part of the pointer to be a pair { thrd, offset } 
where "offset" is the offset in bytes from the base of the array. We consider 
the offset to be always positive - if the offset points to before the beginning 
of the array it's an access violation.

Pointer increment: definition
=============================

I start with a pointer-to-shared "ptr" and I want to increment this pointer by 
some value "incr". I interpret "ptr" as pointing to an array of type { BF, ES}. 

Algorithm
=========

Step 1: determine the "linear index" of the pointer in bytes (this is an 
imaginary offset in an imaginary array laid out across all processors). 
Essentially: WHERE AM I IN THE SHARED ARRAY?

  blkOFF    =  ptr.offset % BFES;
  blkIDX    =  ptr.offset / BFES;
  upcIDX    =  blkOFF + (blkIDX*THREADS + ptr.thread)*BFES;

Step 2: increment the linear index. This is a simple increment of the upcIDX.

  upcIDX   +=  ES * incr;

Step 3: re-transform the linear index back into a pointer. This code should be 
familiar from my previous posting.

  blkOFF     = upcIDX % BFES;
  blkIDX     = upcIDX / BFES;
  ptr.thread = blkIDX % THREADS;
  blkIDX     = blkIDX / THREADS;
  ptr.offset = BFES * blkIDX + blkOFF;
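
The three steps can be sketched in C as follows. This is an illustration under stated assumptions, not the runtime's code: "BFES" is read as BF * ES (the block size in bytes), the linear index is held in a signed type, and the result is required to stay at or after the start of the array. The names pts and pts_add are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t thread;
    size_t offset;   /* bytes from the base of the array on that thread */
} pts;

static pts pts_add(pts p, ptrdiff_t incr, size_t bf, size_t es, size_t threads)
{
    assert(bf > 0 && es > 0 && threads > 0);
    size_t bfes = bf * es;                    /* assumed meaning of BFES */

    /* Step 1: linear byte index in the imaginary array across all threads */
    size_t blk_off = p.offset % bfes;
    size_t blk_idx = p.offset / bfes;
    ptrdiff_t upc_idx =
        (ptrdiff_t)(blk_off + (blk_idx * threads + p.thread) * bfes);

    /* Step 2: increment by incr elements */
    upc_idx += (ptrdiff_t)es * incr;
    assert(upc_idx >= 0);                     /* before the array start: rejected here */

    /* Step 3: transform the linear index back into { thread, offset } */
    size_t u = (size_t)upc_idx;
    blk_off  = u % bfes;
    blk_idx  = u / bfes;
    p.thread = blk_idx % threads;
    p.offset = bfes * (blk_idx / threads) + blk_off;
    return p;
}
```

Because the linear index is non-negative at every point where '/' and '%' are applied, a negative incr is handled without any special rounding, as long as the result stays inside the array.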

Original comment by ga10...@gmail.com on 24 May 2012 at 5:35

GoogleCodeExporter commented 9 years ago
If you take "ptr.offset" to refer to the language-required "ptr.addrfield", 
thus making the imaginary array span the entirety of what might be considered 
shared space, I believe this is correct.

Original comment by brian.wibecan on 24 May 2012 at 6:08

GoogleCodeExporter commented 9 years ago
We saw this definitional aspect of '%' crop up in a test failure in a GUTS 
'main' test: iteration4.  It may be worth filing an issue on this, but I could 
use some help in characterizing the issue.

Until we fixed the compiler, this test was failing when compiled 
in a static THREADS compilation environment.

This test fails if compiled in static env for threads greater than 16, and only 
for negative indexes. Here is the simplest failing code:

#define N 10
upc_forall (i=-N; i<N; i++; i) {
  if (i >= 0)
    check(i % THREADS == MYTHREAD);
  else
    check(((i % THREADS)+THREADS)%THREADS == MYTHREAD);
}

You'll note that the index, 'i' ranges from -N .. (N-1).  You'll also note that 
the test doesn't make any shared references at all, so the question of 
array/pointer bounds doesn't come into play.

The bug turned out to be due to the fact that the compiler used an unsigned 
type when performing the '%' operation, and further, this unsigned '%' was 
optimized to be performed as a multiply by the reciprocal and that gave 
incorrect values when i is negative. 
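
The following C sketch shows why an unsigned '%' alone can already give the wrong affinity for a negative index. It does not reproduce the compiler's reciprocal-multiply optimization; it only contrasts signed and unsigned remainders. The function names are hypothetical, and 32-bit fixed-width types are used so that the values are predictable.

```c
#include <assert.h>
#include <stdint.h>

/* Affinity as the test expects it: a value in [0, threads). */
static int affinity_signed(int i, int threads)
{
    assert(threads > 0);
    int r = i % threads;              /* signed: r has the sign of i */
    return (r < 0) ? r + threads : r;
}

/* What is computed if the index is converted to an unsigned type first. */
static uint32_t affinity_unsigned(int32_t i, uint32_t threads)
{
    assert(threads > 0);
    return (uint32_t)i % threads;     /* -1 becomes 4294967295 */
}
```

With THREADS = 17, index -1 should map to thread 16, but 4294967295 % 17 is 0. With a power-of-two THREADS such as 16 the two versions agree, because 2^32 is a multiple of 16.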

This reference discusses the defined behavior of %.
http://bytes.com/topic/c/answers/444522-modulus-negative-operands
The gist is that in C89 this was implementation-defined. In C99 they defined 
'/' as always rounding towards 0, which is conventional and compatible with 
Fortran.

If 'affinity' is determined in terms of '%', but '%' returns a negative number, 
how can upc_forall() determine affinity?

(For the moment, we won't worry about whether affinity is defined on negative 
index values.)

The test masks over this issue by doing something different for negative values 
of 'i'.

  if (i >= 0)
    check(i % THREADS == MYTHREAD);
  else
    check(((i % THREADS)+THREADS)%THREADS == MYTHREAD);

In any event, we "fixed" this test failure by ensuring that the division was 
done using signed arithmetic.

BTW, philosophically, I agree with the interpretation that p[-1] for some value 
of PTS 'p' should be treated as an array out-of-bounds error (and likely 
already is in C99?).

Original comment by gary.funck on 24 May 2012 at 6:34

GoogleCodeExporter commented 9 years ago
Brian, good point about addrfield. IIRC the language spec makes no 
representation about comparing the addrfields of two different shared arrays, 
so I think I might be safe even if I consider just a single shared array at a 
time, and consider addrfield to be zero where the base of the array lives.

Brrr, I don't want to get into this business of the modulus of a negative 
number. Remainders are positive by definition :)

Original comment by ga10...@gmail.com on 24 May 2012 at 7:35

GoogleCodeExporter commented 9 years ago
My point about using addrfield was to move away from an implementation-specific 
concept: the base address of the object into which a pointer points, which is 
not required to be tracked by the implementation.  Pointer arithmetic is well 
defined with reference to addrfield.  How addrfield relates to the base of the 
shared object is up to the implementation.  Addrfield doesn't even have to be 
used as part of the pointer representation; it can be fabricated when 
requested, as I believe some of the implementations do, so long as it exhibits 
the required behavior.

I don't recall what promises the language makes regarding pointer operations 
for pointers that don't point within the same object, but I'm sure they can be 
compared for equality or tested to see if they are NULL.  Testing for equality 
I believe is defined as matching addrfield and thread values.  So, pointers 
into two different objects must have different addrfield base values, even if 
the value used in real pointer arithmetic is the offset from the array 
beginning and thus the same for both. 

Original comment by brian.wibecan on 24 May 2012 at 9:30

GoogleCodeExporter commented 9 years ago
Brian wrote:
> I don't recall what promises the language makes regarding pointer operations for
> pointers that don't point within the same object, but I'm sure they can be compared
> for equality or tested to see if they are NULL.  Testing for equality I believe is
> defined as matching addrfield and thread values.  So, pointers into two different
> objects must have different addrfield base values, even if the value used in real
> pointer arithmetic is the offset from the array beginning and thus the same for
> both.

I believe Brian is right: tests for equality (including against NULL) are the 
only things C99 allows for pointers to distinct objects, and so UPC is the same 
unless we added extra rules.

I believe this allows a pointer representation in C99 that consists of, for 
instance, a segment number and an offset in that segment.  In UPC, that would 
translate (puns intended) to a representation in which the addrfield (or the 
portion of the internal representation from which it is derived on demand) 
contains a tuple such as (block, offset): a "good" representation in a system 
which dynamically grows the shared heap in chunks that might not be virtually 
contiguous.

Regarding arithmetic with negative values, we do need to be able to get correct 
values for the following expressions where sign of the result (first case) or 
of an operand (second case) is not known at compile time:
    j = &a[i] - p;  // where p points within a[], either before OR after a[i]
    p += i;  // where i can have either sign, but is still within a single array

I am guessing these are the sort of cases that had Troy particularly concerned 
about the complexity of the code that the compiler must currently emit.

Original comment by phhargr...@lbl.gov on 25 May 2012 at 7:03

GoogleCodeExporter commented 9 years ago
Paul, thanks for the clarification.  Although I am still sympathetic to p[-1] 
being considered out-of-range, I can see now how such a reference can 
still be in bounds, so please disregard that comment.

Question: _if_ block sizes were eliminated would it eliminate the need for 
special handling of signed indices vis-a-vis the language defined '%' operator 
semantics?

Original comment by gary.funck on 25 May 2012 at 8:19

GoogleCodeExporter commented 9 years ago
If block sizes were eliminated, things are a lot simpler, but care still needs 
to be taken.  The integer division operation is defined to be consistent with 
the modulus operation, such that

Q = A / B
R = A % B
A = B * Q + R

This relationship is true for the mathematical versions of the operations 
(remainder always positive) as well.  Thus, if one of the operations (division 
or remainder) is different between the two versions (C or math) for negative 
values, so is the other one.  Eliminating block size removes the need for the 
use of the remainder operation, but calculating a new thread value still 
requires use of the division operation.  It is possible it can be worked 
around, but that's another way of saying "special handling".

Original comment by brian.wibecan on 25 May 2012 at 9:06

GoogleCodeExporter commented 9 years ago
Gary Asked:
> Question: _if_ block sizes were eliminated would it eliminate the need for special
> handling of signed indices vis-a-vis the language defined '%' operator semantics?

Brian Answered (in part):
> If block sizes were eliminated, things are a lot simpler, but care still needs to
> be taken.
[...]
> Eliminating block size removes the need for the use of the remainder operation,
> but calculating a new thread value still requires use of the division operation.

Perhaps I am missing something here, but the "back of my envelope" disagrees 
with Brian's envelope.  I agree things get simpler if only 0 and 1 are legal 
block sizes.  Obviously, the blocksize=0 case is just C pointer arithmetic.  
However, in the cyclic case I believe it is the REMAINDER operator (not 
DIVISION as Brian states) that is needed to determine the new thread value.  
(Perhaps Brian just meant we perform the REMAINDER via 
DIVIDE+MULTIPLY+SUBTRACT, but I doubt it).

Depending on the PTS representation, it MIGHT be possible (I've not tried to 
test this hypothesis) to perform all (pointer +/- integer) arithmetic as offsets relative 
to an "arbitrary" base address (the base of SOME object - could be "array", 
"heap" or "chunk of heap") which would ensure that all LEGAL (defined as all 
pointer input and output lie within the same object) operations involve only 
non-negative operands to REMAINDER.  If that is so, then I believe that 
elimination of block sizes could allow many/most implementations to use the C 
'%' operator without any branches to check signs.  In a static-threads 
environment with power-of-2 values of both THREADS and the element size, that 
means one gets to optimize all the way to bitwise-AND:
   new_threadof = linear_offset & (THREADS * sizeof(element) - 1);
Of course computing the linear_offset might be non-trivial but is hopefully 
cheaper than integer division.
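
The identity behind that optimization, for a non-negative operand and a power-of-two modulus, can be sketched in C as below. The helper name mod_pow2 is hypothetical, and the sketch only illustrates the '%' to bitwise-AND replacement; how the masked value relates to the thread number depends on the representation.

```c
#include <assert.h>
#include <stddef.h>

/* x % n for a power-of-two n, computed with a mask.
   Valid only because x is unsigned (non-negative). */
static size_t mod_pow2(size_t x, size_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);   /* n must be a power of two */
    return x & (n - 1);
}
```

For a negative signed operand the mask and the C '%' operator would disagree, which is why keeping all operands non-negative matters here.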

Original comment by phhargr...@lbl.gov on 25 May 2012 at 10:07

GoogleCodeExporter commented 9 years ago
With my previous comment in mind, I think I am on the track of a definition of 
shared pointer arithmetic to replace the one in paragraph 3 of 6.4.2 (the one 
using the "abnormal" div and mod operators) with one based on the C '/' and '%' 
operators.  The resulting definition will be LONGER, and therefore probably 
LESS amenable to implementation than one may wish for.  However, this issue 
only really requested such a definition "for self-consistency".

I still need some work to prove the correctness to myself, but the idea is to 
replace "i" in the current definition with a properly chosen non-negative "j" 
such that C99's "%" can be applied in the phase computation.  The thread 
computation then needs EITHER a "correction" factor, or a "k" computed from "i" 
and distinct from "j".  The belief is that I can construct something with "%" 
and "/" without any branches.  We'll see if I reach that or not...

Original comment by phhargr...@lbl.gov on 25 May 2012 at 11:56

GoogleCodeExporter commented 9 years ago
First of all, my apologies to Gheorghe for attempting in comment #13 to 
reinvent the algorithm he had already described quite well in comment #2 and 
comment #4.

Anyway, here is what I've come up with:

  r = B * THREADS; // Row length
  j = (i>=0) ? i : (i + (1 - i/r)*r); // Non-negative replacement for i
  ph_out = (ph + j) % B;
  th_out = (th + (ph + j) / B) % THREADS;

As hoped the new definitions replace "div" and "mod" with "/" and "%" simply by 
replacing "i" with a properly chosen "j".  These definitions hold for *any* 
non-negative "j" which differs from "i" by an offset which is a multiple of 
B * THREADS (thus vanishing in the "%" operations).  If one can offer 
a cheaper computation of a suitable "j", then it can be substituted.
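
As one way to double check the math, here is a C sketch that compares the "j" formulation against a reference built on floor division. The reference, the struct, and all function names are hypothetical scaffolding; only the formulas for r, j, ph_out and th_out are taken from the text above.

```c
#include <assert.h>

typedef struct { long ph; long th; } phase_thread;

/* Non-negative stand-in for i; r = B * THREADS is the row length. */
static long nonneg_j(long i, long b, long threads)
{
    assert(b > 0 && threads > 0);
    long r = b * threads;
    return (i >= 0) ? i : i + (1 - i / r) * r;
}

/* Phase and thread after adding i, using only C99 '/' and '%'. */
static phase_thread step_with_j(long ph, long th, long i, long b, long threads)
{
    long j = nonneg_j(i, b, threads);
    phase_thread out;
    out.ph = (ph + j) % b;
    out.th = (th + (ph + j) / b) % threads;
    return out;
}

/* Reference: floor division and a remainder that is never negative. */
static phase_thread step_with_floor(long ph, long th, long i, long b, long threads)
{
    long n = ph + i;
    long q = n / b;
    if (n % b < 0)
        q -= 1;                        /* round toward negative infinity */
    long t = (th + q) % threads;
    phase_thread out;
    out.ph = n - q * b;
    out.th = (t < 0) ? t + threads : t;
    return out;
}

/* Returns 1 if both agree for every phase, thread, and i in [-span, span]. */
static int j_formula_agrees(long b, long threads, long span)
{
    for (long ph = 0; ph < b; ++ph)
        for (long th = 0; th < threads; ++th)
            for (long i = -span; i <= span; ++i) {
                phase_thread x = step_with_j(ph, th, i, b, threads);
                phase_thread y = step_with_floor(ph, th, i, b, threads);
                if (x.ph != y.ph || x.th != y.th)
                    return 0;
            }
    return 1;
}
```

A brute-force comparison of this kind does not replace a proof, but it covers the sign boundary and the exact multiples of the row length.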

I will note once again that this "trick" is intended for revising the spec 
definition of how phase and thread are computed to read in terms of the 
standard C division and remainder operations.  The actual computation of the 
correct addrfield (or the internal value from which it is computed) in an 
implementation would potentially become MORE complicated using this formulation 
(brain hurts too much at this point to follow through that far).

Somebody (or EVERYBODY if possible) *PLEASE* double check that my math is right.
If so, then...

Proposal:

Replace paragraph 3 of section 6.4.2 and its footnote[8] with:

3  After this assignment the following equations must hold in any UPC
   implementation.  In each case the '/' operator rounds toward zero and the
   '%' operator returns the non-negative remainder, as required by C[8]:
     upc_phaseof(p1) == (upc_phaseof(p) + j) % B
     upc_threadof(p1) == (upc_threadof(p) + (upc_phaseof(p) + j) / B) % THREADS
   where
     size_t r = B * THREADS;
     size_t j = i + ( (i >= 0) ? 0 : r * (1 + (-i)/r) );

[8] The use of the non-negative value "j" in the equations allows the division
and remainder operations defined by C to be used, rather than alternative 
versions
based on division which rounds toward negative infinity.

Original comment by phhargr...@lbl.gov on 26 May 2012 at 2:02

GoogleCodeExporter commented 9 years ago
We just hit this little problem in a test case. It is related to the earlier 
discussion on the behavior of the % operator on negative values. 

Would you consider the following access legal?

shared [BF] int A[...];
shared [BF] int * p = A-1;
shared [BF] int * q = p+1;
...
*q ...
...

Note that p points to an illegal location, but is not dereferenced.
q points to A[0], and *is* dereferenced.

1) Do you consider this correct code? 
2) Does it pass your particular favorite flavor of UPC?

-- George

Original comment by ga10...@gmail.com on 11 Jun 2012 at 9:03

GoogleCodeExporter commented 9 years ago
George asks
> 1) Do you consider this correct code? 
> 2) Does it pass your particular favorite flavor of UPC?

1) I am not sure if I believe this is legal.
I would be inclined to think it would NOT be legal in C99 for a normal array, 
but would work with most compilers.  I am skimming the C99 spec now...

2) With Berkeley UPC this code "works", getting q==&A[0].
HOWEVER, if "1" were replaced by something like "1000000" then in a "-g" 
compilation we'd notice the computed value of "p" was outside the shared heap 
and report it as an error.

Caveat: with optimization enabled "p" and its intermediate value might be 
discarded entirely.

Original comment by phhargr...@lbl.gov on 11 Jun 2012 at 9:41

GoogleCodeExporter commented 9 years ago
OK, found it.  Short answer: UNDEFINED BEHAVIOR.

Long version.
C99 6.5.6 (Additive Operators) in Semantic 8 says, in part:

"If both the pointer operand and the result point to elements of the same array 
object, or one past the last element of the array object, the evaluation shall 
not produce an overflow; otherwise, the behavior is undefined. If the result 
points one past the last element of the array object, it shall not be used as 
the operand of a unary * operator that is evaluated."

So, one CAN portably perform pointer arithmetic that would place "p" at most 
one-past-last (but not dereference it), but NOT one-before-first.

Of course George's compiler still needs to generate proper code for
  shared [BF] int * p = &A[1] - 1;
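
For comparison, the plain C99 pattern that the quoted rule does permit looks like the sketch below: a one-past-the-end pointer is formed and compared against, but never dereferenced. The function name sum_ints is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sum n ints using a one-past-the-end pointer as the loop bound. */
static int sum_ints(const int *a, size_t n)
{
    assert(a != NULL);
    const int *end = a + n;      /* one past the last element: allowed */
    int total = 0;
    for (const int *p = a; p != end; ++p)
        total += *p;             /* end itself is never dereferenced */
    return total;
}
```

The mirror-image loop that walks downward until it reaches a - 1 would rely on forming a one-before-first pointer, which the quoted paragraph leaves undefined.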

Original comment by phhargr...@lbl.gov on 11 Jun 2012 at 9:55

GoogleCodeExporter commented 9 years ago
An interesting variation on the test case follows.

#include <upc.h>
#include <assert.h>

shared [3] int A[30*THREADS];
shared [3] int *p = A-1;

int main(void)
{
  shared [3] int *q = p + 1;
  if (!MYTHREAD)
    A[0] = 100;
  assert (A[0] == *q);
}

Here 'p' is file scoped and is initialized "statically".  In the GUPC 
implementation it is not initialized statically in the usual sense of the term. 
 Initialization code is generated which is run before main is called.

Original comment by gary.funck on 11 Jun 2012 at 9:55

GoogleCodeExporter commented 9 years ago
Yes, a barrier is needed in the example.

#include <upc.h>
#include <assert.h>

shared [3] int A[30*THREADS];
shared [3] int *p = A-1;

int main(void)
{
  shared [3] int *q = p + 1;
  if (!MYTHREAD)
    A[0] = 100;
  upc_barrier;
  assert (A[0] == *q);
}

Original comment by gary.funck on 11 Jun 2012 at 9:59

GoogleCodeExporter commented 9 years ago
> I will note once again that this "trick" is intended for revising the spec definition of how
> phase and thread are computed to read in terms of the standard C division and remainder operations.
> The actual computation of the correct addrfield (or the internal value from which it is computed)
> in an implementation would potentially become MORE complicated using this formulation (brain hurts
> too much at this point to follow through that far).

I may be reading too much into this comment, but I'm concerned that there is an 
underlying assumption that the existing formula in the spec is *always* 
directly implementable without going through this exercise (to figure out how 
to put it in terms of the C operators).  There may not be a hardware 
instruction that performs the division and modulus operations with the desired 
UPC rounding, so this whole exercise is necessary for such targets in order to 
use the hardware's integer division and modulus operations to emulate the 
desired UPC rounding.  x86 is one such architecture.

Original comment by johnson....@gmail.com on 12 Jun 2012 at 3:00

GoogleCodeExporter commented 9 years ago
Troy wrote:
> I may be reading too much into this comment, but I'm concerned that there is an
> underlying assumption that the existing formula in the spec is *always* directly
> implementable without going through this exercise (to figure out how to put it in
> terms of the C operators).

OK, let's talk in terms of IMPLEMENTATION formulas...
Ours is all in BSD-licensed source.  So, I have no secrets.

In Berkeley UPC we do source-to-source translation and use a (forced-)inline 
function for pointer arithmetic (pointer +/- ptrdiff_t), which contains a 
branch on the sign of the integer.  We expect the common case to be adding of 
constants, and thus that the back-end compiler will discard one branch.

On the negative branch we use a "standard mathematics transformation" (see 
below).  So, if one is happy with a "if non-negative use eq1 else use eq2" in 
the spec, then we can take that route:

For non negative 'i':
  ph_out = (ph + i) % B;
  th_out = (th + (ph + i) / B) % THREADS;
for negative 'i' we "shift" numerators before divmod and must "unshift" the mod 
result:
  let ph_tmp = ph - (B - 1);
  let th_tmp = th - (THREADS - 1);
  ph_out = (ph_tmp + i) % B + (B - 1);
  th_out = (th_tmp + (ph_tmp + i) / B) % THREADS + (THREADS - 1);
This is the math that forms the "guts" of the current BUPC implementation, with 
the exception of the addrfield computation.  That uses the "/ THREADS" that 
matches with the "% THREADS" used to compute 'th_out'.  [The actual C code is 
expressed with more temporaries for CSE (nearly SSA form) and groups the 
related '/' and '%' together to help the back-end compiler pair them up.]

Or, if one prefers, the following combined form is possible:
  let a = (i < 0) ? 1 : 0;
  let ph_tmp = ph - a * (B - 1);
  let th_tmp = th - a * (THREADS - 1);
  ph_out = (ph_tmp + i) % B + a * (B - 1);
  th_out = (th_tmp + (ph_tmp + i) / B) % THREADS + a * (THREADS - 1);
where 'a' is 0 or 1 in order to "fold" the positive and negative formula 
together.
In using this case for code generation, one hopes to decide 'a' at compile time 
and expects the optimizer to either discard the zero terms in the 
non-negative-i case, or to discard multiply-1 in the negative case.  However, 
we've found the formulation above which branches instead of using 'a' to 
optimize more consistently.
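
Both forms can be written out in C and checked against each other, as in the sketch below. This is an illustration assembled from the formulas above, not the Berkeley UPC source; the struct, the function names, and the floor-based reference inside forms_agree are hypothetical.

```c
#include <assert.h>

typedef struct { long ph; long th; } phase_thread;

/* Branching form: one pair of equations per sign of i. */
static phase_thread step_branch(long ph, long th, long i, long b, long threads)
{
    assert(b > 0 && threads > 0);
    phase_thread out;
    if (i >= 0) {
        out.ph = (ph + i) % b;
        out.th = (th + (ph + i) / b) % threads;
    } else {
        long ph_tmp = ph - (b - 1);          /* shift numerators ... */
        long th_tmp = th - (threads - 1);
        out.ph = (ph_tmp + i) % b + (b - 1); /* ... and unshift the results */
        out.th = (th_tmp + (ph_tmp + i) / b) % threads + (threads - 1);
    }
    return out;
}

/* Combined form: 'a' selects the shift without a branch in the equations. */
static phase_thread step_combined(long ph, long th, long i, long b, long threads)
{
    assert(b > 0 && threads > 0);
    long a = (i < 0) ? 1 : 0;
    long ph_tmp = ph - a * (b - 1);
    long th_tmp = th - a * (threads - 1);
    phase_thread out;
    out.ph = (ph_tmp + i) % b + a * (b - 1);
    out.th = (th_tmp + (ph_tmp + i) / b) % threads + a * (threads - 1);
    return out;
}

/* Returns 1 if both forms match a floor-based reference on [-span, span]. */
static int forms_agree(long b, long threads, long span)
{
    for (long ph = 0; ph < b; ++ph)
        for (long th = 0; th < threads; ++th)
            for (long i = -span; i <= span; ++i) {
                long n = ph + i;
                long q = n / b - ((n % b < 0) ? 1 : 0);   /* floor(n / b) */
                long t = (th + q) % threads;
                long ref_ph = n - q * b;
                long ref_th = (t < 0) ? t + threads : t;
                phase_thread x = step_branch(ph, th, i, b, threads);
                phase_thread y = step_combined(ph, th, i, b, threads);
                if (x.ph != ref_ph || x.th != ref_th ||
                    y.ph != ref_ph || y.th != ref_th)
                    return 0;
            }
    return 1;
}
```

The shift works because, for negative i, every numerator handed to '/' or '%' is zero or negative, so truncation toward zero rounds consistently in one direction.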

Is either of those more to the group's liking for use in the spec?

BTW:
As a result of the pencil-and-paper math I've done for this tracker issue, our 
next release will use these formula only when 'i' is NOT among a set of special 
cases.  If 'i' is 1, -1 or an integer multiple of B the math will be done with 
one less div/mod (where a "bounds check" comparison is used for 1 and -1).  
These are "picked off" with careful use of __builtin_constant_p().  For static 
threads we also pick off cases that are THREADS times the previous cases and 
remove an additional '%' operation.  We've also special-cased i==0.

The [] and [1] layout qualifiers are handled by different code entirely, which 
omits phase, just as I have heard others say their compilers do.

Original comment by phhargr...@lbl.gov on 12 Jun 2012 at 7:19

GoogleCodeExporter commented 9 years ago
To double down on Paul's comments - the code I posted earlier is literally 
stolen from the PGAS runtime - so no secrets folks, this is *exactly* how we do 
pointer arithmetic. 

I would like, however, to do the right thing and handle pointer arithmetic with 
negative array indices - as long as none of those negative indices get actually 
dereferenced. Let's face it, A[p-1+1] happens too often - flagging it as an 
error at runtime would be somewhat acceptable, executing it "correctly" is 
preferred, but letting it cause unexplained segmentation violations is not 
something I want to live with as a UPC programmer.

I will attempt to evaluate Paul's proposed formula to see how it stands up to a 
bit of battering with negative indices. If it works, we should enshrine it in 
the documentation in some form. "Advice to implementors", maybe?

Original comment by ga10...@gmail.com on 15 Jun 2012 at 3:09

GoogleCodeExporter commented 9 years ago
George wrote:
> I will attempt to evaluate Paul's proposed formula to see how it stands up to a
> bit of battering with negative indices. If it works, we should enshrine it in the
> documentation in some form. "Advice to implementors", maybe?

George, 

We've been using this code, including tests with negative "increments", for a 
while now.  I would be very surprised if these formula were to fail your 
battering, but if they do then I *REALLY* want to know of the problem.  
However, see my note below about signed arithmetic before you start battering.

For completeness, in our implementation the addrfield math is (using the 
"combined form" formula with 'a'):
   block_incr = (th_tmp + (ph_tmp + i) / B) / THREADS;
   elem_incr = (ph_out - ph) + B * block_incr;
   addr_out = addr + elem_incr * upc_elemsz(ptr);

Note that if one fails to use signed types for some of the computations, then 
these formulas might give HORRIBLY incorrect results.  I can't be sure about 
others, but I recall that at least ph_tmp, th_tmp and elem_incr probably need 
to be signed.  We've used ptrdiff_t for most temporaries.
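
Putting the "combined form" and the addrfield math together gives the C sketch below. It is a reading of the formulas in this thread, not the Berkeley UPC code: the struct layout, the function name, and passing the element size as a parameter (in place of upc_elemsz(ptr)) are assumptions. All temporaries use ptrdiff_t, in line with the note about signed types.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    ptrdiff_t ph;     /* phase: element position within the block */
    ptrdiff_t th;     /* thread */
    ptrdiff_t addr;   /* byte offset on that thread */
} shared_ptr_fields;

static shared_ptr_fields pts_add_full(shared_ptr_fields p, ptrdiff_t i,
                                      ptrdiff_t b, ptrdiff_t threads,
                                      ptrdiff_t elemsz)
{
    assert(b > 0 && threads > 0 && elemsz > 0);
    ptrdiff_t a      = (i < 0) ? 1 : 0;
    ptrdiff_t ph_tmp = p.ph - a * (b - 1);
    ptrdiff_t th_tmp = p.th - a * (threads - 1);
    ptrdiff_t blocks = (ph_tmp + i) / b;         /* blocks crossed */

    ptrdiff_t ph_out = (ph_tmp + i) % b + a * (b - 1);
    ptrdiff_t th_out = (th_tmp + blocks) % threads + a * (threads - 1);

    /* the '/ THREADS' that pairs with the '% THREADS' above */
    ptrdiff_t block_incr = (th_tmp + blocks) / threads;
    ptrdiff_t elem_incr  = (ph_out - p.ph) + b * block_incr;

    shared_ptr_fields out;
    out.ph   = ph_out;
    out.th   = th_out;
    out.addr = p.addr + elem_incr * elemsz;
    return out;
}
```

With B = 3, THREADS = 4 and 4-byte elements, stepping back one element from { ph 0, th 0, addr 12 } lands on { ph 2, th 3, addr 8 }, the last element of the preceding block, which lives on thread 3.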

Original comment by phhargr...@lbl.gov on 15 Jun 2012 at 11:50

GoogleCodeExporter commented 9 years ago
Although the consensus may be heading towards "no change", or "Add this as 
discussion in the TBD "implementer's notes", tagging this as spec. 1.3 for now.

Yili, you're the owner of this issue.  Please change its status if you prefer 
to give it a different disposition.

Original comment by gary.funck on 2 Jul 2012 at 5:37

GoogleCodeExporter commented 9 years ago
Correction: Troy's the owner, Yili the original reporter.  Either/both of you 
-- please change the issue status, if you would like to give it a different 
disposition.

Original comment by gary.funck on 2 Jul 2012 at 5:40

GoogleCodeExporter commented 9 years ago
Gary wrote:
> Although the consensus may be heading towards "no change", or "Add this as discussion in the TBD
> "implementer's notes", tagging this as spec. 1.3 for now."

I would like to see a change in 1.3 to replace the current definition with 
something based on any one of the example formations I offered (or any other, 
this isn't about my pride).  I believe that as long as we lack run-time 
variable block sizes, there will be library implementers who will NEED to 
perform their own pointer arithmetic (see MTU's collectives reference 
implementation as an example).  While the MTU Collectives example is 
"borderline", my point is that I believe some USERS will need to perform 
arithmetic on pointers-to-shared.  By NOT giving the users a "plain C" version 
of the definitions we make their task more difficult (and more error prone if 
they just type in the current definition).

So, do others agree or disagree that a spec change is appropriate for 1.3?

P.S.  Perhaps for 1.4 we can look at providing facilities for users (library 
implementers in particular) to perform address arithmetic?  They might, for 
instance, pass a (shared void *), the block size, the element size, and an 
integral increment and return a (shared void *).   A PTS subtraction routine 
which takes blocksize and elemsize arguments would probably also be required.  
Anybody else interested enough is encouraged to open a new tracker issue.

Original comment by phhargr...@lbl.gov on 2 Jul 2012 at 5:52

GoogleCodeExporter commented 9 years ago

Original comment by gary.funck on 3 Jul 2012 at 3:10

GoogleCodeExporter commented 9 years ago
Sorry for the month-and-a-half delay. In reference to comment 24 - we changed 
our array index arithmetic to handle negative indices also, and it seems to get 
around the problem of p - a + b w/o causing any problems, various blocking 
factors and stuff notwithstanding.

Looking back to comment 1 - the original request related to a functional 
description of array index arithmetic. Maybe we should also talk about a 
functional test set for index arithmetic. Do people on this list believe that 
the Berkeley bugzilla test cases and GUTS are sufficient for this? I don't have 
a good feeling as to how many corner cases there are out there not tested by 
existing functional test cases.

Original comment by ga10...@gmail.com on 3 Aug 2012 at 6:18

GoogleCodeExporter commented 9 years ago
Set default Consensus to "Low".

Original comment by gary.funck on 19 Aug 2012 at 11:26

GoogleCodeExporter commented 9 years ago
Mass change "Accepted" issues which haven't seen activity for over a month back 
to "New", for telecon discussion.

Original comment by danbonachea on 4 Oct 2012 at 11:36

GoogleCodeExporter commented 9 years ago
"I would like to see a change in 1.3 to replace the current definition with 
something based on any one of the example formations I offered .. library 
implementers who will NEED to perform their own pointer arithmetic .. By NOT 
giving the users a "plain C" version of the definitions we make their task more 
difficult"

Looking over the history of this discussion, nobody seems to be claiming the 
current spec definition for PTS arithmetic is ambiguous or incomplete (modulo 
issue 85 wrt multi-dim arrays). The formal definition leverages two operators 
which are not part of C99 (but are mathematically defined). The "equivalent 
formulations" written in pure C99 are significantly more complicated and 
therefore harder to read and understand. The purpose of the spec is to provide 
a formal and complete behavioral description, not an operational recipe for 
implementation, especially when the former is more concise and the latter may 
have the side-effect of subtly imposing additional, unnecessary implementation 
constraints.

The existence of one or more "reference implementations" of PTS arithmetic in 
pure C99 may be helpful to compiler and library writers, but in my opinion that 
alone doesn't justify the inclusion of that code in the formal specification. 
The rationale document seems like a perfect place to share these with the 
community, and it would also be appropriate in a public domain testing suite.

Move to close this issue with NoChange.

"P.S.  Perhaps for 1.4 we can look at providing facilities for users (library 
implementers in particular) to perform address arithmetic? "

Directly exposing such a facility via a library sounds like a very good idea, 
and would probably also alleviate much of the headache for the library writers 
you mention. Splitting this into a new issue 93.

Original comment by danbonachea on 5 Oct 2012 at 7:45

GoogleCodeExporter commented 9 years ago
On the 10/5 telecon it was agreed this issue is resolved with NoChange.

Interested parties are encouraged to contribute text to the Rationale document 
Wiki: 
http://code.google.com/p/upc-specification/wiki/UPCSpecCompanion

Original comment by danbonachea on 7 Oct 2012 at 6:23