Closed GoogleCodeExporter closed 9 years ago
Issue 44 has been merged into this issue.
Original comment by yzh...@lbl.gov
on 22 May 2012 at 11:57
I'm game. I don't think it's overcomplicated, but it takes a while to define
the terms. Also, it's murder on architectures where integer divisions are not
pipelined (grrr).
Consider a shared array "A" declared with blocking factor "BF" and element size
"ES". BF=0 is a trivial special case that I'm not going to bother with. ES=0 is
an error.
The array is allocated on threads T = { 0..THREADS-1 } at memory addresses
base[t], t \in T.
Given index "i" we want to find out
* The UPC thread "thrd" that A[i] is affine to
* The memory address "addr" at which A[i] is located on thread "thrd"
Step 1: we calculate the "block index" and "block offset" of index i. Consider
"A" laid out as a contiguous set of blocks - I want to know which block "i" is
in, and where in the block we are.
blockIDX = (i * ES) / (BF * ES); /* "block index" */
blockOFF = (i * ES) % (BF * ES); /* byte offset in block */
Step 2: Given the block index we can calculate "thrd" and "addr" directly:
thrd = blockIDX % THREADS;
addr = base[thrd] + (blockIDX / THREADS) * (BF * ES) + blockOFF;
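For readers who want to try the two steps above, here is a minimal C sketch. It assumes "i" is a non-negative element index that is scaled to bytes first, and it returns only the byte offset from base[thrd] (scaling the per-thread block count by the block size in bytes) rather than a real address. The type and function names are illustrative and do not come from any UPC runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: owning thread and byte offset from base[thrd]. */
typedef struct {
    size_t thrd;
    size_t offset;
} shared_loc;

/* Map a non-negative element index i to its location, for an array with
 * blocking factor bf (elements per block) and element size es (bytes). */
static shared_loc locate(size_t i, size_t bf, size_t es, size_t threads)
{
    assert(bf > 0 && es > 0 && threads > 0);

    size_t block_bytes = bf * es;
    size_t byte_idx = i * es;                   /* linear byte index        */
    size_t block_idx = byte_idx / block_bytes;  /* global "block index"     */
    size_t block_off = byte_idx % block_bytes;  /* byte offset in the block */

    shared_loc loc;
    loc.thrd = block_idx % threads;
    /* Every full round of `threads` blocks adds one block on each thread. */
    loc.offset = (block_idx / threads) * block_bytes + block_off;
    return loc;
}
```

With bf = 3, es = 4 and 2 threads, element 7 sits in global block 2, which is the second block on thread 0, so locate(7, 3, 4, 2) gives thread 0 and byte offset 16.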
Original comment by ga10...@gmail.com
on 24 May 2012 at 11:33
I'm afraid that it isn't that simple. The array case is not bad because array
subscripts are generally assumed to be non-negative, but the same is not true
for pointer addition. You can have p + i where p is a pointer-to-shared and i
has an unknown value. For the case where i < 0, the C division and modulus
operators differ from the ones that the UPC spec uses.
Original comment by johnson....@gmail.com
on 24 May 2012 at 3:57
Troy, good point. Pointers are indeed a bit more difficult, but only because -
I predict - we will not completely agree on the representation of a
pointer-to-shared. Here is an algorithm for a generic pointer increment,
starting with a generic representation for a pointer-to-shared. Please comment.
Pointer representation
======================
Let us consider the portable part of the pointer to be a pair { thread, offset }
where "offset" is the offset in bytes from the base of the array. We consider
the offset to be always non-negative - if the offset points to before the
beginning of the array it's an access violation.
Pointer increment: definition
=============================
I start with a pointer-to-shared "ptr" and I want to increment this pointer by
some value "incr". I interpret "ptr" as pointing to an array of type { BF, ES}.
Algorithm
=========
Step 1: determine the "linear index" of the pointer in bytes (this is an
imaginary offset in an imaginary array laid out across all processors).
Essentially: WHERE AM I IN THE SHARED ARRAY?
blkOFF = ptr.offset % (BF * ES);
blkIDX = ptr.offset / (BF * ES);
upcIDX = blkOFF + (blkIDX*THREADS + ptr.thread) * (BF * ES);
Step 2: increment the linear index. This is a simple increment of the upcIDX.
upcIDX += ES * incr;
Step 3: re-transform the linear index back into a pointer. This code should be
familiar from my previous posting.
blkOFF = upcIDX % (BF * ES);
blkIDX = upcIDX / (BF * ES);
ptr.thread = blkIDX % THREADS;
blkIDX = blkIDX / THREADS;
ptr.offset = (BF * ES) * blkIDX + blkOFF;
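The three steps can be sketched in plain C as follows. This is only an illustration under stated assumptions: the struct models the { thread, offset } pair described above, the block size in bytes is taken to be BF * ES, and the incremented linear index is required to stay non-negative, so C's '/' and '%' never see a negative operand. The names pts and pts_add are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical portable part of a pointer-to-shared. */
typedef struct {
    ptrdiff_t thread;
    ptrdiff_t offset;   /* bytes from the array base on that thread */
} pts;

static pts pts_add(pts ptr, ptrdiff_t incr,
                   ptrdiff_t bf, ptrdiff_t es, ptrdiff_t threads)
{
    ptrdiff_t block_bytes = bf * es;
    assert(block_bytes > 0 && threads > 0);

    /* Step 1: linear byte index in the imaginary array. */
    ptrdiff_t blk_off = ptr.offset % block_bytes;
    ptrdiff_t blk_idx = ptr.offset / block_bytes;
    ptrdiff_t upc_idx = blk_off + (blk_idx * threads + ptr.thread) * block_bytes;

    /* Step 2: increment by incr elements (incr may be negative). */
    upc_idx += es * incr;
    assert(upc_idx >= 0);   /* the result must not point before the array */

    /* Step 3: transform the linear index back into a pointer. */
    blk_off = upc_idx % block_bytes;
    blk_idx = upc_idx / block_bytes;

    pts out;
    out.thread = blk_idx % threads;
    out.offset = block_bytes * (blk_idx / threads) + blk_off;
    return out;
}
```

For bf = 3, es = 4 and 2 threads, a pointer at { thread 0, offset 16 } (element 7) incremented by -3 lands on { thread 1, offset 4 } (element 4), and incremented by +2 lands on { thread 1, offset 12 } (element 9).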
Original comment by ga10...@gmail.com
on 24 May 2012 at 5:35
[deleted comment]
If you take "ptr.offset" to refer to the language-required "ptr.addrfield",
thus making the imaginary array span the entirety of what might be considered
shared space, I believe this is correct.
Original comment by brian.wibecan
on 24 May 2012 at 6:08
We saw this definitional aspect of '%' crop up in a failure of a GUTS
'main' test: iteration4. It may be worth filing an issue on this, but I could
use some help in characterizing the issue.
Until we fixed the compiler, this test was failing when compiled
in a static THREADS compilation environment.
The test fails if compiled in a static environment for more than 16 threads,
and only for negative indexes. Here is the simplest failing code:
#define N 10
upc_forall (i=-N; i<N; i++; i) {
if (i >= 0)
check(i % THREADS == MYTHREAD);
else
check(((i % THREADS)+THREADS)%THREADS == MYTHREAD);
}
You'll note that the index, 'i' ranges from -N .. (N-1). You'll also note that
the test doesn't make any shared references at all, so the question of
array/pointer bounds doesn't come into play.
The bug turned out to be due to the fact that the compiler used an unsigned
type when performing the '%' operation, and further, this unsigned '%' was
optimized to be performed as a multiply by the reciprocal, which gave
incorrect values when i is negative.
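A small C sketch of the unsigned-type half of this bug, assuming 32-bit integer types. It does not model the reciprocal-multiply optimization. The function names are invented for illustration. The first function is the expression the test uses for negative indices. The second performs '%' after converting the index to an unsigned type.

```c
#include <assert.h>
#include <stdint.h>

/* Affinity as the test computes it: signed '%' folded into 0..threads-1. */
static int affinity_signed(int32_t i, int32_t threads)
{
    assert(threads > 0);
    return (int)(((i % threads) + threads) % threads);
}

/* The same remainder taken after i has been converted to an unsigned type.
 * The conversion wraps a negative i modulo 2^32 before '%' is applied. */
static int affinity_unsigned(int32_t i, int32_t threads)
{
    assert(threads > 0);
    return (int)((uint32_t)i % (uint32_t)threads);
}
```

For i = -1 and 17 threads the signed version yields 16, while the unsigned one yields 0 because 2^32 - 1 is a multiple of 17. With 16 threads both yield 15, since 2^32 is a multiple of 16 and the wrap-around does not disturb the remainder.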
This reference discusses the defined behavior of %.
http://bytes.com/topic/c/answers/444522-modulus-negative-operands [^]
The gist is that in C89 this was implementation-defined. In C99 they defined
'/' as always rounding towards 0, which is conventional and compatible with
Fortran.
If 'affinity' is determined in terms of '%', but '%' returns a negative number,
how can upc_forall() determine affinity?
(For the moment, we won't worry about whether affinity is defined on negative
index values.)
The test masks over this issue by doing something different for negative values
of 'i'.
if (i >= 0)
check(i % THREADS == MYTHREAD);
else
check(((i % THREADS)+THREADS)%THREADS == MYTHREAD);
In any event, we "fixed" this test failure by ensuring that the division was
done using signed arithmetic.
BTW, philosophically, I agree with the interpretation that p[-1] for some value
of PTS 'p' should be treated (and likely already is in C99?) as an array
out-of-bounds error.
Original comment by gary.funck
on 24 May 2012 at 6:34
Brian, good point about addrfield. IIRC the language spec makes no
representation about comparing the addrfields of two different shared arrays,
so I think I might be safe even if I consider just a single shared array at a
time, and consider addrfield to be zero where the base of the array lives.
Brrr, I don't want to get into this business of the modulus of a negative
number. Remainders are positive by definition :)
Original comment by ga10...@gmail.com
on 24 May 2012 at 7:35
My point about using addrfield was to move away from an implementation-specific
concept: the base address of the object into which a pointer points, which is
not required to be tracked by the implementation. Pointer arithmetic is well
defined with reference to addrfield. How addrfield relates to the base of the
shared object is up to the implementation. Addrfield doesn't even have to be
used as part of the pointer representation; it can be fabricated when
requested, as I believe some of the implementations do, so long as it exhibits
the required behavior.
I don't recall what promises the language makes regarding pointer operations
for pointers that don't point within the same object, but I'm sure they can be
compared for equality or tested to see if they are NULL. Testing for equality
I believe is defined as matching addrfield and thread values. So, pointers
into two different objects must have different addrfield base values, even if
the value used in real pointer arithmetic is the offset from the array
beginning and thus the same for both.
Original comment by brian.wibecan
on 24 May 2012 at 9:30
Brian wrote:
> I don't recall what promises the language makes regarding pointer operations for
> pointers that don't point within the same object, but I'm sure they can be compared
> for equality or tested to see if they are NULL. Testing for equality I believe is
> defined as matching addrfield and thread values. So, pointers into two different
> objects must have different addrfield base values, even if the value used in real
> pointer arithmetic is the offset from the array beginning and thus the same for
> both.
I believe Brian is right: tests for equality (including against NULL) are the
only things C99 allows for pointers to distinct objects, and so UPC is the same
unless we added extra rules.
I believe this allows a pointer representation in C99 that consists of, for
instance, a segment number and an offset in that segment. In UPC, that would
translate (puns intended) to a representation in which the addrfield (or the
portion of the internal representation from which it is derived on demand)
contains a tuple such as (block, offset): a "good" representation in a system
which dynamically grows the shared heap in chunks that might not be virtually
contiguous.
Regarding arithmetic with negative values, we do need to be able to get correct
values for the following expressions where the sign of the result (first case) or
of an operand (second case) is not known at compile time:
j = &a[i] - p; // where p points within a[], either before OR after a[i]
p += i; // where i can have either sign, but is still within a single array
I am guessing these are the sort of cases that had Troy particularly concerned
about the complexity of the code that the compiler must currently emit.
Original comment by phhargr...@lbl.gov
on 25 May 2012 at 7:03
Paul, thanks for the clarification. Although I am still sympathetic to p[-1]
being considered out-of-range, I can see now how such a reference can
still be in bounds, so please disregard that comment.
Question: _if_ block sizes were eliminated would it eliminate the need for
special handling of signed indices vis-a-vis the language defined '%' operator
semantics?
Original comment by gary.funck
on 25 May 2012 at 8:19
If block sizes were eliminated, things are a lot simpler, but care still needs
to be taken. The integer division operation is defined to be consistent with
the modulus operation, such that
Q = A / B
R = A % B
A = B * Q + R
This relationship is true for the mathematical versions of the operations
(remainder always positive) as well. Thus, if one of the operations (division
or remainder) is different between the two versions (C or math) for negative
values, so is the other one. Eliminating block size removes the need for the
use of the remainder operation, but calculating a new thread value still
requires use of the division operation. It is possible it can be worked
around, but that's another way of saying "special handling".
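To make the relationship concrete, here is a short C sketch with both conventions side by side. The helper names floor_div and floor_mod are my own, standing in for the "mathematical" versions that round toward negative infinity. The check confirms that A = B * Q + R holds under either convention, which is why the two quotients must differ whenever the two remainders do.

```c
#include <assert.h>

/* Floored ("mathematical") division: rounds toward negative infinity. */
static long floor_div(long a, long b)
{
    assert(b > 0);
    long q = a / b;          /* C99: truncates toward zero */
    if (a % b < 0)
        q -= 1;              /* step down when truncation rounded up */
    return q;
}

/* Floored remainder: always in 0..b-1 for b > 0. */
static long floor_mod(long a, long b)
{
    assert(b > 0);
    long r = a % b;          /* C99: has the sign of a (or is zero) */
    if (r < 0)
        r += b;
    return r;
}

/* Returns 1 if A == B*Q + R holds for both conventions on a small range. */
static int identity_holds(long b)
{
    for (long a = -50; a <= 50; a++) {
        if (a != b * (a / b) + (a % b))
            return 0;
        if (a != b * floor_div(a, b) + floor_mod(a, b))
            return 0;
    }
    return 1;
}
```

For A = -7 and B = 4, C gives Q = -1 and R = -3, while the floored versions give Q = -2 and R = 1. Both pairs satisfy the identity.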
Original comment by brian.wibecan
on 25 May 2012 at 9:06
Gary asked:
> Question: _if_ block sizes were eliminated would it eliminate the need for special
> handling of signed indices vis-a-vis the language defined '%' operator semantics?
Brian answered (in part):
> If block sizes were eliminated, things are a lot simpler, but care still needs to
> be taken.
[...]
> Eliminating block size removes the need for the use of the remainder operation,
> but calculating a new thread value still requires use of the division operation.
Perhaps I am missing something here, but the "back of my envelope" disagrees
with Brian's envelope. I agree things get simpler if only 0 and 1 are legal
block sizes. Obviously, the blocksize=0 case is just C pointer arithmetic.
However, in the cyclic case I believe it is the REMAINDER operator (not
DIVISION as Brian states) that is needed to determine the new thread value.
(Perhaps Brian just meant we perform the REMAINDER via
DIVIDE+MULTIPLY+SUBTRACT, but I doubt it).
Depending on the PTS representation, it MIGHT be possible (I've not tried to test
this hypothesis) to perform all (pointer +/- integer) arithmetic as offsets relative
to an "arbitrary" base address (the base of SOME object - could be "array",
"heap" or "chunk of heap") which would ensure that all LEGAL (defined as all
pointer input and output lie within the same object) operations involve only
non-negative operands to REMAINDER. If that is so, then I believe that
elimination of block sizes could allow many/most implementations to use the C
'%' operator without any branches to check signs. In a static-threads
environment with power-of-2 values of both THREADS and the element size, that
means one gets to optimize all the way to bitwise-AND:
new_threadof = linear_offset & (THREADS * sizeof(element) - 1);
Of course computing the linear_offset might be non-trivial but is hopefully
cheaper than integer division.
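A hedged C sketch of the mask idea for the cyclic (block size 1) case, assuming linear_offset is a non-negative byte offset from the chosen base and that both THREADS and the element size are powers of two. Taken literally, the mask yields the byte offset within one row of THREADS elements, so this sketch adds a final divide by the element size (a shift when the size is a compile-time power of two) to obtain the thread number. All function names here are illustrative.

```c
#include <assert.h>
#include <stdint.h>

static int is_pow2(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Thread of a cyclic layout via mask; requires power-of-two parameters. */
static uint64_t thread_by_mask(uint64_t linear_offset,
                               uint64_t threads, uint64_t elem_size)
{
    assert(is_pow2(threads) && is_pow2(elem_size));
    /* Byte offset within one row of `threads` elements. */
    uint64_t in_row = linear_offset & (threads * elem_size - 1);
    return in_row / elem_size;
}

/* Reference version using ordinary division and remainder. */
static uint64_t thread_by_divmod(uint64_t linear_offset,
                                 uint64_t threads, uint64_t elem_size)
{
    return (linear_offset / elem_size) % threads;
}

/* Returns 1 if both versions agree on a range of small inputs. */
static int mask_matches_divmod(void)
{
    for (uint64_t threads = 1; threads <= 16; threads *= 2)
        for (uint64_t es = 1; es <= 8; es *= 2)
            for (uint64_t off = 0; off < 1024; off++)
                if (thread_by_mask(off, threads, es) !=
                    thread_by_divmod(off, threads, es))
                    return 0;
    return 1;
}
```

For example, with 4 threads and 4-byte elements, byte offset 20 is element 5, which belongs to thread 1 under either computation.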
Original comment by phhargr...@lbl.gov
on 25 May 2012 at 10:07
With my previous comment in mind, I think I am on the track of a definition of
shared pointer arithmetic to replace the one in paragraph 3 of 6.4.2 (the one
using the "abnormal" div and mod operators) with one based on the C '/' and '%'
operators. The resulting definition will be LONGER, and therefore probably
LESS amenable to implementation than one may wish for. However, this issue
only really requested such a definition "for self-consistency".
I still need some work to prove the correctness to myself, but the idea is to
replace "i" in the current definition with a properly chosen non-negative "j"
such that C99's "%" can be applied in the phase computation. The thread
computation then needs EITHER a "correction" factor, or a "k" computed from "i"
and distinct from "j". The belief is that I can construct something with "%"
and "/" without any branches. We'll see if I reach that or not...
Original comment by phhargr...@lbl.gov
on 25 May 2012 at 11:56
First of all, my apologies to Gheorghe for attempting in comment #13 to
reinvent the algorithm he had already described quite well in comment #2 and
comment #4.
Anyway, here is what I've come up with:
r = B * THREADS; // Row length
j = (i>=0) ? i : (i + (1 - i/r)*r); // Non-negative replacement for i
ph_out = (ph + j) % B;
th_out = (th + (ph + j) / B) % THREADS;
As hoped the new definitions replace "div" and "mod" with "/" and "%" simply by
replacing "i" with a properly chosen "j". These definitions hold for *any*
non-negative "j" which differs from "i" by a multiple of the row length
r = B * THREADS: such an offset vanishes in the "% B" operation and shifts the
block count (ph + j) / B by a multiple of THREADS, which vanishes in the
"% THREADS" operation. If one can offer a cheaper computation of a suitable "j",
then it can be substituted.
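One way to double check the math is a brute-force comparison in C. The sketch below assumes the spec definition being replaced is the pair of equations using floored "div" and "mod" (rounding toward negative infinity), which I model with the helper functions div_floor and mod_floor. Those helper names and the test range are my own choices.

```c
#include <assert.h>
#include <stddef.h>

/* Floored division and remainder, standing in for the spec's div and mod. */
static ptrdiff_t div_floor(ptrdiff_t a, ptrdiff_t b)
{
    ptrdiff_t q = a / b;
    return (a % b < 0) ? q - 1 : q;
}

static ptrdiff_t mod_floor(ptrdiff_t a, ptrdiff_t b)
{
    ptrdiff_t r = a % b;
    return (r < 0) ? r + b : r;
}

/* Returns 1 if the formulas using j agree with the floored definition for
 * every phase, thread and increment in a range spanning several rows. */
static int j_formula_matches(ptrdiff_t B, ptrdiff_t threads)
{
    assert(B > 0 && threads > 0);
    ptrdiff_t r = B * threads;                      /* row length */

    for (ptrdiff_t th = 0; th < threads; th++)
        for (ptrdiff_t ph = 0; ph < B; ph++)
            for (ptrdiff_t i = -4 * r - 3; i <= 4 * r + 3; i++) {
                /* Non-negative replacement for i, as posted above. */
                ptrdiff_t j = (i >= 0) ? i : (i + (1 - i / r) * r);
                ptrdiff_t ph_out = (ph + j) % B;
                ptrdiff_t th_out = (th + (ph + j) / B) % threads;

                if (j < 0)
                    return 0;
                if (ph_out != mod_floor(ph + i, B))
                    return 0;
                if (th_out != mod_floor(th + div_floor(ph + i, B), threads))
                    return 0;
            }
    return 1;
}
```

Calling j_formula_matches(3, 4), for instance, exercises every phase and thread with increments from -51 to 51.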
I will note once again that this "trick" is intended for revising the spec
definition of how phase and thread are computed to read in terms of the
standard C division and remainder operations. The actual computation of the
correct addrfield (or the internal value from which it is computed) in an
implementation would potentially become MORE complicated using this formulation
(brain hurts too much at this point to follow through that far).
Somebody (or EVERYBODY if possible) *PLEASE* double check that my math is right.
If so, then...
Proposal:
Replace paragraph 3 of section 6.4.2 and its footnote[8] with:
3 After this assignment the following equations must hold in any UPC
implementation. In each case the '/' operator rounds toward zero and the
'%' operator returns the non-negative remainder, as required by C[8]:
upc_phaseof(p1) == (upc_phaseof(p) + j) % B
upc_threadof(p1) == (upc_threadof(p) + (upc_phaseof(p) + j) / B) % THREADS
where
size_t r = B * THREADS;
size_t j = i + ( (i >= 0) ? 0 : r * (1 + (-i)/r) );
[8] The use of the non-negative value "j" in the equations allows the division
and remainder operations defined by C to be used, rather than alternative versions
based on division which rounds toward negative infinity.
Original comment by phhargr...@lbl.gov
on 26 May 2012 at 2:02
We just hit this little problem in a test case. It is related to the earlier
discussion on the behavior of the % operator on negative values.
Would you consider the following access legal?
shared [BF] int A[...];
shared [BF] int * p = A-1;
shared [BF] int * q = p+1;
...
*q ...
...
Note that p points to an illegal location, but is not dereferenced.
q points to A[0], and *is* dereferenced.
1) Do you consider this correct code?
2) Does it pass your particular favorite flavor of UPC?
-- George
Original comment by ga10...@gmail.com
on 11 Jun 2012 at 9:03
George asks
> 1) Do you consider this correct code?
> 2) Does it pass your particular favorite flavor of UPC?
1) I am not sure if I believe this is legal.
I would be inclined to think it would NOT be legal in C99 for a normal array,
but would work with most compilers. I am skimming the C99 spec now...
2) With Berkeley UPC this code "works", getting q==&A[0].
HOWEVER, if "1" were replaced by something like "1000000" then in a "-g"
compilation we'd notice the computed value of "p" was outside the shared heap
and report it as an error.
Caveat: with optimization enabled "p" and its intermediate value might be
discarded entirely.
Original comment by phhargr...@lbl.gov
on 11 Jun 2012 at 9:41
OK, found it. Short answer: UNDEFINED BEHAVIOR.
Long version.
C99 6.5.6 (Additive Operators), Semantics paragraph 8, says in part:
"If both the pointer operand and the result point to elements of the same array
object, or one past the last element of the array object, the evaluation shall
not produce an overflow; otherwise, the behavior is undefined. If the result
points one past the last element of the array object, it shall not be used as
the operand of a unary * operator that is evaluated."
So, one CAN portably perform pointer arithmetic that would place "p" at most
one-past-last (but not dereference it), but NOT one-before-first.
Of course George's compiler still needs to generate proper code for
shared [BF] int * p = &A[1] - 1;
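The same rule can be illustrated in plain C with an ordinary private array. This is not UPC code and says nothing about how a UPC compiler represents pointers-to-shared. The function name is made up for this sketch, and the undefined case is left in a comment so that it is never executed.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the well-defined pointer computations behave as expected. */
static int legal_pointer_arithmetic(void)
{
    int a[4] = {10, 20, 30, 40};

    /* One past the last element: may be formed, must not be dereferenced. */
    int *end = a + 4;

    /* Intermediate and final values stay inside the array: equals &a[0]. */
    int *first = &a[1] - 1;

    /* int *before = a - 1;
     * One before the first element: undefined behavior per C99 6.5.6,
     * even if the pointer is never dereferenced. */

    return (end - a == 4) && (first == a) && (*first == 10);
}
```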
Original comment by phhargr...@lbl.gov
on 11 Jun 2012 at 9:55
An interesting variation on the test case follows.
#include <upc.h>
#include <assert.h>
shared [3] int A[30*THREADS];
shared [3] int *p = A-1;
int main(void)
{
shared [3] int *q = p + 1;
if (!MYTHREAD)
A[0] = 100;
assert (A[0] == *q);
}
Here 'p' is file scoped and is initialized "statically". In the GUPC
implementation it is not initialized statically in the usual sense of the term.
Initialization code is generated which is run before main is called.
Original comment by gary.funck
on 11 Jun 2012 at 9:55
Yes, a barrier is needed in the example.
#include <upc.h>
#include <assert.h>
shared [3] int A[30*THREADS];
shared [3] int *p = A-1;
int main(void)
{
shared [3] int *q = p + 1;
if (!MYTHREAD)
A[0] = 100;
upc_barrier;
assert (A[0] == *q);
}
Original comment by gary.funck
on 11 Jun 2012 at 9:59
> I will note once again that this "trick" is intended for revising the spec definition of how
> phase and thread are computed to read in terms of the standard C division and remainder operations.
> The actual computation of the correct addrfield (or the internal value from which it is computed)
> in an implementation would potentially become MORE complicated using this formulation (brain hurts
> too much at this point to follow through that far).
I may be reading too much into this comment, but I'm concerned that there is an
underlying assumption that the existing formula in the spec is *always*
directly implementable without going through this exercise (to figure out how
to put it in terms of the C operators). There may not be a hardware
instruction that performs the division and modulus operations with the desired
UPC rounding, so this whole exercise is necessary for such targets in order to
use the hardware's integer division and modulus operations to emulate the
desired UPC rounding. x86 is one such architecture.
Original comment by johnson....@gmail.com
on 12 Jun 2012 at 3:00
Troy wrote:
> I may be reading too much into this comment, but I'm concerned that there is an
> underlying assumption that the existing formula in the spec is *always* directly
> implementable without going through this exercise (to figure out how to put it in
> terms of the C operators).
OK, let's talk in terms of IMPLEMENTATION formulas...
Ours is all in BSD-licensed source. So, I have no secrets.
In Berkeley UPC we do source-to-source translation and use a (forced-)inline
function for pointer arithmetic (pointer +/- ptrdiff_t), which contains a
branch on the sign of the integer. We expect the common case to be adding of
constants, and thus that the back-end compiler will discard one branch.
On the negative branch we use a "standard mathematics transformation" (see
below). So, if one is happy with a "if non-negative use eq1 else use eq2" in
the spec, then we can take that route:
For non negative 'i':
ph_out = (ph + i) % B;
th_out = (th + (ph + i) / B) % THREADS;
for negative 'i' we "shift" numerators before divmod and must "unshift" the mod
result:
let ph_tmp = ph - (B - 1);
let th_tmp = th - (THREADS - 1);
ph_out = (ph_tmp + i) % B + (B - 1);
th_out = (th_tmp + (ph_tmp + i) / B) % THREADS + (THREADS - 1);
This is the math that forms the "guts" of the current BUPC implementation, with
the exception of the addrfield computation. That uses the "/ THREADS" that
matches with the "% THREADS" used to compute 'th_out'. [The actual C code is
expressed with more temporaries for CSE (nearly SSA form) and groups the
related '/' and '%' together to help the back-end compiler pair them up.]
Or, if one prefers, the following combined form is possible:
let a = (i < 0) ? 1 : 0;
let ph_tmp = ph - a * (B - 1);
let th_tmp = th - a * (THREADS - 1);
ph_out = (ph_tmp + i) % B + a * (B - 1);
th_out = (th_tmp + (ph_tmp + i) / B) % THREADS + a * (THREADS - 1);
where 'a' is 0 or 1 in order to "fold" the positive and negative formulas
together.
In using this case for code generation, one hopes to decide 'a' at compile time
and expects the optimizer to either discard the zero terms in the
non-negative-i case, or to discard multiply-1 in the negative case. However,
we've found the formulation above which branches instead of using 'a' to
optimize more consistently.
Is either of those more to the group's liking for use in the spec?
BTW:
As a result of the pencil-and-paper math I've done for this tracker issue, our
next release will use these formulas only when 'i' is NOT among a set of special
cases. If 'i' is 1, -1 or an integer multiple of B the math will be done with
one less div/mod (where a "bounds check" comparison is used for 1 and -1).
These are "picked off" with careful use of __builtin_constant_p(). For static
threads we also pick off cases that are THREADS times the previous cases and
remove an additional '%' operation. We've also special-cased i==0.
The [] and [1] layout qualifiers are handled by different code entirely, which
omits phase, just as I have heard others say their compilers do.
Original comment by phhargr...@lbl.gov
on 12 Jun 2012 at 7:19
To double down on Paul's comments - the code I posted earlier is literally
stolen from the PGAS runtime - so no secrets folks, this is *exactly* how we do
pointer arithmetic.
I would like, however, to do the right thing and handle pointer arithmetic with
negative array indices - as long as none of those negative indices get actually
dereferenced. Let's face it, A[p-1+1] happens too often - flagging it as an
error at runtime would be somewhat acceptable, executing it "correctly" is
preferred, but letting it cause unexplained segmentation violations is not
something I want to live with as a UPC programmer.
I will attempt to evaluate Paul's proposed formula to see how it stands up to a
bit of battering with negative indices. If it works, we should enshrine it in
the documentation in some form. "Advice to implementors", maybe?
Original comment by ga10...@gmail.com
on 15 Jun 2012 at 3:09
George wrote:
> I will attempt to evaluate Paul's proposed formula to see how it stands up to a
> bit of battering with negative indices. If it works, we should enshrine it in the
> documentation in some form. "Advice to implementors", maybe?
George,
We've been using this code, including tests with negative "increments", for a
while now. I would be very surprised if these formulas were to fail your
battering, but if they do then I *REALLY* want to know of the problem.
However, see my note below about signed arithmetic before you start battering.
For completeness, in our implementation the addrfield math is (using the
"combined form" formula with 'a'):
block_incr = (th_tmp + (ph_tmp + i) / B) / THREADS;
elem_incr = (ph_out - ph) + B * block_incr;
addr_out = addr + elem_incr * upc_elemsz(ptr);
Note that if one fails to use signed types for some of the computations, then
these formulas might give HORRIBLY incorrect results. I can't be sure about the
others, but I recall that at least ph_tmp, th_tmp and elem_incr probably need
to be signed. We've used ptrdiff_t for most temporaries.
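Here is a self-contained C sketch of the "combined form" together with the addrfield math, for anyone who wants to batter it. It is a model under assumptions, not the Berkeley UPC source: a pointer is a hypothetical struct of thread, phase and a local offset counted in elements (multiply by the element size to get bytes), and the reference answer comes from laying out a non-negative global element index directly. All temporaries are ptrdiff_t, in line with the note about signed types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pointer model: thread, phase, local offset in elements. */
typedef struct {
    ptrdiff_t th;
    ptrdiff_t ph;
    ptrdiff_t addr;
} pts_model;

/* Reference layout of non-negative global element index e. */
static pts_model pts_from_index(ptrdiff_t e, ptrdiff_t B, ptrdiff_t T)
{
    pts_model p;
    p.ph = e % B;
    p.th = (e / B) % T;
    p.addr = (e / (B * T)) * B + p.ph;
    return p;
}

/* Combined-form increment: 'a' selects the shifted (negative) variant. */
static pts_model pts_incr(pts_model p, ptrdiff_t i, ptrdiff_t B, ptrdiff_t T)
{
    ptrdiff_t a = (i < 0) ? 1 : 0;
    ptrdiff_t ph_tmp = p.ph - a * (B - 1);
    ptrdiff_t th_tmp = p.th - a * (T - 1);
    ptrdiff_t blocks = (ph_tmp + i) / B;          /* blocks crossed */

    pts_model out;
    out.ph = (ph_tmp + i) % B + a * (B - 1);
    out.th = (th_tmp + blocks) % T + a * (T - 1);

    ptrdiff_t block_incr = (th_tmp + blocks) / T; /* local blocks crossed */
    ptrdiff_t elem_incr = (out.ph - p.ph) + B * block_incr;
    out.addr = p.addr + elem_incr;                /* in elements, not bytes */
    return out;
}

/* Returns 1 if incrementing matches the reference layout for all tested
 * start indices e and increments i with 0 <= e + i < 3 * B * T. */
static int incr_matches_model(ptrdiff_t B, ptrdiff_t T)
{
    assert(B > 0 && T > 0);
    ptrdiff_t n = 3 * B * T;

    for (ptrdiff_t e = 0; e < n; e++)
        for (ptrdiff_t i = -e; e + i < n; i++) {
            pts_model got = pts_incr(pts_from_index(e, B, T), i, B, T);
            pts_model want = pts_from_index(e + i, B, T);
            if (got.th != want.th || got.ph != want.ph || got.addr != want.addr)
                return 0;
        }
    return 1;
}
```

Changing the temporaries to an unsigned type is a quick way to see the failure mode described above, since ph_tmp and th_tmp go negative whenever 'i' is negative.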
Original comment by phhargr...@lbl.gov
on 15 Jun 2012 at 11:50
Although the consensus may be heading towards "no change", or "add this as
discussion in the TBD implementer's notes", I am tagging this as spec. 1.3 for now.
Yili, you're the owner of this issue. Please change its status if you prefer
to give it a different disposition.
Original comment by gary.funck
on 2 Jul 2012 at 5:37
Correction: Troy's the owner, Yili the original reporter. Either/both of you
-- please change the issue status, if you would like to give it a different
disposition.
Original comment by gary.funck
on 2 Jul 2012 at 5:40
Gary wrote:
> Although the consensus may be heading towards "no change", or "Add this as
> discussion in the TBD "implementer's notes", tagging this as spec. 1.3 for now.
I would like to see a change in 1.3 to replace the current definition with
something based on any one of the example formations I offered (or any other,
this isn't about my pride). I believe that as long as we lack run-time
variable block sizes, there will be library implementers who will NEED to
perform their own pointer arithmetic (see MTU's collectives reference
implementation as an example). While the MTU Collectives example is
"borderline", my point is that I believe some USERS will need to perform
arithmetic on pointers-to-shared. By NOT giving the users a "plain C" version
of the definitions we make their task more difficult (and more error prone if
they just type in the current definition).
So, do others agree or disagree that a spec change is appropriate for 1.3?
P.S. Perhaps for 1.4 we can look at providing facilities for users (library
implementers in particular) to perform address arithmetic? They might, for
instance, pass a (shared void *), the block size, the element size, and an
integral increment and return a (shared void *). A PTS subtraction routine
which takes blocksize and elemsize arguments would probably also be required.
Anybody else interested enough is encouraged to open a new tracker issue.
Original comment by phhargr...@lbl.gov
on 2 Jul 2012 at 5:52
Original comment by gary.funck
on 3 Jul 2012 at 3:10
Sorry for the month-and-a-half delay. In reference to comment 24 - we changed
our array index arithmetic to handle negative indices also, and it seems to get
around the problem of p - a + b w/o causing any problems, various blocking
factors and stuff notwithstanding.
Looking back to comment 1 - the original request related to a functional
description of array index arithmetic. Maybe we should also talk about a
functional test set for index arithmetic. Do people on this list believe that
the Berkeley bugzilla test cases and GUTS are sufficient for this? I don't have
a good feeling as to how many corner cases there are out there not tested by
existing functional test cases.
Original comment by ga10...@gmail.com
on 3 Aug 2012 at 6:18
Set default Consensus to "Low".
Original comment by gary.funck
on 19 Aug 2012 at 11:26
Mass change "Accepted" issues which haven't seen activity for over a month back
to "New", for telecon discussion.
Original comment by danbonachea
on 4 Oct 2012 at 11:36
"I would like to see a change in 1.3 to replace the current definition with
something based on any one of the example formations I offered .. library
implementers who will NEED to perform their own pointer arithmetic .. By NOT
giving the users a "plain C" version of the definitions we make their task more
difficult"
Looking over the history of this discussion, nobody seems to be claiming the
current spec definition for PTS arithmetic is ambiguous or incomplete (modulo
issue 85 wrt multi-dim arrays). The formal definition leverages two operators
which are not part of C99 (but are mathematically defined). The "equivalent
formulations" written in pure C99 are significantly more complicated and
therefore harder to read and understand. The purpose of the spec is to provide
a formal and complete behavioral description, not an operational recipe for
implementation, especially when the former is more concise and the latter may
have the side-effect of subtly imposing additional, unnecessary implementation
constraints.
The existence of one or more "reference implementations" of PTS arithmetic in
pure C99 may be helpful to compiler and library writers, but in my opinion that
alone doesn't justify the inclusion of that code in the formal specification.
The rationale document seems like a perfect place to share these with the
community, and it would also be appropriate in a public domain testing suite.
Move to close this issue with NoChange.
"P.S. Perhaps for 1.4 we can look at providing facilities for users (library
implementers in particular) to perform address arithmetic? "
Directly exposing such a facility via a library sounds like a very good idea,
and would probably also alleviate much of the headache for the library writers
you mention. Splitting this into a new issue 93.
Original comment by danbonachea
on 5 Oct 2012 at 7:45
On the 10/5 telecon it was agreed this issue is resolved with NoChange.
Interested parties are encouraged to contribute text to the Rationale document
Wiki:
http://code.google.com/p/upc-specification/wiki/UPCSpecCompanion
Original comment by danbonachea
on 7 Oct 2012 at 6:23
Original issue reported on code.google.com by
yzh...@lbl.gov
on 22 May 2012 at 11:47