Regarding the potential performance benefits of dropping the layout qualifier
(block size specifier), some implementations (BUPC, for example) use two
differing internal representations for pointers-to-shared. When the block size
is 0 or 1, the phase is known to be zero, therefore no phase field is
allocated. The larger internal representation with the phase field is reserved
for pointers-to-shared that have a block size > 1.
The GUPC compiler always allocates the space for the phase field, but will not
use the phase value for shared types with block size <= 1. Although there is
some storage overhead and slight inefficiency due to ensuring that the stored
phase value is zero, there is no additional computational overhead for shared
pointer arithmetic involving types with block size <= 1.
Based on the above observations, although there are some definite language
simplifications derived from removing block sizes > 1, there need not be a
storage-efficiency or performance impact for block sizes <= 1.
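For concreteness, a minimal C sketch of the two styles of representation described above (the field names and widths are hypothetical, not the actual BUPC or GUPC definitions; real implementations typically pack the fields into one or two machine words):

    #include <stdint.h>

    /* Block size 0 or 1: the phase is known to be zero, so no phase field
     * is allocated (the compact, BUPC-style representation). */
    typedef struct {
        uint64_t addr;    /* offset within the owning thread's shared segment */
        uint32_t thread;  /* owning thread */
    } pts_phaseless;

    /* Block size > 1 (or, GUPC-style, all block sizes): a phase field
     * tracks the element's position within its block. */
    typedef struct {
        uint64_t addr;
        uint32_t thread;
        uint32_t phase;   /* element index within the current block */
    } pts_phased;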
Original comment by gary.funck
on 19 Mar 2012 at 7:56
Cray also always allocates space for the phase, but does not consider the phase
in any code generated for block sizes <= 1.
When I wrote "implementation benefits," I was not really speaking about
performance, but rather the complexity of the compiler or run-time code that
handles the pointer-to-shared arithmetic. The code is rather simple for block
sizes <= 1, but gets to be tricky for block sizes > 1, especially when one must
factor in unknown signs (at compile time) and the UPC division and mod
operations that don't match the standard C ops. Compiler-generated code for
ptr + n where n has an unknown sign and ptr has a block size > 1 is ugly; the
code inside a compiler to implement it isn't much fun either. I suppose some
implementations may push off the work to a run-time call, but using a function
call for something as "simple" as pointer addition really feels wrong to me.
The Cray compiler generates inline code for pointer-to-shared addition and
we're interested in keeping it simple.
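To make the "ugly" concrete, here is a sketch (not Cray's or any vendor's actual generated code; the element-unit fields and the floor_div helper are assumptions) of roughly what ptr + n has to do when the block size is B, THREADS is T, and the sign of n is unknown at compile time:

    #include <stdint.h>

    typedef struct {
        int64_t addr;    /* local element offset on the owning thread */
        int64_t thread;  /* owning thread                             */
        int64_t phase;   /* element index within the current block    */
    } pts;

    /* Floor division: C's / truncates toward zero, which gives the wrong
     * answer for negative offsets in this layout computation -- the
     * mismatch with the UPC-style division/mod mentioned above. */
    int64_t floor_div(int64_t a, int64_t b)
    {
        int64_t q = a / b;
        return (a % b != 0 && (a ^ b) < 0) ? q - 1 : q;
    }

    /* p + n for a pointer-to-shared with block size B over T threads. */
    pts pts_add(pts p, int64_t n, int64_t B, int64_t T)
    {
        int64_t cursor = p.phase + n;            /* may be negative        */
        int64_t blocks = floor_div(cursor, B);   /* whole blocks advanced  */
        int64_t phase  = cursor - blocks * B;    /* new phase, in [0, B)   */
        int64_t t      = p.thread + blocks;
        int64_t wraps  = floor_div(t, T);        /* full trips around T    */
        pts r;
        r.phase  = phase;
        r.thread = t - wraps * T;                /* new thread, in [0, T)  */
        r.addr   = (p.addr - p.phase) + wraps * B + phase;
        return r;
    }

For block sizes <= 1 all of this collapses to a single add (plus, for block size 1, a wrap of the thread field), which is why the two cases differ so much in generated-code complexity.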
Original comment by johnson....@gmail.com
on 20 Mar 2012 at 3:03
If anything is done in our current round of spec changes with respect to
block sizes > 1, then I would say "deprecate" is the strongest action we can take.
To remove block sizes > 1 completely would break too many applications.
Even the idea of deprecating them bothers me significantly.
As an implementer I can agree w/ Troy's desire to keep the PTS arithmetic code
as simple as possible. However, the proposed alternative seems to be structs
or user-provided-arithmetic (via macros perhaps). Not to be insulting to Troy
or to our user base, but the idea that we are going to get higher
performance/quality pointer arithmetic from a UPC end-user than from the UPC
compiler seems ridiculous to me.
So, I vote to "Allow".
Original comment by phhargr...@lbl.gov
on 22 May 2012 at 12:29
I can see both sides of the issue, but I would still like to see blocking
factors gone. I have two major arguments for simplification, and a potential
way to deal with Paul's argument.
Arguments for restricting blocking factors
=========================================
(1) Language clarity benefit. Maybe you don't appreciate how much simpler UPC
would become:
* Cleaner syntax, obviously. Well, maybe except for [0].
* No more trouble with the [*] blocking factor, thread-dependent blocking factors,
the maximum blocking factor, and so on. The UPC type system compresses to
essentially C's own type system.
* The concept of "phase" disappears from the language, including upc_phaseof.
* All the funky special cases in the collective definitions: gone.
* Type casts become simpler to behold. The old rule of "phase shall be zero
after cast" can go. No more trouble with actual to formal parameter
translations in function calls. No more trouble with writing functions that
hard-code the blocking factor.
(2) Implementation benefits. What Troy said :) In addition [pure selfish
thought], on the PowerPC architecture, getting rid of a modulo/integer division
pair is no mean feat.
How to deal with the backwards compatibility issue
==================================================
Paul rightly feels that the suggested change is drastic and will result in at
least some code that will not work anymore. Oh, and he doesn't like deprecation
either. Darn.
So how about a source-to-source translator that transforms fixed-blocking-factor
code into BF==1 code? Gary's original message has almost the complete
blueprint for the transformation.
For codes with array indices the transformation would be fairly trivial. For
codes with pointers-to-shared the transformation would have to generate a
"pointer increment" function to allow pointer arithmetic to happen according to
the original program's notions. This pointer increment function would then be
inlined, essentially re-adding the complexity that Troy saved by simplifying
the runtime. Thus, the runtime would be clean and high performance, but if the
programmer wants to keep their hairy old code they can do that at a cost.
The source-to-source translator could transparently deal with casts to local,
since the actual layout of data in memory would not have changed - only the
indexing functions using pointers-to-shared would have been modified.
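As a rough illustration of the array-index case (a sketch only; remap_index, BF_B, and NTHREADS are invented names, not part of any proposed tool), the translator could rewrite accesses to an array declared shared [BF_B] double a[N] into accesses through a default BF==1 declaration, with every index remapped so it reaches the same thread and the same local offset as before:

    #include <assert.h>

    #define BF_B     4     /* blocking factor of the original declaration */
    #define NTHREADS 8     /* stands in for THREADS                       */

    /* Original:    shared [BF_B] double a[N];  accessed as a[i]
     * Translated:  shared        double a[N];  accessed as a[remap_index(i)]
     * The physical data placement is untouched; only the index expression
     * changes. */
    long remap_index(long i)
    {
        long thread = (i / BF_B) % NTHREADS;                      /* owner under [BF_B] */
        long local  = (i / (BF_B * NTHREADS)) * BF_B + i % BF_B;  /* local offset       */
        return local * NTHREADS + thread;    /* same (thread, local) under BF == 1      */
    }

    int main(void)
    {
        /* element 13 lives on thread 3 at local offset 1 under [4] with 8
         * threads; the remapped cyclic index must land in the same place. */
        long j = remap_index(13);
        assert(j % NTHREADS == 3 && j / NTHREADS == 1);
        return 0;
    }

The pointer-to-shared case would use the same mapping wrapped in the generated "pointer increment" function mentioned above.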
Original comment by ga10...@gmail.com
on 24 May 2012 at 3:29
This seems the appropriate place to make this point: I think we are missing the
real issue here.
The way UPC handles distributed arrays is awkward at best. This proposal and
several other proposals are tinkering around the edges, rather than proposing
any sort of wholesale change that actually improves the expressibility of the
language. Many of the proposals are aimed at eliminating some implementation
challenges, or perhaps adding restrictions to eliminate confusing cases. I
agree with Paul that these do not seem to be of significant benefit,
particularly to users of the language.
Perhaps, rather than fiddling with block-size-related changes, people could
propose new ways of specifying array geometry, perhaps building on cyclic and
indefinite pointer arithmetic, perhaps not. That might be of more benefit to
users than removing existing functionality.
For the question here, I'll vote "Allow".
Original comment by brian.wibecan
on 25 May 2012 at 9:46
Brian wrote:
> I agree with Paul that these do not seem to be of significant benefit,
> particularly to users of the language.
I would go so far as to say that dropping the current distributed array
layouts would be creating a NEW language. What would your response be if I
asked that arrays be removed from C entirely, since users can achieve the same
things using only pointers? It is not a perfect analogy, of course, but my
point is that distributed array layouts in UPC are too fundamental a feature of
the language to remove.
I second Brian's interest in perhaps ADDING mechanisms for better
controlling/using array layouts.
Original comment by phhargr...@lbl.gov
on 25 May 2012 at 10:11
I also vote for maintaining the status quo with respect to block sizes > 1. I
also support Brian's suggestion that an "out of the box" proposal that supersedes
and generalizes layout qualifiers, overcoming the limitations of block sizes
(whatever they may be), might be a more productive avenue of inquiry.
A couple of things in this regard:
1) I have heard comments to the effect that "if you're developing a library,
you can't use block sizes other than 0, because block sizes are constrained to
be compile-time constants" (a short sketch of this constraint appears after
point 3 below). Perhaps issue #40 (block sizes as an attribute of a VLA) would
provide sufficient generality to meet that objection.
2) Although there has been a rather persistent stated concern that block sizes
> 1 are both confusing and not very useful, apart from the stated compiler/runtime
implementation issues and the library-development limitation mentioned in point 1
above, I am not aware of any further elucidation of why eliminating block sizes
> 1 would be a good thing. If there are issues other than implementation issues
related to block sizes > 1, I'd suggest that they be added to this issue as
comments, so that we can better understand the problem.
3) Given that Co-Array Fortran programs also provide distributed arrays, is
there anything about UPC's block sizes (layout qualifiers) that either improves
the fit between UPC and Co-Array Fortran, or limits interoperability? (This
topic might be worth a separate issue to track the discussion.)
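Regarding point 1, a minimal UPC sketch of the constraint (the identifiers are invented for illustration; this requires a UPC compiler):

    #define B 16
    shared [B] double ok[100*THREADS];  /* accepted: B is an integer constant expression */
    /* shared [nb] double bad[100*THREADS];
     *   rejected when nb is a run-time value: the layout qualifier must be a
     *   compile-time constant                                                            */
    shared [] double *lib_arg;          /* indefinite (block size 0) pointer: the usual
                                           workaround in library interfaces               */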
disclaimer: I happen to like UPC array blocking factors, and think they should
be used more, not less. If there are UPC language issues that limit their use,
I'd prefer to see those limitations addressed rather than throwing out block
sizes. That said, there is also a great deal of appeal to the minimalist
argument of simplifying the language where possible/practical.
Original comment by gary.funck
on 25 May 2012 at 10:45
Regarding Comment #7:
3) Fortran has fewer (basically one) data distribution options than UPC, so
mapping a distribution from Fortran->UPC is easier than UPC->Fortran. Fewer
options can be viewed as a weakness or a strength. I believe that it is a
strength, partially because I think too many options are confusing and
partially because of what Brian wrote in Comment #5 (i.e., the distribution
options are fine but their presentation could be improved).
Original comment by johnson....@gmail.com
on 5 Jun 2012 at 8:30
Marking as 2.0 and Usability, and changing the title to better reflect the issues
being discussed. This is an issue where everyone seems to agree that something
should be done, but we need more time to form better proposals.
Original comment by johnson....@gmail.com
on 15 Jun 2012 at 6:09
Original issue reported on code.google.com by
johnson....@gmail.com
on 19 Mar 2012 at 4:28