compiler optimization tracker

JeffBezanson commented 11 years ago

This is an umbrella issue for compiler and other low-level optimizations. I'm including ones that are already done, to make it more interesting.

compiler:

[x] static method lookup
[x] basic inlining
[x] null check elimination
[x] function type check elimination
[x] method lookup hoisting
[x] function-valued argument inlining (#13412)
[x] variable type specialization
[x] varargs allocation elision
[x] lambda lifting
[x] native calling convention
[x] hoist n-d array metadata loads
[x] hoist 1-d array metadata loads (partial: #5355, more via LLVM TBAA)
[x] hoist access to fields in immutable values (partial: #8867)
[x] skip gc frame when possible
[x] avoid re-boxing non-assigned variables
[x] unboxed struct layout
[x] apply tuple lowering: apply(f, t::(T,S)) => f(t[1],t[2])
[x] apply tuple elision: apply(f, (x,y)) => f(x,y)
[x] temporary tuple elision
[x] create and return tuples as structs (#1976, #2496)
[x] early freeing by dataflow analysis (#261)
[x] don't pass or store 0-field structs
[x] rerun type inference after inlining in some cases
[x] handle more constant conditions in inference (partial: #15785)
[x] remove excess temp vars
[x] pool environments of inner functions
[x] inline builtins
[x] better inlining of apply()
[x] inline invoke() (#9608)
[x] inline isdefined() (https://github.com/JuliaLang/julia/pull/19334)
[x] more general inlining
[x] constant-fold type applications
[x] direct call methods specialized for non-leaf types
[x] closure environment specialization
[ ] static keyword argument sorting (#16580)
[x] compute first field of a new object before allocating, to avoid the unnecessary NULL store to the first slot
[x] identify types (especially immutable) that are always fully initialized, to avoid undefined checks on their fields (#8827)
[x] skip gc root for variables with stable values (e.g. vars that only alias arguments)
[x] avoid redundant loads of single-assigned variables from gc frame
[ ] possibly hoist boxing out of tight loops (might interfere with gc though?)
[x] faster possibly-undefined variables (#6914)
[ ] constant-propagation and CSE of pure functions (#14324)

RTS:

[x] compress ASTs
[x] method hash tables
[x] more efficient method and cache representation
[x] gc: 1-bit refcount (#261)
[x] flatten arrays of tuples
[x] use realloc in growing arrays (complicated by alignment constraints)
[x] avoid extra buffer copy in uv_write
[x] allow scheduler to be a function call, avoiding double task switch to get to next task
[ ] avoid some stack copies in Task; e.g. for tasks that never yield

larger projects:

[ ] bounds check elimination
[x] cache generated code (#260)
[x] inlining function arguments
[ ] henchmen unrolling
[x] better type info for tasks/produce/consume
[x] SIMD support (#2299)

performance-related features:

[x] inbounds macro (#3268, #1392)
[x] bounds check intrinsic
[x] sizeof intrinsic
[x] expose llvm select (inlined, eager-evaluated conditional function)
[x] inline declaration (#1106)
[x] pure function declaration (#414)
[ ] improve behavior of globals (#964)
[x] better support for in-place ops (#3424, #249, #1115, #16285)

ViralBShah commented 11 years ago

This is a really nice and useful tracker! We should also link these to our release notes.

timholy commented 11 years ago

It's not obvious to me that you gave yourself credit for your awesome recent work in making let faster :-). Presumably it's bundled into something else?

Thanks for this list. Mildly embarrassed by how many of the issues were filed by me, without me being able to do much to fix most of them.

timholy commented 11 years ago

...but here I go again...

In addition to "flatten arrays of tuples" is it at all feasible to "flatten arrays of types with fixed size"? See https://groups.google.com/d/msg/julia-dev/QOZfPkdRQwk/O-DgzNxbQegJ.

StefanKarpinski commented 11 years ago

We did ask for "ungracious demands" from the outset ;-)

lindahua commented 11 years ago

Great stuff! I believe this will take the performance of Julia to a new height when done.

blakejohnson commented 11 years ago

What about the kind of constant folding Stefan mentions in #2741 ?

IainNZ commented 11 years ago

Thinking out loud: how hard would it be to set up something like http://speed.pypy.org/ ? On the server it seems clear what is required, but do we have a good body of performance tests? If this is something desirable, I'm sure its something the community could pull together without any demands on core developers...

staticfloat commented 11 years ago

@IainNZ; I'm working on that right now. :)

We have a small corpus of performance tests in test/perf2 but I do believe we will need some work to come up with small, self-contained and relevant performance tests. I will open an issue regarding this in the near future, once I have more concrete work done.

JeffBezanson commented 11 years ago

related: 620141b683c32965b74fb759445a61decb2c2dcf

malmaud commented 11 years ago

I'd be interested in helping out on one of these issues, but I don't want to step on anyone's toes or duplicate effort. Can I get some guidance on which of these issues is in need of manpower?

staticfloat commented 10 years ago

@JeffBezanson Is "bounds check elimination" covered by @inbounds?

JeffBezanson commented 10 years ago

No; ideally there should be some automatic bounds check removal.

johnmyleswhite commented 10 years ago

What level of automaticity do you have in mind? I'd love it if bounds-checking were automatically turned off for loops that look like:

for i in 1:length(x)
    x[i] += 1
end

JeffBezanson commented 10 years ago

Yes, that's the kind of case that can be handled automatically.

ArchRobison commented 10 years ago

By the way, the TBAA support in #5355 partially fixes the ``hoist 1-d array metadata loads'' item. It's partial because bounds checks will stop the hoisting because LLVM can't prove it's legal and profitable to hoist a load up over a branch.

mlubin commented 10 years ago

Does one of these items include inlining tuple fields in types? What's the status of this?

JeffBezanson commented 10 years ago

The status is that I'm thinking very hard about it.

One issue is that the system heavily uses tuples as a convenient way to represent "just a few pointers". Specifically we need to redesign the representations of DataType and UnionType not to bootstrap off tuples in this way.

ArchRobison commented 10 years ago

@JeffBezanson If I understand the type system correctly, tuples are similar to anonymous immutable types with structural equivalence and covariance. Are you saying that the system uses don't depend on those features? If so, is it a matter of replacing them with some kind of "pseudo-tuple" type that lacks structural equivalence and covariance?

JeffBezanson commented 10 years ago

Yes that could work, but a lot of these things tend to be exposed for reflection, so it's hard to have pseudo-anything. I think some will stay tuples, some will become tuple types, and some should be flattened into their containing structs. Maybe some can be arrays.

simonster commented 9 years ago

Do we actually have henchmen unrolling?

ViralBShah commented 9 years ago

I was under the impression that it was not yet implemented.

simonster commented 9 years ago

I didn't think so either, but it was checked. I have unchecked.

yuyichao commented 9 years ago

@staticfloat Is there any update on having a performance test suite and sth similar to speed.pypy.org. I found it very useful when working on optimizations (in addition to running the current tests to make sure I'm not breaking anything) to have somthing like this to measure the performance gain (or regression in some cases). Would be awesome to have github integration but sth I can run locally is also good enough.

JeffBezanson commented 9 years ago

Running make in test/perf will run our benchmark suite.

ghost commented 9 years ago

@yuyichao: There is also the Julia Speed Center.

Edit: It appears to not have any new updates since June 2014 though, which was around when I last had a look at it.

ViralBShah commented 9 years ago

@staticfloat can shed some light on the history there.

staticfloat commented 9 years ago

Yes, I think @yuyichao was hinting at that when he pinged me. Essentially, I just haven't had time to rewrite speed.julialang.org in something more flexible than the original adaptation from the codespeed project. It's the next big Julia project I will tackle, but unfortunately time for big projects is thin for me at the moment.

ViralBShah commented 9 years ago

Is it fair to tick off SIMD types on this list with tuple overhaul?

ihnorton commented 9 years ago

I think ensuring proper alignment for SIMD is still TODO. On Apr 23, 2015 9:12 AM, "Viral B. Shah" notifications@github.com wrote:

Is it fair to tick off SIMD types on this list with tuple overhaul?

— Reply to this email directly or view it on GitHub https://github.com/JuliaLang/julia/issues/3440#issuecomment-95578830.

ArchRobison commented 9 years ago

My impression is that recent Intel SIMD hardware is much less performance picky about alignment than it used to be, but I have not run experiments.

simonster commented 9 years ago

One bit that is (I think?) still missing is the ability to do vectorized arithmetic on tuples (or something that wraps them) without llvmcall.

ArchRobison commented 9 years ago

The optional (under -O) SLPVectorizer pass used to vectorize tuple arithmetic without any special wrapper. But it is failing to do so.

pao commented 9 years ago

My impression is that recent Intel SIMD hardware is much less performance picky about alignment than it used to be, but I have not run experiments.

And getting good performance out of less-recent hardware shouldn't be ignored if it's reasonable to get.

ScottPJones commented 9 years ago

OK, this is great (thanks for the reference that led me here @timholy) Which boxes that are not checked are already largely in place? Thanks.

timholy commented 9 years ago

I certainly am not qualified to give a whole list, but "more general inlining" and "inline declararation" are presumably largely done (there are still ambitions to introduce inlining as controlled from the call site), and there's already some bounds check elimination and SIMD support.

vtjnash commented 8 years ago

checked off: better type info for tasks/produce/consume (Channels) SIMD support (#2299) more general inlining inline declaration (#1106) pure function declaration (#414)

since the basic version of each of these has been implemented

eschnett commented 8 years ago

You may want to add @fastmath code generation.

vtjnash commented 7 years ago

Checked off a few more. We now have all of these, or they are tracked in another existing issues.

JuliaLang / julia

compiler optimization tracker #3440