diku-dk / futhark

:boom::computer::boom: A data-parallel functional programming language
http://futhark-lang.org
ISC License
2.37k stars 164 forks source link

Refine inlining #1385

Open athas opened 3 years ago

athas commented 3 years ago

Futhark currently inlines everything, all the time (unless you use #[noinline]). Many optimisations are crucially dependent on inlining, so we will always inline a lot, but we currently bloat compile times (and code size) by inlining too much - particularly for large programs. This was less of a problem when Futhark was mostly used for very small programs, but now we and others are writing slightly larger things.

I propose making the inlining transformation more intelligent, so we can make more refined decisions about how and when to inline. For a start, the current inlining pass should be split in two:

  1. A first pass that inlines everything that should always be inlined: functions that are only called once, and functions that are marked as #[inline]. Note that the former category includes the bookkeeping functions added by lambda lifting and defunctionalisation.
  2. A pass that inlines all remaining functions.

The net effect will initially be the same, but we can then start working on (2) to make it more intelligent. I would like both of these passes to be expressed within the same rule-driven framework, probably based on manipulating a call graph.

athas commented 3 years ago

We'll also have to improve the handling of stack traces if we want to inline less.

FluxusMagna commented 3 years ago

Will this also reduce the number of tuning parameters? It would be neat if a function that is never inlined has tuning parameters independent of the calling function so it's only tuned once. I suppose this might be complicated to do globally as tuning is done on an entry point basis as far as I know, but at least not tuning the same function more than once within an entry point would be nice.

athas commented 3 years ago

It will probably shrink the number of tuning parameters somewhat, but I'm not sure what the exact impact will be. We'll still need to completely inline anything that occurs inside a parallel operation.

And you're right that we need to refine the tuning a bit to deal with the fact that tuning parameters are no longer specific to each entry point. I'm pretty confident it can be solved.

FluxusMagna commented 3 years ago

If information about which sizes the parameter has been tuned for are stored, I guess it should be possible to simply skip redundant tuning passes, while tuning for any new sizes that appear. Another tuning related concern I have is that it would save a lot of time if tuning was persistent for unmodified functions across compilations. Could a hash be used to identify unchanged parts of the program and skip the tuning of those?

athas commented 1 month ago

Closed (sort of) by #1857. We still inline very aggressively, but we no longer inline literally everything.