[Tracking Issue] Compiler optimizations

Discussion

Opening this issue to track general optimization discussions, feel free to close and split it into specifics if that's better.

Main caveat: this list refers to what the crystal compiler can do using the information at the high language level. It's probably not productive to overlap in functionality too much with LLVM, since LLVM can do a much better job of low level/peephole optimizations.

Motivation

The crystal reference describes several simple ways to avoid performance issues. Some of those, and other similar optimizations, are however common optimizations that compilers routinely perform. Taking just one example, hoisting side-effect-free object constructions (the creation of an array) out of a loop can be done automatically, and at comparatively little cost, before codegen and LLVM.

Ultimately, the more the compiler can do (following least-astonishment) the less the user has to think about, and the smoother and more productive their experience will be.

Rough starting point

Following the performance section in the reference (since we can assume those cases are well tested to be relevant in practice), crystal could:

detect and eliminate strings that are only created to then be written to an IO
detect and rewrite string formatting operations that are only written to an IO
detect equivalent but slower string operations, and rewrite them accordingly (e.g. "Hello " + name.to_s vs "Hello #{name}")
lift/rewrite objects that are repeatedly created in loops, when possible (i.e. when that object is only read and never modified)
detect and rewrite string indexing code into string iterating code when possible
change storage of structs and class between stack and heap (aka automatic (un)boxing), when performance can be gained while preserving semantics
- judicious use of autoboxing can also have beneficial snowball effects, as it could enable LLVM to trigger more optimizations (such as constant propagation/CSE/DCE for stack values) it would have missed otherwise.

Direction

Many optimization techniques are known, but their applicability and cost varies. Probably the most relevant data lies with profiling the compiler itself, and shards on crystalshards.xyz, and checking how much the new optimizations can do for the public code corpus.

Speaking from experience, some optimizations are practically cheap to perform, and so could probably be enabled by default in debug mode, granting a "free" speedup to crystal scripts using shebangs.

Optimizations are already done to some extent. I find Crystal quite fast already. The are also false optimizations, which can even worsen the situation in certain cases, i.e. performance and behavior.

For example:

detect equivalent but slower string operations, and rewrite them accordingly (e.g. "Hello " + name.to_s vs "Hello #{name}")

Not always true

require "benchmark"

TWO = "two"

Benchmark.ips do |bm|
  bm.report("Concatenation") do
    99.times do
      "one" + TWO.to_s
    end
  end

  bm.report("Interpolation") do
    99.times do
      "one#{TWO}"
    end
  end
end

Result:

Concatenation 733.86k (  1.36µs) (±10.69%)  3.1kB/op        fastest
Interpolation 370.21k (  2.70µs) (±12.24%)  3.1kB/op   1.98× slower

This is an interesting topic, but should be considered low priority compared to other ones like language stability, and probably features. As an example, faster compile times.

crystal-lang / crystal