crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
19.34k stars 1.61k forks source link

[Tracking Issue] Compiler optimizations #11100

Open alexmaco opened 3 years ago

alexmaco commented 3 years ago

Discussion

Opening this issue to track general optimization discussions, feel free to close and split it into specifics if that's better.

Main caveat: this list refers to what the crystal compiler can do using the information at the high language level. It's probably not productive to overlap in functionality too much with LLVM, since LLVM can do a much better job of low level/peephole optimizations.

Motivation

The crystal reference describes several simple ways to avoid performance issues. Some of those, and other similar optimizations, are however common optimizations that compilers routinely perform. Taking just one example, hoisting side-effect-free object constructions (the creation of an array) out of a loop can be done automatically, and at comparatively little cost, before codegen and LLVM.

Ultimately, the more the compiler can do (following least-astonishment) the less the user has to think about, and the smoother and more productive their experience will be.

Rough starting point

Following the performance section in the reference (since we can assume those cases are well tested to be relevant in practice), crystal could:

Direction

Many optimization techniques are known, but their applicability and cost varies. Probably the most relevant data lies with profiling the compiler itself, and shards on crystalshards.xyz, and checking how much the new optimizations can do for the public code corpus.

Speaking from experience, some optimizations are practically cheap to perform, and so could probably be enabled by default in debug mode, granting a "free" speedup to crystal scripts using shebangs.

j8r commented 3 years ago

Optimizations are already done to some extent. I find Crystal quite fast already. The are also false optimizations, which can even worsen the situation in certain cases, i.e. performance and behavior.

For example:

detect equivalent but slower string operations, and rewrite them accordingly (e.g. "Hello " + name.to_s vs "Hello #{name}")

Not always true

require "benchmark"

TWO = "two"

Benchmark.ips do |bm|
  bm.report("Concatenation") do
    99.times do
      "one" + TWO.to_s
    end
  end

  bm.report("Interpolation") do
    99.times do
      "one#{TWO}"
    end
  end
end

Result:

Concatenation 733.86k (  1.36µs) (±10.69%)  3.1kB/op        fastest
Interpolation 370.21k (  2.70µs) (±12.24%)  3.1kB/op   1.98× slower

This is an interesting topic, but should be considered low priority compared to other ones like language stability, and probably features. As an example, faster compile times.