Open deeprobin opened 1 year ago
This does raise an interesting point about .NET using PGO to optimise itself. For example, we could collect static PGO data from .NET compiling itself and bundle that with the R2R images of the various .NET binaries. If we go the BOLT route as well, we could instrument corerun while it's compiling itself and have a high-quality set of PGO data ready to run with. It looks like Rust just uses a few common crates to optimise itself.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.
Author: | deeprobin |
---|---|
Assignees: | - |
Labels: | `area-CodeGen-coreclr`, `untriaged`, `needs-area-label` |
Milestone: | - |
We discussed this internally recently. CC @kunalspathak, @EgorBo, @AndyAyersMS.
In .NET 9, we are starting with https://github.com/dotnet/runtime/issues/92915. Opportunistically, we might look into utilizing BOLT.
BOLT might make sense to try for e.g. GC and VM native code, but it seems unlikely to result in anything meaningful (even `-march=native` and `-O3` have no visible impact compared with our defaults).
For NativeAOT/generated managed code it's very unlikely to be usable at all due to GC info, etc., so we're sceptical about it. I am sure we have simpler things to do to get a 2% improvement rather than hooking a heavy/slow lib into CI 🙂
In .NET, we have already made great strides with PGO.
The Rust compiler (rustc) has been optimized with BOLT since last week (see https://github.com/rust-lang/rust/pull/94381). BOLT originally comes from a Facebook Incubator project, which was later adopted into LLVM.
Apparently they report a ~2% performance improvement from using it.
Of course, it is debatable whether this makes much difference in JIT scenarios, since there it basically only optimizes the static parts (e.g. apphost, corerun, ...).
However, I see potential here especially in the NativeAOT area.
The only thing required would be profile data such as `perf` output. The difference from other optimizers is that this is a post-link optimizer, so it optimizes the already-built binaries. One limitation I could find so far is that it only supports ELF binaries (at least as far as I could find out).
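For illustration, here is a rough sketch of what such a `perf`-based BOLT pass could look like, written as a small Python helper that only builds the command lines and does not run them. The flags follow the usage shown in the BOLT README as far as I know and may differ between versions. The file names (`perf.data`, `perf.fdata`, the `.bolt` suffix) and the `./corerun app.dll` workload are placeholders I made up, not anything that exists in the runtime repo. The `is_elf` check reflects the ELF-only limitation mentioned above.

```python
import shlex

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def is_elf(path):
    """Return True if the file starts with the ELF magic number."""
    with open(path, "rb") as f:
        return f.read(4) == ELF_MAGIC


def bolt_commands(binary, workload_args=()):
    """Build (but do not run) the three steps of a perf-based BOLT pass.

    All output file names are illustrative placeholders.
    Note: BOLT's docs recommend linking the input with --emit-relocs
    so that functions can be reordered.
    """
    binary = str(binary)
    # 1. Sample the workload with branch records (LBR) via perf.
    record = ["perf", "record", "-e", "cycles:u", "-j", "any,u",
              "-o", "perf.data", "--", binary, *workload_args]
    # 2. Convert the perf samples into BOLT's profile format.
    convert = ["perf2bolt", "-p", "perf.data", "-o", "perf.fdata", binary]
    # 3. Rewrite the already-linked binary using that profile.
    optimize = ["llvm-bolt", binary, "-o", binary + ".bolt",
                "-data=perf.fdata",
                "-reorder-blocks=ext-tsp",
                "-reorder-functions=hfsort",
                "-split-functions",
                "-dyno-stats"]
    return [record, convert, optimize]


if __name__ == "__main__":
    # Hypothetical workload: corerun executing some app.dll.
    for cmd in bolt_commands("./corerun", ["app.dll"]):
        print(shlex.join(cmd))
```

In a PoC, the ELF check would gate the whole pass, so non-Linux targets would simply skip it.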
I would be very interested to hear any thoughts, challenges, or obstacles you see in optimizing the .NET binaries with this.
If nothing stands in the way, I would put together a PoC of how we could use this in the runtime.
/type:feature /area:PGO