Open zamazan4ik opened 1 year ago
Hi, I don't have any experience with AutoFDO, but if it's at least partly integrated into rustc, I'll take a look at how hard it would be to support it.
Regarding BOLT, its instrumentation is currently Linux-only as well, so that's not a problem for me :) However, I have an AMD CPU that doesn't support LBR profiling with perf for BOLT, so I couldn't test this locally :/ I'll try to use some older PC for that.
As for Propeller, I consider it to be deprecated in favour of BOLT.
> Hi, I don't have any experience with AutoFDO, but if it's at least partly integrated into rustc, I'll take a look at how hard it would be to support it.
Would be great!
> However, I have an AMD CPU that doesn't support LBR profiling with perf for BOLT, so I couldn't test this locally :/ I'll try to use some older PC for that.
Well, you could run it even without LBR support - BOLT will still accept such a recording as valid input (e.g. I did the same thing here - https://github.com/ydb-platform/ydb/issues/140#issuecomment-1484288211 - on my AMD Ryzen 9 5900X and Fedora 37 setup). I think the optimization results in this case are worse than with LBR support, but... I have no desire to change my CPU just for that :)
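For reference, the no-LBR workflow described above could look roughly like this. It is only a sketch, assuming `perf`, `perf2bolt` and `llvm-bolt` are on PATH; the binary path, the output file names and the chosen optimization flags are placeholders, not something taken from this thread. The script only prints the commands unless it is run with `DRY_RUN=0`.

```shell
#!/bin/sh
# Sketch: sample-based BOLT profiling on a CPU without LBR support.
# BIN and the output file names are placeholders chosen for illustration.
BIN="${BIN:-./target/release/myapp}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

bolt_no_lbr() {
    # 1) Plain cycle sampling, without the -j/-b branch-stack options that need LBR.
    run perf record -e cycles:u -o perf.data -- "$BIN"
    # 2) Convert the recording to BOLT's format; -nl marks it as a non-LBR profile.
    run perf2bolt -p perf.data -o perf.fdata -nl "$BIN"
    # 3) Rewrite the binary using the converted profile (flags are an assumption).
    run llvm-bolt "$BIN" -o "$BIN.bolt" -data=perf.fdata \
        -reorder-blocks=ext-tsp -reorder-functions=hfsort -split-functions
}

bolt_no_lbr
```

With LBR available, the only differences would be adding a branch-sampling option to `perf record` and dropping `-nl`.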
> As for Propeller, I consider it to be deprecated in favour of BOLT.
Nope :) This tool is not deprecated, at least from its developers' point of view. E.g. you could check the most recent results here - this paper is from 2023. The Propeller developers are also planning to integrate the tool into LLVM, as has already been done for BOLT. From my understanding, maybe one day BOLT and Propeller will somehow be merged into one tool (hopefully somewhere directly in the LLVM linkers and/or PGO infra), but when we will get it... I cannot predict.
> Nope :) This tool is not deprecated, at least from its developers' point of view. E.g. you could check the most recent results here - this paper is from 2023. The Propeller developers are also planning to integrate the tool into LLVM, as has already been done for BOLT. From my understanding, maybe one day BOLT and Propeller will somehow be merged into one tool (hopefully somewhere directly in the LLVM linkers and/or PGO infra), but when we will get it... I cannot predict.
Oh, I didn't know that, maybe I mistook it for another tool. Well, if they have an easy way of profiling/instrumenting and optimizing binaries, a reasonable deployment mechanism and some documentation, I'm not opposed :) But I definitely do not plan to do shenanigans in this tool to build and support it if its code and usage are in the typical software research open-source state 😅 I wonder if I could somehow generalize the BOLT support in cargo-pgo so that users could "plug in" their own instrumenter and optimizer, for any tool they want 🤔.
> But I definitely do not plan to do shenanigans in this tool to build and support it if its code and usage are in the typical software research open-source state
Agreed :) However, Propeller is the default post-link optimization tool at Google right now (integrated into their build pipelines for a bunch of their services, etc.), so I think it's quite usable in real life, not just a "usual research tool" :) Hopefully, this repo clarifies what needs to be done to apply Propeller to a real application (Clang, in the provided case).
> I wonder if I could somehow generalize the BOLT support in cargo-pgo so that users could "plug in" their own instrumenter and optimizer, for any tool they want
That's a good question to think about. IMHO it would be quite difficult to provide a stable enough generalization over these tools, since they are evolving and quite unstable (at least from a public interface point of view). E.g. the BOLT team is now working on a new BOLT approach called "Lightning BOLT", which would probably change the interfaces or add one more mode (in addition to VESPA). Propeller is going to be merged into LLVM in some form, which may also change how we should use it.
Propeller has one advantage over BOLT right now - a much smaller peak in memory usage. Even on my machine with 32 GiB of RAM I am not always able to BOLTify my app due to OOM. Another pain point - BOLT has only weak support for architectures other than x86-64 (but they are working on it).
By the way, I am working on gathering all available information regarding PGO in one place - https://github.com/ZaMaZaN4iK/awesome-pgo . Maybe some of the links would be interesting reading for you (although I think you have already read almost all of them :)
This is what I currently do for BOLT instrumentation:
1) Set specific linker flags
2) Clear the profile directory
3) Run a command for each built binary to instrument it
And for BOLT optimization:
1) Set specific linker flags
2) Merge profiles
3) Run a command for each built binary to optimize it
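Done by hand, the two step lists above might translate into something like the following. This is a sketch, not cargo-pgo's actual implementation: the `-Wl,-q` linker flag, the directory and file names, and the optimization flags passed to `llvm-bolt` are all assumptions made for illustration. Commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch of manual BOLT instrumentation and optimization for one binary.
# All paths below are placeholders, not cargo-pgo defaults.
BIN="${BIN:-./target/release/myapp}"
PROFILE_DIR="${PROFILE_DIR:-./bolt-profiles}"
MERGED="${MERGED:-./merged.fdata}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command; execute it only when DRY_RUN=0.
run() { echo "+ $*"; if [ "$DRY_RUN" = "0" ]; then "$@"; fi; }

bolt_instrument() {
    # 1) Linker flags: keep relocations in the binary (assumed flag set).
    run env RUSTFLAGS="-C link-args=-Wl,-q" cargo build --release
    # 2) Clear the profile directory so old profiles are not mixed in.
    run rm -rf "$PROFILE_DIR"
    run mkdir -p "$PROFILE_DIR"
    # 3) Instrument the built binary; each run writes its own profile file.
    run llvm-bolt "$BIN" -instrument \
        -instrumentation-file="$PROFILE_DIR/prof" \
        -instrumentation-file-append-pid -o "$BIN-instrumented"
}

bolt_optimize() {
    # 1) Rebuild with the same linker flags.
    run env RUSTFLAGS="-C link-args=-Wl,-q" cargo build --release
    # 2) Merge all gathered profiles into a single file.
    run merge-fdata -o "$MERGED" "$PROFILE_DIR"/*.fdata
    # 3) Optimize the built binary with the merged profile.
    run llvm-bolt "$BIN" -o "$BIN-optimized" -data="$MERGED" \
        -reorder-blocks=ext-tsp -reorder-functions=hfsort -split-functions
}

case "${1:-}" in
    instrument) bolt_instrument ;;
    optimize)   bolt_optimize ;;
esac
```

The instrumented binary has to be run on a representative workload between the two phases, otherwise there is nothing to merge.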
In theory, we could add something like this:
$ cargo pgo custom instrument -- ./instrument.sh
# instrument.sh
propeller instrument "$INPUT_BINARY"
$ cargo pgo custom optimize -- ./optimize.sh
# optimize.sh
propeller optimize "$INPUT_BINARY"
If specific compiler/linker flags are needed, they can be passed with RUSTFLAGS=... cargo pgo custom optimize.
Of course, with this generic approach, the user would be responsible for gathering and managing the profiles and writing the instrumentation and optimization scripts. Basically the only added value of cargo pgo would be to serve as a wrapper over cargo. It would pass all the binaries that should be instrumented/optimized to the custom optimization tool. I'm not sure if that is a useful enough feature to add support for custom optimization backends though.
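To make the "wrapper" role concrete, here is a minimal sketch of the per-binary dispatch such a mode would need. The `cargo pgo custom` command does not exist; `run_custom_tool` is a hypothetical helper, and the only contract assumed is the one from the example above: the user script reads the binary path from `INPUT_BINARY`.

```shell
#!/bin/sh
# Hypothetical sketch of the dispatch behind a `cargo pgo custom ...` mode.
# Usage: run_custom_tool <script> <binary>...
# Runs the user script once per binary, exposing the path as INPUT_BINARY,
# and stops at the first binary for which the script fails.
run_custom_tool() {
    script="$1"
    shift
    for bin in "$@"; do
        INPUT_BINARY="$bin" "$script" || return 1
    done
}

# Example (paths are placeholders):
#   run_custom_tool ./instrument.sh target/release/app1 target/release/app2
```

How the list of binaries is discovered is left out here; that part is what the wrapper over cargo would actually contribute.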
Hi!
Do you plan to add to cargo-pgo support for the additional PGO modes:
- AutoFDO: rustc already supports it: https://github.com/rust-lang/rust/commit/a17193dbb931ea0c8b66d82f640385bce8b4929a AutoFDO could be useful for users who want to gather profiles directly from a production environment, without building slow instrumentation-only binaries.
- BOLT with perf mode instead of instrumentation (for the same reasons as AutoFDO). Yes, I understand that it's a Linux-only feature, but still - we have a lot of Linux users.
And thank you again for cargo-pgo - it's much easier to apply PGO to Rust-based binaries with it :)
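For context, a manual AutoFDO cycle with rustc could look roughly like this. It is a sketch under several assumptions: a nightly toolchain (the `-Z` flags are unstable), `perf` with branch sampling (LBR) available, and `create_llvm_prof` from the AutoFDO project on PATH. The binary and profile paths are placeholders. Commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: sampling-based PGO (AutoFDO) for a Rust binary, without instrumentation.
BIN="${BIN:-./target/release/myapp}"   # placeholder binary path
PROF="${PROF:-./myapp.prof}"           # placeholder profile path
DRY_RUN="${DRY_RUN:-1}"

# Print the command; execute it only when DRY_RUN=0.
run() { echo "+ $*"; if [ "$DRY_RUN" = "0" ]; then "$@"; fi; }

autofdo_cycle() {
    # 1) Build an ordinary optimized binary with profiling-friendly debug info.
    run env RUSTFLAGS="-Zdebug-info-for-profiling" cargo +nightly build --release
    # 2) Sample the running binary with branch records.
    run perf record -b -o perf.data -- "$BIN"
    # 3) Convert the perf recording into an LLVM sample profile.
    run create_llvm_prof --binary="$BIN" --profile=perf.data --out="$PROF"
    # 4) Rebuild using the sample profile.
    run env RUSTFLAGS="-Zprofile-sample-use=$PROF" cargo +nightly build --release
}

autofdo_cycle
```

Step 2 is the part that could happen in a production environment, since the sampled binary is a regular release build.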