Kobzol / cargo-pgo

Cargo subcommand for optimizing Rust binaries/libraries with PGO and BOLT.
MIT License

Add support for more PGO modes #33

Open zamazan4ik opened 1 year ago

zamazan4ik commented 1 year ago

Hi!

Do you plan to add to cargo-pgo support for the additional PGO modes:

And thank you again for cargo-pgo - it's much easier to apply PGO to Rust-based binaries with it :)

Kobzol commented 1 year ago

Hi, I don't have any experience with AutoFDO, but if it's integrated at least partly into rustc, I'll try to take a look at how hard it would be to support it.

Regarding BOLT, it's currently Linux-only for instrumentation as well, so that's not a problem for me :) However, I have an AMD CPU that doesn't support LBR profiling with perf for BOLT, so I couldn't test this locally :/ I'll try to use some older PC for that.

As for Propeller, I consider it to be deprecated in favour of BOLT.

zamazan4ik commented 1 year ago

> Hi, I don't have any experience with AutoFDO, but if it's integrated at least partly into rustc, I'll try to take a look at how hard it would be to support it.

Would be great!

> However, I have an AMD CPU that doesn't support LBR profiling with perf for BOLT, so I couldn't test this locally :/ I'll try to use some older PC for that.

Well, you could run it even without LBR support - BOLT will still accept such a recording as valid (e.g. I did the same thing here - https://github.com/ydb-platform/ydb/issues/140#issuecomment-1484288211 - on my AMD Ryzen 9 5900X and Fedora 37 setup). I think the optimization results in this case are worse than with LBR support but... I have no desire to change my CPU just for that :)
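For reference, a non-LBR sampling run can be sketched roughly as below. This is only an illustration, not what cargo-pgo does: the function name `profile_without_lbr`, the `run` helper, the `DRY_RUN` switch and the `perf.data`/`perf.fdata` file names are assumptions made for the example. The `-nl` flag tells `perf2bolt` that the recording contains no LBR data. By default the script only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch: sample-based BOLT profiling on a CPU without LBR support.
# All names below are illustrative assumptions, not cargo-pgo internals.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print a command and, unless in dry-run mode, execute it.
run() {
    echo "+ $*"
    [ "$DRY_RUN" = "1" ] || "$@"
}

# Usage: profile_without_lbr BINARY [ARGS...]
profile_without_lbr() {
    bin="$1"
    # No -j/branch sampling here, since the CPU cannot provide LBR records.
    run perf record -e cycles:u -o perf.data -- "$@"
    # -nl: convert a recording that has no LBR samples.
    run perf2bolt -p perf.data -o perf.fdata -nl "$bin"
    # Apply the converted profile to produce an optimized binary.
    run llvm-bolt "$bin" -o "$bin.bolt" -data=perf.fdata \
        -reorder-blocks=ext-tsp -reorder-functions=hfsort
}
```

Calling `profile_without_lbr ./target/release/myapp` prints the three commands; setting `DRY_RUN=0` runs them, provided `perf`, `perf2bolt` and `llvm-bolt` are installed.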

> As for Propeller, I consider it to be deprecated in favour of BOLT.

Nope :) This tool is not deprecated, at least from its developers' point of view. E.g. you could check the most recent results here - this paper is from 2023. And the Propeller developers are planning to integrate this tool into LLVM as well, as has already been done for BOLT. From my understanding, maybe one day BOLT and Propeller will be somehow merged into one tool (hopefully somewhere directly in the LLVM linkers and/or PGO infra) but when we will get it... I cannot predict.

Kobzol commented 1 year ago

> Nope :) This tool is not deprecated, at least from its developers' point of view. E.g. you could check the most recent results here - this paper is from 2023. And the Propeller developers are planning to integrate this tool into LLVM as well, as has already been done for BOLT. From my understanding, maybe one day BOLT and Propeller will be somehow merged into one tool (hopefully somewhere directly in the LLVM linkers and/or PGO infra) but when we will get it... I cannot predict.

Oh, I didn't know that, maybe I mistook it for another tool. Well, if they have an easy way of profiling/instrumenting and optimizing binaries, a reasonable deployment mechanism and some documentation, I'm not opposed :) But I definitely do not plan to do shenanigans in this tool to build and support it if its code and usage are in the typical software research open-source state 😅 I wonder if I could somehow generalize the BOLT support in cargo-pgo so that users could "plug in" their own instrumenter and optimizer, for any tool they want 🤔.

zamazan4ik commented 1 year ago

> But I definitely do not plan to do shenanigans in this tool to build and support it if its code and usage are in the typical software research open-source state

Agreed :) However, Propeller is the default post-link optimization tool at Google right now (integrated into their build pipelines for a bunch of their services, etc.), so I think it's quite usable in real life, not just a "usual research tool" :) Hopefully, this repo could clarify what should be done to apply Propeller to a real application (Clang, in the provided case).

> I wonder if I could somehow generalize the BOLT support in cargo-pgo so that users could "plug in" their own instrumenter and optimizer, for any tool they want

That's a good question to think about. IMHO it would be quite difficult to provide a stable enough generalization over these tools, since they are evolving and quite unstable (at least from a public interface point of view). E.g. the BOLT team is now working on a new BOLT approach called "Lightning BOLT", which would probably change interfaces or add a new mode (in addition to VESPA). Propeller is going to be merged into LLVM in some form, which may also change how we should use it.

Propeller has one advantage over BOLT right now - a much smaller memory usage spike. Even on my machine with 32 GiB of RAM I am not always able to BOLTify my app due to OOM. Another pain point - BOLT has only weak support for architectures other than x86-64 (but they are working on it).

zamazan4ik commented 1 year ago

By the way, I am working on gathering all available information regarding PGO in one place - https://github.com/ZaMaZaN4iK/awesome-pgo . Maybe some of the links would be interesting for you to read (although I think you have already read almost all of them :)

Kobzol commented 1 year ago

This is what I currently do for BOLT instrumentation:
1) Set specific linker flags
2) Clear profile directory
3) Run a command for each built binary to instrument it

And for BOLT optimization:
1) Set specific linker flags
2) Merge profiles
3) Run a command for each built binary to optimize it
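Done by hand, the two sequences above might look roughly like the sketch below. It is an approximation under stated assumptions, not the actual cargo-pgo implementation: the function names, the `run` helper, the `DRY_RUN` switch, the profile directory and the output file names are all made up for the example, and `-C link-args=-Wl,-q` (keep relocations in the binary) is assumed to be the relevant linker flag. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the BOLT instrument/optimize steps; names and paths are
# illustrative assumptions, not cargo-pgo internals.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them
PROFILE_DIR="${PROFILE_DIR:-target/bolt-profiles}"

# Print a command and, unless in dry-run mode, execute it.
run() {
    echo "+ $*"
    [ "$DRY_RUN" = "1" ] || "$@"
}

# Usage: bolt_instrument BINARY
bolt_instrument() {
    # 1) Build with relocations kept (assumed linker flag).
    run env RUSTFLAGS="-C link-args=-Wl,-q" cargo build --release
    # 2) Clear the profile directory.
    run rm -rf "$PROFILE_DIR"
    run mkdir -p "$PROFILE_DIR"
    # 3) Instrument the built binary; running it later writes the profile.
    run llvm-bolt "$1" -instrument \
        -instrumentation-file="$PROFILE_DIR/profile.fdata" \
        -o "$1-bolt-instrumented"
}

# Usage: bolt_optimize BINARY
bolt_optimize() {
    merged="$PROFILE_DIR-merged.fdata"
    # 1) Rebuild with the same linker flag.
    run env RUSTFLAGS="-C link-args=-Wl,-q" cargo build --release
    # 2) Merge all gathered profiles into one file.
    run merge-fdata -o "$merged" "$PROFILE_DIR"/*.fdata
    # 3) Optimize the built binary with the merged profile.
    run llvm-bolt "$1" -o "$1-bolt-optimized" -data="$merged" \
        -reorder-blocks=ext-tsp -reorder-functions=hfsort -split-functions
}
```

The instrumented binary has to be run on a representative workload between `bolt_instrument` and `bolt_optimize`; that step is left out because it is specific to each application.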

In theory, we could add something like this:

$ cargo pgo custom instrument -- ./instrument.sh

# instrument.sh
propeller instrument $INPUT_BINARY

$ cargo pgo custom optimize -- ./optimize.sh

# optimize.sh
propeller optimize $INPUT_BINARY

If specific compiler/linker flags are needed, they can be passed with RUSTFLAGS=... cargo pgo custom optimize.

Of course, with this generic approach, the user would be responsible for gathering and managing the profiles and writing the instrumentation and optimization scripts. Basically the only added value of cargo pgo would be to serve as a wrapper over cargo. It would pass all the binaries that should be instrumented/optimized to the custom optimization tool. I'm not sure if that is a useful enough feature to add support for custom optimization backends though.
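The wrapper part of this idea, passing every built binary to a user-supplied script, could be sketched as follows. The `cargo pgo custom` subcommand does not exist; `for_each_binary` is a hypothetical helper, and handing the path over in the `INPUT_BINARY` environment variable simply follows the `instrument.sh`/`optimize.sh` examples above.

```shell
#!/bin/sh
# Hypothetical sketch of what a generic "custom" mode would do:
# run a user script once per built binary, exposing the path as INPUT_BINARY.

# Usage: for_each_binary SCRIPT BINARY...
for_each_binary() {
    script="$1"
    shift
    for bin in "$@"; do
        # The script is run through sh so it does not need the executable bit.
        INPUT_BINARY="$bin" sh "$script" || return 1
    done
}
```

For example, `for_each_binary ./instrument.sh target/release/myapp target/release/mytool` would run `instrument.sh` twice, once per binary, and stop at the first failure.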