Kobzol / cargo-pgo

Cargo subcommand for optimizing Rust binaries/libraries with PGO and BOLT.
MIT License
563 stars 11 forks source link

Add `cargo pgo bench` command #9

Closed Kobzol closed 2 years ago

Kobzol commented 2 years ago

This allows gathering profiles from running benchmarks.

Related issue: https://github.com/Kobzol/cargo-pgo/issues/5

mvtec-bergdolll commented 2 years ago

@Kobzol first of all, thank for the fast reply and pr. Here's what I found. I did:

  1. cargo pgo build -- --bench bench
  2. Run instrumented benchmark to collect data (yes that's somewhat cheating)
  3. cargo pgo optimize -- --bench bench
  4. cargo pgo bench --bench bench -- --warm-up-time 1 --measurement-time 3 <some-filter-string>

Unfortunately I failed on step 4. Some attempts like passing one more layer of -- didn't work. It also tell me that cargo pgo bench will store pgo data. But really I want to run the pgo'd benchmark via cargo bench.

Kobzol commented 2 years ago

Oh, I understood wrong what is your goal. I thought that you want to gather the PGO profiles by running benchmarks, but it seems that you want to benchmark the PGO-optimized benchmark binaries?

Also, it seems like you are using criterion? I thought that you're using the nightly-only built-in benchmark suite.

Kobzol commented 2 years ago

I tried it with criterion and this command should work:

$ cargo pgo bench -- --bench "benchmark" -- --warm-up-time 1

However, it can only be used currently to gather data from benchmarks, this won't benchmark the optimized benchmark binary. I'll try to think about how to make this use-case available. It probably only makes sense for benchmarks.

mvtec-bergdolll commented 2 years ago

Yes exactly without manual file level modifications I don't know any way to use the optimized benchmark binary. That's where I need cargo pgo's help. I initially tried this to see if how munch my benchmarks change if I use pgo. In a way cargo pgo bench is useful for collecting data and for trying out an optimized binary. Maybe something like cargo pgo bench gather + cargo pgo bench run could replace steps 1-4.

Kobzol commented 2 years ago

I decided to generalize instrumentation even more and allow running any of build/run/test/bench with instrumentation or optimization. The corresponding PR is located here: https://github.com/Kobzol/cargo-pgo/pull/13. I will merge it soon and then add support for PGO benchmarks on top of that. After that it should be possible both to generate profiles with benchmarks and to benchmark optimized benchmarks :)

Kobzol commented 2 years ago

Now it should be possible to do what you have tried with the following two commands:

# Gather profiles
$ cargo pgo bench

# Run optimized benchmarks using the gathered profiles
$ cargo pgo optimize bench

Beware that you should also probably run just cargo bench between these two commands, otherwise the last timings from cargo pgo bench will be slower, since the benchmarks will be instrumented.

Note that this is not currently supported for BOLT, only for PGO.

Kobzol commented 2 years ago

@mvtec-bergdolll When you have time, please let me know if the current version of this PR solves your issue :) Thanks.

mvtec-bergdolll commented 2 years ago

@Kobzol yes it works. However I wonder if bundling in the optimize stage is ideal. Will it build it anew anytime there is new profiling data? I would prefer having explicit control over when a new optimized binary is built, and an explicit step that only does the benchmark running.

Kobzol commented 2 years ago

I tried it now and rustc only tracks the profile filename, so as long as the filename of the merged profile stays the same, it will not recompile the code even if the contents of the profile change. So now it will behave as you want. However, this is actually quite problematic in terms of cargo-pgo, I will probably change the behaviour so that a new profile is generated each time (or at least if its contents change), otherwise this is a footgun.

In general, I understand your usecase, but I'm not sure if bench is the right command for it, because it both compiles and runs the benchmarks. It's essentially cargo run, but for benchmarks. And cargo run will also first compile your code before running it. I'm not sure if you can run cargo bench without recompiling the code, for that you probably need to execute the compiled benchmarks binary directly.

mvtec-bergdolll commented 2 years ago

There are probably workarounds. Indeed the bench command is somewhat cross with the concepts at play here. How difficult would it be to add an explicit cargo pgo bench <path_to_intstrumented/optimized> that only runs the benchmark with the harness. Then all the existing pgo + bolt functionality would just work and explicit control over building and gathering would be identical.

Kobzol commented 2 years ago

I'm not sure exactly how are the benchmark binaries generated, I'll check it and think about possible extensions to cargo-pgo. I'll merge this PR, since it's a bit orthogonal to the original issue, and then I'll take a look at your use-case.

Btw thanks for the idea of profile invalidation during compilation, that's a quite serious footgun for this crate (possibly for rustc) too, I'll try to fix it here and post an issue to rustc.

mvtec-bergdolll commented 2 years ago

Thanks for the proactive help.