Enable PGO for benchmarks

djkoloski commented 2 years ago

Enabling profile-guided optimization will provide some numbers that are the best they can be for each framework. It might be worth separating these out from the general numbers so users can get an idea of how much they stand to gain for their efforts.

djkoloski commented 2 years ago

Look at autofdo with perf record.

zamazan4ik commented 2 months ago

I do ongoing PGO research on different applications - all results are available at https://github.com/zamazan4ik/awesome-pgo . I performed some PGO benchmarks on the rust_serialization_benchmark too and want to share my results here.

Test environment

Fedora 39
Linux kernel 6.7.6
AMD Ryzen 9 5900x
48 Gib RAM
SSD Samsung 980 Pro 2 Tib
Compiler - Rustc 1.76
Repo version: master branch on commit ce821970017832f43d00bc5110462ff1a2c38e17
Disabled Turbo boost for improving consistency across runs

Benchmark

Release benchmarks are done with tasket -c 0 cargo +nightly bench, PGO training phase with tasket -c 0 cargo +nightly pgo bench, PGO optimization with tasket -c 0 cargo +nightly pgo optimize bench. taskset -c 0 is used for better benchmark consistency. All PGO-related routines are done with cargo-pgo. All benchmarks are done on the same machine, with the same hardware/software during runs, with the same background "noise" (as much as I can guarantee, of course).

Results

Here are the results:

Release: https://gist.github.com/zamazan4ik/292d71316488e3e09bcb56f4e499f42e
PGO-optimized: https://gist.github.com/zamazan4ik/5cfb7a8acbe37ca3e876ba1a2468f5d5
(just for reference) PGO instrumented: https://gist.github.com/zamazan4ik/84ff553619aae1faab7a40fa33944d86

At least in the provided by project benchmarks, there are measurable improvements in many cases. However, also there are some regressions.

Look at autofdo with perf record.

I recommend starting with the regular PGO via instrumentation. AutoFDO is used for sampling the PGO approach. Starting with the instrumentation is generally a better idea since it has wider platform support and can be easily enabled for the project (compared to the sampling-based PGO).

finnbear commented 1 month ago

Great job presenting the results of PGO! In general, it seems like PGO increases average performance by ~5% but introduces noise. Not necessarily from run to run, but from benchmark to benchmark and version to version. Might make more sense to average PGO results over all datasets (and just live with the fact that there are only 4, so the average isn't totally immune to noise). Could also average over all crates, and just report a single PGO result. The goal is to give users an accurate answer for 1) which crate 2) whether to try PGO.

djkoloski / rust_serialization_benchmark