envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

Try to enable PGO #25500

Open zamazan4ik opened 1 year ago

zamazan4ik commented 1 year ago

Title: Enable PGO for Envoy

Description: Profile-Guided Optimization (PGO) can improve performance because the compiler uses a runtime profile of the program to make better optimization decisions (for example, inlining and code layout) at build time. I think it could be useful for Envoy.
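
As a minimal sketch of what using runtime profile information looks like in practice, the Python snippet below only assembles the three command lines of a generic Clang instrumentation-based PGO cycle: build with -fprofile-generate, merge the raw profiles with llvm-profdata, then rebuild with -fprofile-use. The source file name, binary names, and profile directory are hypothetical placeholders, and this is not Envoy's Bazel build; it only illustrates the general technique.

```python
import shlex


def pgo_cycle(source, profile_dir="profiles", merged="merged.profdata"):
    """Return the command lines of a generic Clang instrumentation PGO cycle.

    All file and directory names here are hypothetical placeholders.
    """
    # Step 1: build an instrumented binary that records execution counts.
    instrumented = [
        "clang++", "-O2", f"-fprofile-generate={profile_dir}",
        source, "-o", "app_instrumented",
    ]
    # Step 2: after running app_instrumented under a representative load,
    # the *.profraw files written to profile_dir are merged into one profile.
    merge = ["llvm-profdata", "merge", f"-output={merged}", profile_dir]
    # Step 3: rebuild, letting the compiler use the merged profile.
    optimized = [
        "clang++", "-O2", f"-fprofile-use={merged}",
        source, "-o", "app_pgo",
    ]
    return [shlex.join(cmd) for cmd in (instrumented, merge, optimized)]


for line in pgo_cycle("main.cc"):
    print(line)
```

The quality of the result depends on how representative the load in step 2 is, which is why the benchmark workload matters.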

Possible steps:

Possible future steps for improving:


adisuissa commented 1 year ago

Thanks for sharing the idea. Overall, PGO sounds good to me. There has been some previous discussion about LTO, and some bumps were encountered along the way (see #4159, for example).

zamazan4ik commented 1 year ago

Yep, I've seen the discussion about LTO. I just didn't want to mix LTO and PGO in the same issue. If you think PGO should be discussed as yet another default compiler flag, feel free to mention it in #4159. However, I recommend tracking it separately, since PGO requires somewhat more work than adding a compiler flag.

adisuissa commented 1 year ago

I've added a comment linking the related issue/PR, so that it may help anyone who attempts to build with PGO.

zamazan4ik commented 1 year ago

I just finished some Profile-Guided Optimization (PGO) benchmarks for Envoy and want to share my results.

Test environment

Benchmark setup

I took the benchmark idea from the Rathole benchmark guide for HTTP load and implemented the same setup for Envoy: Benchmark tool -> Envoy -> Nginx.

As a benchmark tool, I use Nighthawk with this command line: taskset -c 4-5 ./nh/nighthawk_client --rps 10000 --duration 300 --connections 4 --concurrency auto --prefetch-connections -v info http://127.0.0.1:8080.

Envoy was tested with this command line: taskset -c 0 ./envoy_static_release_master --concurrency 1 --config-path envoy-demo.yaml (the content of envoy-demo.yaml is here: https://pastebin.com/QfZi19Nu ). I use --concurrency 1 because I want to load Envoy to 100% of one core, which makes it easy to measure the maximum throughput and compare the max RPS of the Release and PGO builds.

taskset is used everywhere to reduce OS scheduling noise during the measurements. All measurements were repeated multiple times on the same hardware and software, with the same background load (as far as I can guarantee).
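
Since every configuration is measured several times, one simple way to compare two builds is to take the median RPS of each and report the relative change. The sketch below assumes that approach; the sample values are made-up placeholders, not the numbers measured in this benchmark.

```python
from statistics import median


def relative_gain(baseline_runs, candidate_runs):
    """Relative change of the median candidate RPS over the median baseline RPS."""
    base = median(baseline_runs)
    cand = median(candidate_runs)
    return (cand - base) / base


# Hypothetical RPS samples, NOT measurements from this issue.
release_rps = [1000.0, 1010.0, 990.0]
pgo_rps = [1100.0, 1120.0, 1090.0]

print(f"median gain: {relative_gain(release_rps, pgo_rps):.1%}")
```

The median is used here instead of the mean so that a single noisy run does not skew the comparison.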

Optimization steps

The Release build of Envoy is produced with the command bazel build -c opt envoy --config=docker-clang.

Envoy PGO is built in the following steps:

The last step has one tricky part: because I used the Docker build configuration, the PGO profile has to be mounted into the container somehow. I resolved it by adding this line to the root .bazelrc file: build:docker-sandbox --sandbox_add_mount_pair=/home/zamazan4ik/open_source/bench_envoy/profiles:/execroot/profiles. It can probably be done via the Bazel command line too, but I don't know for sure since I have almost no experience with Bazel.
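
The .bazelrc line above follows the source:target form of Bazel's --sandbox_add_mount_pair flag. As a small illustration, the hypothetical helper below formats such a line for an arbitrary host directory. The function name and the absolute-path check are assumptions made for this sketch; they are not part of Bazel or of Envoy's tooling.

```python
from pathlib import PurePosixPath


def sandbox_mount_line(host_dir, sandbox_dir, config="docker-sandbox"):
    """Format a .bazelrc line mounting host_dir at sandbox_dir (hypothetical helper)."""
    for path in (host_dir, sandbox_dir):
        # Assumption of this sketch: both sides must be absolute paths.
        if not PurePosixPath(path).is_absolute():
            raise ValueError(f"expected an absolute path, got {path!r}")
    return f"build:{config} --sandbox_add_mount_pair={host_dir}:{sandbox_dir}"


print(sandbox_mount_line(
    "/home/zamazan4ik/open_source/bench_envoy/profiles",
    "/execroot/profiles",
))
```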

Results

In short, I get the following RPS results from Nighthawk:

More detailed reports from Nighthawk are available here:

According to the tests, PGO noticeably improves Envoy's performance, in both latency and throughput.

Possible further steps

I can suggest the following action points:

Testing Post-Link Optimization techniques (like LLVM BOLT) might be interesting too, but I recommend starting with regular PGO.

Found caveats / interesting details

Useful links

Many more PGO results for different kinds of software, along with possible caveats and tricky details, are available in my repo here.