dragonflydb / dragonfly

A modern replacement for Redis and Memcached
https://www.dragonflydb.io/

Consider using LTO + PGO + Bolt #592

Open zamazan4ik opened 1 year ago

zamazan4ik commented 1 year ago

Did you search GitHub Issues and GitHub Discussions First? Yes, no results.

Is your feature request related to a problem? Please describe. Not a problem - an opportunity.

Describe the solution you'd like

DragonflyDB currently does not support building with more advanced optimization techniques such as PGO (profile-guided optimization) and BOLT. These tools are seeing growing adoption in the community as a way to optimize programs further, and there is a good chance of gaining even more performance "for free".

I suggest at least experimenting with an LTO + PGO + BOLT pipeline (or any combination of these) and testing whether it improves the project's performance. If it does, it would be great to have prebuilt binaries with these optimizations out of the box. It would also help users to be able to tune their own binaries for their own workloads, with this functionality integrated into the build scripts.

Also, there are some caveats to consider like:

Links:

romange commented 1 year ago

Thank you for suggesting this enhancement. It is definitely something we are going to explore in the future. By the way, we already use "-flto" in our release pipeline.

Having said that, based on my experience with Dragonfly, most of the CPU time is spent in the kernel, especially at higher throughput. Another (much smaller) part is spent around Boost.Fibers. I have yet to see a use case where Dragonfly can benefit from these optimizations.

zamazan4ik commented 1 year ago

I just finished benchmarking Redis with PGO - link. I think these results can be useful for DragonflyDB too.

zamazan4ik commented 1 year ago

I did some testing of PGO applied to DragonflyDB.

Test environment

Tested configurations

I have tested the following DragonflyDB configurations:

For PGO, I use Clang's -fprofile-instr-generate/-fprofile-instr-use options: build an instrumented version of the server, run memtier_benchmark against the instrumented DragonflyDB, collect the instrumentation data, then rebuild DragonflyDB with the collected data.
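The cycle described here can be sketched as a small Python helper that prints the flags and commands for each step. This is only an illustration under assumptions: the profile directory and merged profile file name are hypothetical, and how the flags reach the compiler depends on the build scripts, which the thread does not show. The `llvm-profdata merge` step and the `LLVM_PROFILE_FILE` variable are the standard Clang mechanism for turning raw instrumentation output into a profile that `-fprofile-instr-use` accepts.

```python
# Sketch of a Clang instrumentation-based PGO cycle.
# Assumptions (not from the thread): the directory and file names below.
PROFILE_DIR = "pgo-profiles"          # hypothetical location for raw profiles
MERGED_PROFILE = "dragonfly.profdata"  # hypothetical merged profile name


def pgo_plan(profile_dir=PROFILE_DIR, merged=MERGED_PROFILE):
    """Return (step description, flag or command) pairs for one PGO cycle."""
    return [
        # Step 1: the flag goes into both the compile and the link flags.
        ("1. instrumented build, add to compiler and linker flags",
         "-fprofile-instr-generate"),
        # Step 2: %p expands to the process ID; the raw profile is written
        # when the process exits, so the server should be shut down cleanly.
        ("2. run the instrumented server under the benchmark load",
         f"LLVM_PROFILE_FILE={profile_dir}/dragonfly-%p.profraw "
         "taskset -c 0 dragonfly --logtostderr --proactor_threads=1"),
        # Step 3: raw .profraw files must be merged into an indexed profile.
        ("3. merge the raw profiles",
         f"llvm-profdata merge -output={merged} {profile_dir}/*.profraw"),
        # Step 4: rebuild from scratch with the merged profile.
        ("4. optimized build, add to compiler flags",
         f"-fprofile-instr-use={merged}"),
    ]


if __name__ == "__main__":
    for step, text in pgo_plan():
        print(f"{step}\n    {text}")
```

The helper only prints the plan; it does not invoke the compiler or the server.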

Benchmark

I run memtier_benchmark as taskset -c 1-4 memtier_benchmark --ratio 0:1 -t 4 -c 30 --distinct-client-seed -d 256 --key-maximum 1000000 --hide-histogram --pipeline 30 --test-time=300. DragonflyDB is started with the command taskset -c 0 dragonfly --logtostderr --proactor_threads=1. I use one thread because it gives more consistent results when DragonflyDB runs on the same machine as memtier_benchmark.

Results

All configurations were benchmarked on the same machine, with the same DragonflyDB configuration, multiple times. The results are shown in memtier_benchmark format. I rechecked, and the results are consistent between runs.

Release

```
ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        55254.34          ---          ---         0.19819         0.19900         0.21500         0.24700     16397.59
Gets       552541.25      3815.65    548725.59         0.19771         0.19900         0.21500         0.23900     22488.64
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     607795.59      3815.65    548725.59         0.19775         0.19900         0.21500         0.23900     38886.23

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        55053.75          ---          ---         0.19766         0.19900         0.21500         0.27900     16338.06
Gets       550535.70      3812.30    546723.41         0.19716         0.19900         0.21500         0.27900     22409.66
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     605589.45      3812.30    546723.41         0.19720         0.19900         0.21500         0.27900     38747.73

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        54142.54          ---          ---         0.19961         0.19900         0.21500         0.23100     16067.64
Gets       541423.33      3712.54    537710.79         0.19906         0.19900         0.21500         0.23100     22029.47
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     595565.87      3712.54    537710.79         0.19911         0.19900         0.21500         0.23100     38097.12

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        54862.17          ---          ---         0.20090         0.19900         0.22300         0.23100     16281.21
Gets       548619.54      3732.90    544886.65         0.20041         0.19900         0.22300         0.23100     22314.94
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     603481.72      3732.90    544886.65         0.20046         0.19900         0.22300         0.23100     38596.15
```

Release + PGO

```
ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        56478.72          ---          ---         0.19128         0.19100         0.20700         0.22300     16760.95
Gets       564785.16      4037.36    560747.81         0.19082         0.19100         0.20700         0.22300     23021.66
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     621263.89      4037.36    560747.81         0.19086         0.19100         0.20700         0.22300     39782.62

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        56169.31          ---          ---         0.19353         0.19900         0.21500         0.22300     16669.13
Gets       561691.03      3970.97    557720.06         0.19313         0.19900         0.21500         0.22300     22884.35
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     617860.34      3970.97    557720.06         0.19317         0.19900         0.21500         0.22300     39553.48

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        56491.59          ---          ---         0.19121         0.19100         0.20700         0.24700     16764.77
Gets       564914.02      4039.67    560874.34         0.19080         0.19100         0.20700         0.23900     23027.27
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     621405.61      4039.67    560874.34         0.19084         0.19100         0.20700         0.23900     39792.04

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        57216.28          ---          ---         0.19136         0.19100         0.20700         0.22300     16979.83
Gets       572160.55      4089.20    568071.35         0.19091         0.19100         0.20700         0.22300     23322.08
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     629376.82      4089.20    568071.35         0.19095         0.19100         0.20700         0.22300     40301.91

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        56998.75          ---          ---         0.19322         0.19100         0.21500         0.32700     16915.27
Gets       569985.29      4035.50    565949.80         0.19285         0.19100         0.21500         0.32700     23223.75
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     626984.04      4035.50    565949.80         0.19288         0.19100         0.21500         0.32700     40139.03
```
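
To put a rough number on the difference, the Totals Ops/sec values from the runs above can be averaged per configuration. The sketch below does only that: the figures are copied from the memtier_benchmark output, and the choice of mean and sample standard deviation as the summary is an assumption made for illustration, not the method used in the thread.

```python
from statistics import mean, stdev

# "Totals" Ops/sec per run, copied from the memtier_benchmark output above.
release = [607795.59, 605589.45, 595565.87, 603481.72]
release_pgo = [621263.89, 617860.34, 621405.61, 629376.82, 626984.04]


def summarize(samples):
    """Return (mean, sample standard deviation) of a list of throughputs."""
    return mean(samples), stdev(samples)


rel_mean, rel_sd = summarize(release)
pgo_mean, pgo_sd = summarize(release_pgo)

# Relative throughput gain of Release + PGO over Release, in percent.
gain_pct = (pgo_mean / rel_mean - 1.0) * 100.0

print(f"Release:       {rel_mean:,.0f} ops/sec (stdev {rel_sd:,.0f})")
print(f"Release + PGO: {pgo_mean:,.0f} ops/sec (stdev {pgo_sd:,.0f})")
print(f"Throughput gain from PGO: {gain_pct:.1f}%")
```

On these runs the mean throughput gain comes out at a little over 3%, with every Release + PGO run faster than every Release run.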

The win may be bigger on other workloads. I have not tested BOLT (llvm-bolt) yet. More PGO results for different kinds of software can be found here.