zamazan4ik opened 1 year ago
Thank you for suggesting this enhancement. It is definitely something we are going to explore in the future. By the way, we already use "-flto" in our release pipeline.
Having said that, in my experience with Dragonfly, most of the CPU time is spent in the kernel, especially at higher throughput. Another (much smaller) part is spent in Boost.Fibers. I have yet to see a use case where Dragonfly would benefit from these optimizations.
I just finished benchmarking Redis with PGO - link. I think these results can be useful for DragonflyDB too.
I did some testing of PGO applied to DragonflyDB, using the main branch (commit e71fae7eea921b6396e4184cca7d26ee9960ec0e). I have tested the following DragonflyDB configurations:
- Release: -DCMAKE_BUILD_TYPE=Release
- Release + PGO: -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-fprofile-instr-use=db.profdata"
As a PGO technique, I use the -fprofile-instr-generate/-fprofile-instr-use options from Clang. I build an instrumented server, run memtier_benchmark against the instrumented DragonflyDB, collect the instrumentation data (merged into db.profdata with llvm-profdata), and then rebuild DragonflyDB with the collected data.
For the load I use memtier_benchmark:
taskset -c 1-4 memtier_benchmark --ratio 0:1 -t 4 -c 30 --distinct-client-seed -d 256 --key-maximum 1000000 --hide-histogram --pipeline 30 --test-time=300
DragonflyDB is started with:
taskset -c 0 dragonfly --logtostderr --proactor_threads=1
I use a single thread because it gives more consistent results, since DragonflyDB runs on the same machine as memtier_benchmark.
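The instrument, train, merge, rebuild loop described above can be sketched as a shell script. This is a minimal, hedged sketch, not the exact procedure behind these numbers. Several details are assumptions rather than part of the original comment: the build directory names (build-instr, build-pgo), the binary path <build dir>/dragonfly, configuring with plain CMake, the .profraw file names, and stopping the server with SIGINT so that the profiling runtime can write its .profraw file on a clean exit. By default it only prints the commands. Set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Sketch of a Clang instrumentation-based PGO loop for DragonflyDB.
# Assumed (not from the original comment): build dirs build-instr/ and
# build-pgo/, binary path <build dir>/dragonfly, profile file names, and
# that dragonfly exits cleanly on SIGINT so the .profraw file gets written.
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 executes them.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
SRC_DIR="${SRC_DIR:-.}"   # DragonflyDB checkout (assumption)

run() {      # print or execute a command
  if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi
}

run_bg() {   # print, or start a command in the background and keep its PID
  if [ "$DRY_RUN" = 1 ]; then
    printf '+ %s &\n' "$*"
  else
    "$@" &
    SERVER_PID=$!
  fi
}

stop_bg() {  # graceful stop so the profiling runtime can flush its data
  if [ "$DRY_RUN" = 1 ]; then
    printf '+ kill -INT <server pid> && wait\n'
  else
    kill -INT "$SERVER_PID"
    wait "$SERVER_PID" || true
  fi
}

pgo_pipeline() {
  # 1. Instrumented Release build (CMAKE_CXX_FLAGS also reaches the link step).
  run cmake -S "$SRC_DIR" -B build-instr -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_CXX_FLAGS=-fprofile-instr-generate
  run cmake --build build-instr --target dragonfly

  # 2. Training run with the same CPU pinning as the benchmark.
  #    %p in LLVM_PROFILE_FILE expands to the process ID.
  run_bg env LLVM_PROFILE_FILE=dragonfly-%p.profraw \
    taskset -c 0 build-instr/dragonfly --logtostderr --proactor_threads=1
  run sleep 2   # give the server time to start listening
  run taskset -c 1-4 memtier_benchmark --ratio 0:1 -t 4 -c 30 \
    --distinct-client-seed -d 256 --key-maximum 1000000 \
    --hide-histogram --pipeline 30 --test-time=300
  stop_bg

  # 3. Merge raw profiles into the indexed format that -fprofile-instr-use reads.
  run llvm-profdata merge -output=db.profdata dragonfly-*.profraw

  # 4. Optimized rebuild. Use an absolute profile path, because compilation
  #    runs inside the build directory.
  run cmake -S "$SRC_DIR" -B build-pgo -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_CXX_COMPILER=clang++ "-DCMAKE_CXX_FLAGS=-fprofile-instr-use=$PWD/db.profdata"
  run cmake --build build-pgo --target dragonfly
}

pgo_pipeline
```

Because the flags go through CMAKE_CXX_FLAGS, as in the configuration tested here, only C++ translation units are instrumented and optimized. Any C sources in the tree would also need CMAKE_C_FLAGS.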
All configurations were benchmarked multiple times on the same machine, with the same DragonflyDB settings. The results are shown in memtier_benchmark output format. I have rechecked them, and the results are consistent between runs.
The gain may be bigger on other workloads. I also haven't tested BOLT (llvm-bolt) yet. You can find more PGO results for other kinds of software here.
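Since BOLT has not been tried yet, here is a rough sketch of what a first experiment might look like. It follows the workflow described in the BOLT README: link the binary with -Wl,--emit-relocs, record an LBR profile with perf while the benchmark runs, convert the profile with perf2bolt, and then rewrite the binary with llvm-bolt. The binary path build-opt/dragonfly is a hypothetical placeholder, and LBR sampling requires hardware support. The llvm-bolt optimization flags shown may also differ between LLVM versions. By default the script only prints the commands. Set DRY_RUN=0 to run them.

```shell
#!/usr/bin/env bash
# Sketch of a BOLT post-link optimization pass over a dragonfly binary.
# Assumptions: the binary was linked with -Wl,--emit-relocs, and perf.data
# was recorded with LBR sampling while the benchmark ran, e.g.:
#   perf record -e cycles:u -j any,u -o perf.data -- <dragonfly command>
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 executes them.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
BIN="${BIN:-build-opt/dragonfly}"   # hypothetical path to the release binary

run() {      # print or execute a command
  if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi
}

bolt_pipeline() {
  # Convert the perf samples into BOLT's profile format.
  run perf2bolt -p perf.data -o perf.fdata "$BIN"
  # Rewrite the binary: reorder basic blocks and functions, split cold code.
  run llvm-bolt "$BIN" -o "$BIN.bolt" -data=perf.fdata \
    -reorder-blocks=ext-tsp -reorder-functions=hfsort \
    -split-functions -split-all-cold -split-eh -dyno-stats
}

bolt_pipeline
```

BOLT operates on the final linked binary, so it can be layered on top of an LTO + PGO build. Whether this helps a kernel-heavy workload like DragonflyDB's would still need to be measured.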
Did you search GitHub Issues and GitHub Discussions first? Yes, no results.
Is your feature request related to a problem? Please describe. Not a problem - an opportunity.
Describe the solution you'd like
DragonflyDB currently does not support building with more advanced optimization techniques such as PGO and BOLT. These tools are increasingly adopted in the community to further optimize programs, and with them there is a real chance to gain more performance "for free".
I suggest at least experimenting with an LTO + PGO + BOLT pipeline (or any combination of them) and testing whether it improves the project's performance. If it does, it would be great to ship prebuilt binaries built with these optimizations from the start. It would also help users to have this functionality integrated into the build scripts, so that they can optimize their own binaries for their own workloads.
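One hypothetical way to expose this in the build scripts is a small mode switch that maps off/generate/use to compiler flags on top of LTO. The function name pgo_cxx_flags and the mode names are illustrative assumptions and are not part of DragonflyDB's current build.

```shell
#!/usr/bin/env bash
# Hypothetical build-script helper (not part of DragonflyDB): map a PGO mode
# to extra C++ flags on top of LTO, so users can retrain on their own workload.
# Usage (illustrative):
#   cmake -S . -B build-pgo -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=clang++ \
#     "-DCMAKE_CXX_FLAGS=$(pgo_cxx_flags use "$PWD/db.profdata")"
set -euo pipefail

pgo_cxx_flags() {
  local mode="$1" profile="${2:-}"
  case "$mode" in
    off)      echo "-flto" ;;                              # plain LTO build
    generate) echo "-flto -fprofile-instr-generate" ;;     # instrumented build
    use)                                                   # optimized build
      if [ -z "$profile" ]; then
        echo "pgo_cxx_flags: 'use' needs a .profdata path" >&2
        return 1
      fi
      echo "-flto -fprofile-instr-use=$profile" ;;
    *)
      echo "pgo_cxx_flags: unknown mode '$mode'" >&2
      return 1 ;;
  esac
}
```

Every mode keeps -flto because the release pipeline already uses LTO. Whether LTO and PGO together pay off for DragonflyDB would still need to be measured.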
There are also some caveats to consider, such as:
Links: