bloomberg / blazingmq

A modern high-performance open source message queuing system
https://bloomberg.github.io/blazingmq/
Apache License 2.0

Evaluate Profile-Guided Optimization (PGO) #42

Closed zamazan4ik closed 1 year ago

zamazan4ik commented 1 year ago

Is there an existing proposal for this?

Is your feature request related to a problem?

No - it's just an idea that could possibly improve BlazingMQ's performance.

Describe the solution you'd like

Recently I checked Profile-Guided Optimization (PGO) improvements on multiple projects. The results are here. I think these results are quite promising and can be a stimulus to test PGO with BlazingMQ.

We need to run PGO benchmarks on BlazingMQ. If they show improvements (reduced CPU usage, reduced latency, anything else), add a note to the documentation about the possible performance gains from PGO. Providing an easier way to build BlazingMQ with PGO (e.g. a build option) would also help end users.

Alternatives you considered

No response

quarter-note commented 1 year ago

Hi @zamazan4ik, thanks for your interest in BlazingMQ! We have heard about PGO but have not had a chance to apply it to BlazingMQ.

We are open to trying out PGO and benchmark BlazingMQ. Some questions/comments:

  1. We have a tool that helps us measure BlazingMQ's performance in an automated and repeatable way under various scenarios. However, this tool is not open sourced yet, and benchmarking BlazingMQ manually and repeatedly can be cumbersome. Would you be willing to help carry out benchmarking of BlazingMQ with PGO enabled?

  2. Figuring out the impact of a small change on performance from benchmarking BlazingMQ can be challenging because of minor variations in each benchmarking run. So hypothetically, if PGO delivers an improvement of 5% in latency, it may not be conclusive across multiple runs due to minor variations.

  3. How do different GCC versions interact with PGO? Say we build with GCC 10 vs 11 vs 12. Can we expect PGO to have the same impact across all GCC versions?

  4. Does PGO have any negative impact?

Thanks!

zamazan4ik commented 1 year ago

Would you be willing to help carry out benchmarking BlazingMQ with PGO enabled?

Not sure what kind of help you need here :) You can benchmark BlazingMQ internally with your closed-source benchmark tool. The only difference would be how BlazingMQ is built. Most likely you will start with instrumentation-based PGO: build with -fprofile-instr-generate (that's the Clang flag; the GCC equivalent is -fprofile-generate), run your benchmark tool with a near real-life scenario on the instrumented BlazingMQ to collect profiles, and then build BlazingMQ once more with -fprofile-instr-use (GCC: -fprofile-use) pointing at the collected profiles.

Figuring out the impact of a small change on performance from benchmarking BlazingMQ can be challenging because of minor variations in each benchmarking run. So hypothetically, if PGO delivers an improvement of 5% in latency, it may not be conclusive across multiple runs due to minor variations.

Correct. So I suggest running the benchmarks multiple times and collecting the results from all runs. That makes the numbers much easier to work with later (e.g. for calculating percentiles). In my experience, if you run the benchmarks enough times (how many exactly depends on the case), this kind of instability is almost always eliminated.
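For example, you could collect the per-run numbers into one file and read off percentiles with standard tools. The latency figures below are invented purely for illustration:

```shell
export LC_ALL=C   # consistent numeric sorting and decimal formatting

# Latencies (ms) from ten hypothetical benchmark runs.
printf '%s\n' 12.1 11.9 12.4 15.0 12.0 12.2 11.8 12.3 12.1 12.5 > runs.txt

# Sort the runs and read off the median and the 90th percentile.
sort -n runs.txt | awk '{ a[NR] = $1 }
  END { printf "median=%.1f p90=%.1f\n", a[int(NR*0.5)], a[int(NR*0.9)] }'
# prints: median=12.1 p90=12.5
```

Note how the single outlier run (15.0 ms) barely moves the percentiles, which is exactly why aggregating over many runs beats comparing two individual runs.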

How do different GCC versions interact w/ PGO? Say we build with GCC 10 vs 11 vs 12. Can we expect PGO to have same impact across all GCC versions?

There are multiple things here. First, profiles (.gcda files in the GCC case, .profraw/.profdata in the Clang case) are not guaranteed to be compatible between compiler versions. So if you collect profile data and save it somewhere (e.g. in VCS), you need to regenerate it after every compiler upgrade. If you support multiple compilers, you need to store multiple profiles. This problem can be avoided simply by not saving profiles at all and instead providing a compiler-agnostic script that collects them on demand.
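Such a compiler-agnostic script might start with something like the sketch below, which derives the right flag spelling from whatever compiler `$CXX` points at. The helper itself and the `pgo_flags.txt` file name are hypothetical, not part of BlazingMQ:

```shell
# Hypothetical helper: pick PGO flags matching the active compiler, so no
# pre-collected profile data ever needs to live in the repository.
CXX="${CXX:-g++}"
if "$CXX" --version 2>/dev/null | head -n1 | grep -qi clang; then
    GEN_FLAGS="-fprofile-instr-generate"
    USE_FLAGS="-fprofile-instr-use=default.profdata"
else
    GEN_FLAGS="-fprofile-generate"
    USE_FLAGS="-fprofile-use"
fi
printf '%s\n%s\n' "$GEN_FLAGS" "$USE_FLAGS" > pgo_flags.txt
cat pgo_flags.txt
```

The build then consumes the flags from `pgo_flags.txt` for the instrumentation and optimization passes, and the profiles themselves stay local to each build.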

Regarding the same impact (by impact I mean the efficiency of the PGO optimization across multiple GCC versions), I would say yes. PGO is just a compiler optimization like many others. So if you expect, say, inlining and loop unrolling to work in roughly the same way across GCC upgrades, you can expect the same from PGO. If you need stronger guarantees, you would have to check the compilers' release notes, maybe track activity on the corresponding mailing lists, etc. But I don't think you need that :)

Does PGO have any negative impact?

It depends :) Imagine you choose a completely unrealistic scenario for the training phase. The collected profiles will then optimize for that unrealistic scenario, which could harm your real-life performance (e.g. a wrong decision in the hot/cold code-splitting optimization). This is easily mitigated by choosing a production-like workload - in that case the chance of a regression from PGO is much, much lower. So in general, PGO does not have a negative impact.

Hope my answers are helpful :)

zamazan4ik commented 1 year ago

Oh, and one more addition. You might also be interested in applying post-link optimization with LLVM BOLT: https://github.com/llvm/llvm-project/blob/main/bolt/README.md It can squeeze out more performance even after PGO, thanks to I-cache-friendly code-layout reorganization. But I suggest investing your time/money in it only after PGO.
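Roughly, a sampling-based BOLT pass looks like the commands below (taken from the flag set shown in the BOLT README). Treat it as an untested sketch: it needs Linux `perf` with LBR support, `bmqbrkr` stands in for the post-PGO BlazingMQ broker binary, and `<workload>` is a placeholder for a representative run.

```shell
# 1. Record a branch-sampling profile of the optimized (post-PGO) binary.
perf record -e cycles:u -j any,u -o perf.data -- ./bmqbrkr <workload>

# 2. Convert the perf profile into BOLT's format.
perf2bolt -p perf.data -o perf.fdata ./bmqbrkr

# 3. Rewrite the binary with an I-cache-friendly code layout.
llvm-bolt ./bmqbrkr -o bmqbrkr.bolt -data=perf.fdata \
    -reorder-blocks=ext-tsp -reorder-functions=hfsort \
    -split-functions -split-all-cold
```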

quarter-note commented 1 year ago

Not sure what kind of help you need here :)

I think I was clear :). I'm afraid we don't have the bandwidth to carry out this work in the next few months, hence my question. Would you be willing to carry out this work (updating the BlazingMQ build to pass the appropriate flags, carrying out manual benchmarking, determining the impact of PGO, etc.)? Of course, we would help wherever needed.

zamazan4ik commented 1 year ago

Ahh, sorry, seems like my mental parser is a bit tired :D

Well, maybe one day. I'll be honest - BlazingMQ is not on my top priority list right now (simply because we don't use BlazingMQ at work). An additional concern is the benchmarking - yeah. Since I have no access to your tool, it would be harder to perform the training phase and the benchmarks themselves. If you could open-source your tool, it would make life easier at least for external BlazingMQ adopters.

quarter-note commented 1 year ago

As an additional concern is benchmarking

Definitely. Manual benchmarking is not fun.

If you can open-source your tool...

Yes! It is in our short term roadmap.

Let's touch base in a few months.

quarter-note commented 1 year ago

@zamazan4ik I am closing this one. Feel free to reopen if you'd like to help out. Thanks.

zamazan4ik commented 1 year ago

@quarter-note I think the issue should stay open, so that anyone else can help with it, not just me.