apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0

Evaluate Profile-Guided Optimization (PGO) and LLVM BOLT #3478

Open zamazan4ik opened 11 months ago

zamazan4ik commented 11 months ago

Description

Recently I have been checking Profile-Guided Optimization (PGO) improvements on many projects. All current results are available here. E.g. ClickHouse PGO results can be checked here. According to multiple tests, PGO can help improve performance in many cases. That's why I think trying to optimize Gluten with PGO could be a good idea.

I can suggest the following action points:

Since the Gluten native part is a library, I think the Pydantic-core experience can be reused here. Also, Clang supports PGO for shared libraries.
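To make the shared-library case concrete, here is a minimal sketch of the usual instrumentation-based PGO cycle with Clang: build an instrumented library, run a representative workload against it, merge the raw profiles, then rebuild with the merged profile. The Clang flags (`-fprofile-generate`, `-fprofile-use`) and the `llvm-profdata merge` tool are real; the library name, source list, and profile paths are hypothetical placeholders, and Gluten's actual build goes through its own build scripts rather than a single compiler call, so this only illustrates the order of the steps.

```python
import shlex
from typing import List, Sequence


def pgo_commands(
    sources: Sequence[str],
    raw_profiles: Sequence[str],
    lib_name: str = "libexample_native.so",  # hypothetical name, not Gluten's real artifact
    profile_dir: str = "pgo-raw",            # hypothetical directory for .profraw files
    merged_profile: str = "merged.profdata",
) -> List[List[str]]:
    """Return the three command lines of an instrumentation-based PGO cycle."""
    base = ["clang++", "-O2", "-fPIC", "-shared"]

    # Step 1: instrumented build. Loading this library and running a workload
    # writes .profraw files into profile_dir when the process exits.
    instrumented = base + [f"-fprofile-generate={profile_dir}", *sources, "-o", lib_name]

    # Step 2: merge the raw profiles produced by the workload runs.
    merge = ["llvm-profdata", "merge", "-o", merged_profile, *raw_profiles]

    # Step 3: optimized build that consumes the merged profile.
    optimized = base + [f"-fprofile-use={merged_profile}", *sources, "-o", lib_name]

    return [instrumented, merge, optimized]


if __name__ == "__main__":
    for cmd in pgo_commands(["native.cpp"], ["pgo-raw/default_1.profraw"]):
        print(shlex.join(cmd))
```

The workload run between steps 1 and 2 is deliberately left out, because which workload to use is the open question.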

Maybe testing Post-Link Optimization techniques (like LLVM BOLT) would be interesting too (Clang and Rustc already use BOLT in addition to PGO), but I recommend starting with regular PGO.

Here are some examples of how PGO optimization is integrated in other projects:

baibaichen commented 11 months ago

@zhanglistar

FelixYBW commented 11 months ago

It's a good idea to have such compiler optimizations. The problem is which workload should be used to profile. TPCH/DS doesn't match customers' real workload.

zamazan4ik commented 11 months ago

The problem is which workload should be used to profile. TPCH/DS doesn't match customers' real workload.

If you say that TPCH/DS does not match customers' real workload, that means you know the actual workload for your customers, right? :)

One of the options could be:

In this case, you don't need to guess the customers' workload. Generic customers will get a "default" build (as you already provide), while customers who want to extract as much performance as possible will have a ready-to-use build option in Gluten to recompile the library according to their own workload.

If you still want to optimize prebuilt binaries with PGO, you can try to collect feedback from your users about their actual workloads via discussions, by collecting their PGO profiles, etc. All of the collected information can then be used to build a "generic enough" profile that Gluten CI can use to produce a PGO-optimized binary.
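As a rough sketch of how several collected profiles could be combined into one "generic enough" profile, `llvm-profdata merge` accepts `--weighted-input=<weight>,<file>`, which scales each profile's counts by an integer weight of at least 1 before merging. The file names and weights below are purely hypothetical; how to weight different users' workloads is a policy decision this example does not try to answer.

```python
from typing import Dict, List


def weighted_merge_command(
    profiles: Dict[str, int],
    output: str = "generic.profdata",  # hypothetical output name
) -> List[str]:
    """Build an llvm-profdata command that merges profiles with integer weights."""
    if not profiles:
        raise ValueError("at least one profile is required")

    cmd = ["llvm-profdata", "merge", "-o", output]
    # Sort by path so the generated command is deterministic across runs.
    for path, weight in sorted(profiles.items()):
        if weight < 1:
            raise ValueError(f"weight for {path} must be an integer >= 1")
        cmd.append(f"--weighted-input={weight},{path}")
    return cmd


if __name__ == "__main__":
    # Hypothetical profiles contributed by two users, the first weighted 3x.
    print(" ".join(weighted_merge_command({"user_a.profdata": 3, "user_b.profdata": 1})))
```

The resulting file would then be passed to `-fprofile-use` in the CI build.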