I recently evaluated Profile-Guided Optimization (PGO) on many projects. All current results are available here. For example, the ClickHouse PGO results can be checked here, and the ClickHouse documentation on building with PGO is at https://clickhouse.com/docs/en/operations/optimizing-performance/profile-guided-optimization . According to these tests, PGO improves performance in many cases (including databases). That's why I think trying to optimize chdb with PGO is a good idea.
I can suggest the following action points:
Perform PGO benchmarks on chdb. If they show improvements, add a note to the documentation about the possible performance gains from PGO.
Provide an easier way (e.g. a build option) to build with PGO. This would help end users and maintainers, since they could optimize chdb for their own workloads.
Optimize pre-built binaries
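For the benchmark step above, a minimal timing harness could look like the sketch below. It is only an illustration: the helper names `median_seconds` and `speedup` are hypothetical, and the workload passed in is whatever query set is representative for you.

```python
import statistics
import time
from typing import Callable


def median_seconds(run: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock time of `run` over `repeats` calls."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_seconds: float, optimized_seconds: float) -> float:
    """Ratio above 1.0 means the optimized build is faster than the baseline."""
    return baseline_seconds / optimized_seconds
```

The idea would be to run the same workload once in an environment with the regular chdb build and once with the PGO build installed (for instance `median_seconds(lambda: chdb.query(sql, "CSV"))`, where `sql` is your own query), and then compare the two medians with `speedup`.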
Since the native part of chdb is a library, I think the Pydantic-core experience can be reused here. Clang also supports PGO for shared libraries.
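As a rough sketch of what the Clang PGO cycle involves, the helpers below assemble the flags and the merge command for the three stages: instrumented build, profile merge, optimized rebuild. The function names and file paths are hypothetical, and how the flags are passed into the chdb build is an assumption, not something the current build scripts are known to support. The Clang options `-fprofile-generate`, `-fprofile-use` and the `llvm-profdata merge` tool are the standard ones.

```python
from pathlib import Path
from typing import List, Sequence


def instrument_flags(raw_dir: Path) -> List[str]:
    """Flags for the instrumented build; .profraw files are written to raw_dir.

    For a shared library, pass these to both the compile and the link step
    so that the profiling runtime is linked in.
    """
    return [f"-fprofile-generate={raw_dir}"]


def merge_command(raw_files: Sequence[Path], merged: Path) -> List[str]:
    """Command that merges the raw profiles from the training runs into one file."""
    return ["llvm-profdata", "merge", f"-output={merged}", *map(str, raw_files)]


def use_flags(merged: Path) -> List[str]:
    """Flags for the final optimized build that consumes the merged profile."""
    return [f"-fprofile-use={merged}"]
```

Between the first and the second stage, the instrumented library has to be exercised with a representative workload, since the quality of the profile determines how much the final build gains.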
Testing Post-Link Optimization techniques (like LLVM BOLT) may be interesting too (Clang and Rustc already use BOLT in addition to PGO), but I recommend starting with regular PGO.
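If BOLT is tried later, the flow is roughly: collect a `perf` profile, convert it with `perf2bolt`, then rewrite the binary with `llvm-bolt`. The sketch below only builds the two command lines. The function names and file names are hypothetical, the optimization flags follow the ones shown in the BOLT README and may differ between LLVM versions, and the library is assumed to have been linked with `--emit-relocs`.

```python
from pathlib import Path
from typing import List


def perf2bolt_command(binary: Path, perf_data: Path, fdata: Path) -> List[str]:
    """Convert a `perf record` profile into BOLT's .fdata format."""
    return ["perf2bolt", "-p", str(perf_data), "-o", str(fdata), str(binary)]


def bolt_command(binary: Path, fdata: Path, output: Path) -> List[str]:
    """Rewrite `binary` into `output` using the collected profile."""
    return [
        "llvm-bolt", str(binary),
        "-o", str(output),
        f"-data={fdata}",
        "-reorder-blocks=ext-tsp",    # basic block layout
        "-reorder-functions=hfsort",  # function layout
        "-split-functions",           # move cold code out of hot functions
        "-dyno-stats",                # print before/after statistics
    ]
```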
Here are some examples of how PGO is integrated in other projects:
configure
script