Open danthegoodman1 opened 1 year ago
I did some analysis with bloaty; the AggregateFunctions take up a lot of space.
We can work with the ClickHouse team to add compile flags that allow stripping some of them out.
compileunits | vmsize (bytes) | filesize (bytes) |
---|---|---|
[section .debug_loc] | 0 | 208584312 |
[section .text] | 151282685 | 151282685 |
[section .strtab] | 0 | 59946963 |
[section .debug_ranges] | 0 | 56290240 |
[section .rodata] | 48926306 | 48926306 |
./src/AggregateFunctions/AggregateFunctionSumMap.cpp | 7787534 | 48880276 |
./src/AggregateFunctions/AggregateFunctionMin.cpp | 6657137 | 45979726 |
./src/AggregateFunctions/AggregateFunctionMax.cpp | 6640377 | 45884458 |
[section .dynstr] | 38266753 | 38266753 |
./src/Interpreters/HashJoin.cpp | 3760601 | 35714954 |
./src/Interpreters/Aggregator.cpp | 3730680 | 34616722 |
./src/AggregateFunctions/AggregateFunctionUniqCombined.cpp | 5393994 | 32101843 |
./src/AggregateFunctions/AggregateFunctionAvgWeighted.cpp | 5165493 | 31640382 |
./src/AggregateFunctions/AggregateFunctionStatisticsSimple.cpp | 6239088 | 30171734 |
./src/AggregateFunctions/AggregateFunctionUniq.cpp | 3541725 | 28680035 |
./src/AggregateFunctions/AggregateFunctionStatistics.cpp | 5169543 | 26427110 |
./src/Interpreters/ActionsDAG.cpp | 219076 | 23502928 |
./src/Storages/MergeTree/KeyCondition.cpp | 126008 | 19541606 |
./src/AggregateFunctions/AggregateFunctionDeltaSumTimestamp.cpp | 2681548 | 19281574 |
./src/Planner/Planner.cpp | 59051 | 16236022 |
./src/Planner/PlannerJoins.cpp | 40060 | 16084904 |
./src/Planner/PlannerJoinTree.cpp | 51178 | 16039921 |
./contrib/hive-metastore/ThriftHiveMetastore.cpp | 2556915 | 15995512 |
[section .eh_frame] | 15656064 | 15656064 |
./src/AggregateFunctions/AggregateFunctionSparkbar.cpp | 1801900 | 15486531 |
./src/Interpreters/castColumn.cpp | 27915 | 15357640 |
./src/Storages/StorageReplicatedMergeTree.cpp | 1145437 | 13526552 |
./src/Columns/ColumnVector.cpp | 1582603 | 13425874 |
./src/Core/Settings.cpp | 1331377 | 11051219 |
./src/AggregateFunctions/AggregateFunctionSimpleLinearRegression.cpp | 1463071 | 10388820 |
./src/Formats/ProtobufSerializer.cpp | 571179 | 9525325 |
[section .gcc_except_table] | 9466872 | 9466872 |
./src/Storages/MergeTree/MergeTreeData.cpp | 670858 | 9271290 |
[section .symtab] | 0 | 8849736 |
./src/AggregateFunctions/AggregateFunctionQuantile.cpp | 1094793 | 8697684 |
./contrib/hive-metastore/hive_metastore_types.cpp | 1262919 | 8545635 |
./src/Core/SettingsEnums.cpp | 307120 | 8250957 |
./src/AggregateFunctions/AggregateFunctionQuantileExactWeighted.cpp | 939254 | 8015627 |
./contrib/NuRaft/src/asio_service.cxx | 680502 | 7699920 |
./src/AggregateFunctions/AggregateFunctionAny.cpp | 1000265 | 7280299 |
./src/Columns/ColumnDecimal.cpp | 806743 | 7277378 |
./src/AggregateFunctions/AggregateFunctionQuantileDeterministic.cpp | 897465 | 7270971 |
./src/AggregateFunctions/AggregateFunctionGroupArrayMoving.cpp | 1169946 | 7142681 |
./src/Interpreters/InterpreterSelectQuery.cpp | 322195 | 7088294 |
./src/Interpreters/Context.cpp | 400123 | 6891480 |
./contrib/libunwind/src/libunwind.cpp | 42849 | 6649674 |
./src/AggregateFunctions/AggregateFunctionSum.cpp | 935631 | 6490650 |
./src/AggregateFunctions/AggregateFunctionSequenceMatch.cpp | 464328 | 6480659 |
./src/AggregateFunctions/AggregateFunctionTopK.cpp | 583102 | 6290272 |
Full table: clickhouse.sizeinfo.csv
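To see how much each source directory contributes overall, the per-file rows can be rolled up by directory. Below is a minimal sketch, assuming the CSV has the same three columns as the table above (`compileunits`, `vmsize`, `filesize`, as written by `bloaty --csv -d compileunits`). The helper names `group_key` and `summarize` are made up for illustration, and the sample rows are copied from the table.

```python
import csv
import io
from collections import defaultdict

# A few rows copied from the table above, in the assumed CSV layout.
SAMPLE = """compileunits,vmsize,filesize
[section .text],151282685,151282685
./src/AggregateFunctions/AggregateFunctionMin.cpp,6657137,45979726
./src/AggregateFunctions/AggregateFunctionMax.cpp,6640377,45884458
./src/Interpreters/HashJoin.cpp,3760601,35714954
"""


def group_key(unit: str) -> str:
    """Map a bloaty row label to a coarse group (hypothetical helper)."""
    if unit.startswith("["):
        # Rows like "[section .text]" are not attributed to a compile unit.
        return "[sections]"
    path = unit[2:] if unit.startswith("./") else unit
    parts = path.split("/")
    # Use the first two path components, e.g. "src/AggregateFunctions".
    return "/".join(parts[:2]) if len(parts) > 2 else parts[0]


def summarize(csv_text: str) -> list[tuple[str, int, int]]:
    """Return (group, vmsize, filesize) totals, largest filesize first."""
    totals = defaultdict(lambda: [0, 0])
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = group_key(row["compileunits"])
        totals[key][0] += int(row["vmsize"])
        totals[key][1] += int(row["filesize"])
    return sorted(
        ((k, v[0], v[1]) for k, v in totals.items()),
        key=lambda t: t[2],
        reverse=True,
    )


if __name__ == "__main__":
    for group, vmsize, filesize in summarize(SAMPLE):
        print(f"{group:30} {vmsize:>12} {filesize:>12}")
```

To run it on the full data, read the text of clickhouse.sizeinfo.csv and pass it to `summarize` instead of `SAMPLE`. Grouping by the first two path components is an arbitrary choice; a finer or coarser split may be more useful depending on what is being considered for removal.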
@alexey-milovidov any input or comments on this intent and related opportunities from the ClickHouse team?
We can easily disable particular storages during the build - first try to do it manually and check the difference in the binary size. Then we can introduce some flags for it.
Disabling functions is as easy as removing a .cpp file.
Also, we can get benefits from the removal of the dynamic symbol table: https://github.com/ClickHouse/ClickHouse/pull/47475
Simply running strip can reduce the file size by about 100MB.
@zhanglistar true, but we already strip libchdb.so, which brings it to ~380MB uncompressed and ~100MB compressed.
Also, disabling some third party libraries can reduce size, like hive or mysql etc.
Thanks, I have disabled unnecessary libs for chdb. Hive and MySQL might be useful for chDB users.
@nmreadelf is working on that
A lot of the binary size comes from table engines that may not be relevant for in-process use cases, such as the MergeTree engines, Log engines, etc.
It would be great to have either an easy way to compile with engines omitted, or a build that is effectively engine-free (except for basics like url, s3, and file) and therefore far smaller. The expectation is that custom engines would be built on top of the url/s3 engines in #52 as aliases of sorts.