chdb-io / chdb

chDB is an in-process OLAP SQL Engine 🚀 powered by ClickHouse
https://clickhouse.com/chdb
Apache License 2.0

Minified version with no table engines #53

Open danthegoodman1 opened 1 year ago

danthegoodman1 commented 1 year ago

A lot of the binary size comes from table engines that may not be relevant for in-process use cases, like the merge tree engines, log engines, etc.

Would be great to either have an easy way to compile with engines omitted, or a build that is effectively engine-free (except for some basics like url, s3, file) for a far smaller binary. The expectation is that custom engines would be made on top of the url/s3 engines in #52 as a sort of alias.

auxten commented 1 year ago

I did some analysis with bloaty; the AggregateFunctions take up a lot of space. We can work with the ClickHouse team to add more compile flags that enable stripping some of them.

compileunits vmsize filesize
[section .debug_loc] 0 208584312
[section .text] 151282685 151282685
[section .strtab] 0 59946963
[section .debug_ranges] 0 56290240
[section .rodata] 48926306 48926306
./src/AggregateFunctions/AggregateFunctionSumMap.cpp 7787534 48880276
./src/AggregateFunctions/AggregateFunctionMin.cpp 6657137 45979726
./src/AggregateFunctions/AggregateFunctionMax.cpp 6640377 45884458
[section .dynstr] 38266753 38266753
./src/Interpreters/HashJoin.cpp 3760601 35714954
./src/Interpreters/Aggregator.cpp 3730680 34616722
./src/AggregateFunctions/AggregateFunctionUniqCombined.cpp 5393994 32101843
./src/AggregateFunctions/AggregateFunctionAvgWeighted.cpp 5165493 31640382
./src/AggregateFunctions/AggregateFunctionStatisticsSimple.cpp 6239088 30171734
./src/AggregateFunctions/AggregateFunctionUniq.cpp 3541725 28680035
./src/AggregateFunctions/AggregateFunctionStatistics.cpp 5169543 26427110
./src/Interpreters/ActionsDAG.cpp 219076 23502928
./src/Storages/MergeTree/KeyCondition.cpp 126008 19541606
./src/AggregateFunctions/AggregateFunctionDeltaSumTimestamp.cpp 2681548 19281574
./src/Planner/Planner.cpp 59051 16236022
./src/Planner/PlannerJoins.cpp 40060 16084904
./src/Planner/PlannerJoinTree.cpp 51178 16039921
./contrib/hive-metastore/ThriftHiveMetastore.cpp 2556915 15995512
[section .eh_frame] 15656064 15656064
./src/AggregateFunctions/AggregateFunctionSparkbar.cpp 1801900 15486531
./src/Interpreters/castColumn.cpp 27915 15357640
./src/Storages/StorageReplicatedMergeTree.cpp 1145437 13526552
./src/Columns/ColumnVector.cpp 1582603 13425874
./src/Core/Settings.cpp 1331377 11051219
./src/AggregateFunctions/AggregateFunctionSimpleLinearRegression.cpp 1463071 10388820
./src/Formats/ProtobufSerializer.cpp 571179 9525325
[section .gcc_except_table] 9466872 9466872
./src/Storages/MergeTree/MergeTreeData.cpp 670858 9271290
[section .symtab] 0 8849736
./src/AggregateFunctions/AggregateFunctionQuantile.cpp 1094793 8697684
./contrib/hive-metastore/hive_metastore_types.cpp 1262919 8545635
./src/Core/SettingsEnums.cpp 307120 8250957
./src/AggregateFunctions/AggregateFunctionQuantileExactWeighted.cpp 939254 8015627
./contrib/NuRaft/src/asio_service.cxx 680502 7699920
./src/AggregateFunctions/AggregateFunctionAny.cpp 1000265 7280299
./src/Columns/ColumnDecimal.cpp 806743 7277378
./src/AggregateFunctions/AggregateFunctionQuantileDeterministic.cpp 897465 7270971
./src/AggregateFunctions/AggregateFunctionGroupArrayMoving.cpp 1169946 7142681
./src/Interpreters/InterpreterSelectQuery.cpp 322195 7088294
./src/Interpreters/Context.cpp 400123 6891480
./contrib/libunwind/src/libunwind.cpp 42849 6649674
./src/AggregateFunctions/AggregateFunctionSum.cpp 935631 6490650
./src/AggregateFunctions/AggregateFunctionSequenceMatch.cpp 464328 6480659
./src/AggregateFunctions/AggregateFunctionTopK.cpp 583102 6290272

Full table: clickhouse.sizeinfo.csv
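
To see which source directories dominate, the per-file rows can be rolled up by path prefix. The following is a minimal sketch, not part of the original analysis: it assumes the CSV uses the three columns shown above (`compileunits`, `vmsize`, `filesize`, with sizes in bytes), and the helper names `group_key` and `summarize` are made up for illustration. The sample rows are copied from the table above.

```python
import csv
import io
from collections import defaultdict

# A few rows copied from the table above, re-typed as CSV.
# The comma-separated layout is an assumption about clickhouse.sizeinfo.csv.
SAMPLE = """compileunits,vmsize,filesize
[section .text],151282685,151282685
./src/AggregateFunctions/AggregateFunctionSumMap.cpp,7787534,48880276
./src/AggregateFunctions/AggregateFunctionMin.cpp,6657137,45979726
./src/Interpreters/HashJoin.cpp,3760601,35714954
./src/Storages/MergeTree/KeyCondition.cpp,126008,19541606
./contrib/hive-metastore/ThriftHiveMetastore.cpp,2556915,15995512
"""


def group_key(unit, depth=2):
    """Return the first `depth` path components of a compile unit.

    Non-path rows such as "[section .text]" are returned unchanged.
    """
    if not unit.startswith("./"):
        return unit
    parts = unit[2:].split("/")
    return "/".join(parts[:depth])


def summarize(csv_text, depth=2):
    """Sum vmsize and filesize per group, largest filesize first."""
    totals = defaultdict(lambda: [0, 0])
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = group_key(row["compileunits"], depth)
        totals[key][0] += int(row["vmsize"])
        totals[key][1] += int(row["filesize"])
    rows = [(name, vm, fsize) for name, (vm, fsize) in totals.items()]
    return sorted(rows, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    for name, vm, fsize in summarize(SAMPLE):
        print(f"{name:32s} vm={vm / 1e6:8.1f} MB  file={fsize / 1e6:8.1f} MB")
```

Running it against the full CSV instead of `SAMPLE` would give a per-directory view, which makes it easier to compare `src/AggregateFunctions` against `src/Storages` before deciding what to drop.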

lmangani commented 1 year ago

@alexey-milovidov any input or comments on this intent and related opportunities from the ClickHouse team?

alexey-milovidov commented 1 year ago

We can easily disable particular storages during the build - first try to do it manually and check the difference in the binary size. Then we can introduce some flags for it.

Disabling functions is as easy as removing a .cpp file.

Also, we can get benefits from the removal of the dynamic symbol table: https://github.com/ClickHouse/ClickHouse/pull/47475

zhanglistar commented 1 year ago

Simply running strip can reduce the file size by about 100MB.

lmangani commented 1 year ago

@zhanglistar true, but we already strip libchdb.so, which brings it to about 380MB uncompressed and about 100MB compressed.
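
For anyone who wants to reproduce this kind of uncompressed vs. compressed comparison on their own build, here is a rough sketch. It uses gzip purely as a stand-in, since the thread does not say which compression the figures refer to, and `size_report` is a hypothetical helper name.

```python
import gzip
import os
import shutil
import sys
import tempfile


def size_report(path):
    """Return (raw_bytes, gzip_bytes) for the file at `path`.

    gzip is an assumption here; the packaging format actually used
    for the library may compress differently.
    """
    raw = os.path.getsize(path)
    # Reserve a temporary file name to hold the compressed copy.
    with tempfile.NamedTemporaryFile(suffix=".gz", delete=False) as tmp:
        tmp_path = tmp.name
    try:
        with open(path, "rb") as src, gzip.open(tmp_path, "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream, so large files are fine
        packed = os.path.getsize(tmp_path)
    finally:
        os.remove(tmp_path)
    return raw, packed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python size_report.py path/to/libchdb.so
    raw, packed = size_report(sys.argv[1])
    print(f"uncompressed: {raw / 1e6:.1f} MB, gzip: {packed / 1e6:.1f} MB")
```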

zhanglistar commented 1 year ago

Also, disabling some third-party libraries, such as Hive or MySQL, can reduce the size.

auxten commented 1 year ago

> Also, disabling some third-party libraries, such as Hive or MySQL, can reduce the size.

Thanks, I have disabled the libs that chDB does not need. Hive and MySQL might be useful for chDB users.

auxten commented 8 months ago

@nmreadelf is working on that