ClickHouse / dbt-clickhouse

The Clickhouse plugin for dbt (data build tool)
Apache License 2.0
242 stars 98 forks source link

Union all issues #75

Open royxact opened 2 years ago

royxact commented 2 years ago

Hi, We're using DBT and clickhouse. Trouble begins when using UNION ALL , which is a known clickhouse issue: https://github.com/ClickHouse/ClickHouse/issues/30532 As stated in the above it's a very common pattern for BI/DBT. DBT has a special union macro (https://github.com/dbt-labs/dbt-utils/blob/main/macros/sql/union.sql) which is very useful. Bottom line, would be very useful to have dbt-clickhouse Materialize a table that uses this macro as a series of INSERTs of each unionized relations, rather than a single CREATE TABLE as SELECT.

guykoh commented 2 years ago

Thanks @royxact, did you try changing max_threads to limit the degree of parallelism?

Anyway, we plan to start working on dbt utils and I'm pretty sure we could add that feature to the union macro too.

royxact commented 2 years ago

Hi @guykoh ,

Thank you very much for your suggestion (: We've tried max_threads=1 as a workaround, and that keeps the memory usage under control. But - performance would be limited to a single thread. Sure we could try to tune it, but in the end every relation added to UNION ALL still consumes some memory (see https://github.com/ClickHouse/ClickHouse/issues/23245). Maybe it's better to solve https://github.com/ClickHouse/ClickHouse/issues/30532 and https://github.com/ClickHouse/ClickHouse/issues/23245 , rather than fixing the DBT macro. But, I think reaching a point where a CREATE TABLE as SELECT of complex generated queries, with multiple JOINS plus UNION ALL, works perfectly without the need to spend time tuning it for parallelism and join types - might take some more time. That's the reason I think it's worth investing in improving the macro, to make clickhouse very useful with DBT, with "no brainer" scalability, right now in the short term. Thanks!