ClickHouse / dbt-clickhouse

The Clickhouse plugin for dbt (data build tool)
Apache License 2.0
253 stars 113 forks source link

Multiple materialized views on the same target distributed_table with a config possible? #320

Open rightx2 opened 4 months ago

rightx2 commented 4 months ago

Is your feature request related to a problem? Please describe. When I defiend materialized_view like below,

{{ config(
       materialized='materialized_view',
       order_by=['stock_code', 'second'],
       sharding_key='cityHash64(stock_code)',
       engine='ReplicatedAggregatingMergeTree()',
       settings = { 'index_granularity': 8192, 'allow_nullable_key': 1 },
       ttl='toDateTime(second) + toIntervalHour(1)'
}}

SELECT
    stock_code,
    toStartOfSecond(__ts_ms) AS second,
    countState(*) AS pv
FROM {{ source("mydb", "stocks") }}
GROUP BY
    stock_code,
    second

I understood it created agg_pv table and agg_pv_mv view and I can query like below

SELECT stock_code, countMerge(pv) FROM analytics_dbt_dev.agg_pv GROUP BY 1 ORDER BY 2 desc

but I'd like to do more:

  1. I'd like to create target materialized table as distributed table. If it worked, agg_pv, agg_pv_local would be created. Related issue is https://github.com/ClickHouse/dbt-clickhouse/issues/302 and it closed as completed, but I can't fully understand how the answer work:
    
    {{ config(materialized='view') }}

SELECT * FROM clusterAllReplicas('cluster_name', {{ ref('local_table') }}) SETTINGS final = 1

Moreover, official docs said, it's kinda inefficient..

> Using the cluster and clusterAllReplicas table functions are less efficient than creating a Distributed table because in this case, the server connection is re-established for every request. When processing a large number of queries, please always create the Distributed table ahead of time, and do not use the cluster and clusterAllReplicas table functions.

I wonder whether I can implement it with one model sql file with `config` first, but if can't, I'd like to know any other ways as well.

2. I'd like to set two materialized views, from differenet source, and targeting on the same table like below:

A --> target_t_mv --> target_t (distributed) B --> target_t_mv --> target_t (distributed)



I wonder whether I can do this with `config`. If can't, I'd like to know any other ways as well.

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.
benjamin-awd commented 3 months ago

AFAIK there isn't a way to do this in dbt-clickhouse yet -- I was thinking that one solution could be to add some kind of flag in the config like create_target_table, so that only one of the MVs will create the target table

Other relevant issue: https://github.com/ClickHouse/dbt-clickhouse/issues/280

the4thamigo-uk commented 1 month ago

I was exploring this today, I've come up with a proposal which I think is fairly easy to implement. I will hopefully make a PR tomorrow to demonstrate it clearly. I would appreciate your thoughts on this idea. #280