ClickHouse / dbt-clickhouse

The ClickHouse plugin for dbt (data build tool)
Apache License 2.0

Not ADD ON CLUSTER when create __dbt_backup #205

Open ikeniborn opened 1 year ago

ikeniborn commented 1 year ago

Describe the bug

Steps to reproduce


clickhouse:
  target: default
  outputs:
    default:
      driver: native
      type: clickhouse
      schema: default
      user: "{{ env_var('DBT_ENV_SECRET_USER') }}"
      password: "{{ env_var('DBT_ENV_SECRET_PASSWORD') }}"
      # optional fields
      port: 9000
      host: "{{ env_var('DBT_ENV_SECRET_HOST') }}"
      verify: False
      secure: False
      connect_timeout: 60
      # compression: 'gzip'
      threads: 8
      send_receive_timeout: 100000
      check_exchange: False
      cluster: "{cluster}"
      cluster_mode: False
      # Native (clickhouse-driver) connection settings
      sync_request_timeout: 5
      compress_block_size: 1048576
      use_lw_deletes: True
      custom_settings:
        enable_optimize_predicate_expression: 1
        max_block_size: 65536
        max_insert_block_size: 2097152
        max_memory_usage: 130000000000
        max_bytes_before_external_group_by: 100000000000
        max_bytes_before_external_sort: 50000000000
        max_threads: 128
        max_insert_threads: 64
        max_query_size: 524288
        async_insert: 1
        async_insert_threads: 64
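Because the profile above sets `cluster`, the adapter is expected to attach an ON CLUSTER clause to the DDL it generates. As a rough sketch (statement shape taken from the logs later in this issue, not from the adapter source; `"{cluster}"` is a ClickHouse server-side macro expanded on each node):

```sql
-- sketch: cluster-aware DDL of the kind the adapter emits when `cluster` is set
drop table if exists dimension.dim_twitter_pinned_tweet__dbt_backup ON CLUSTER "{cluster}"
```

The bug reported here is that some, but not all, of the generated statements carry this clause.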

Expected behaviour

{{ config(
    enabled = true,
    schema = 'dimension',
    tags = ["dimension"],
    materialized = "table",
    engine = "MergeTree()",
    order_by = ("twitter_pinned_tweet_id"),
) }}

select
    pinned_tweet_id as twitter_pinned_tweet_id
    ,pinned_tweet_text as twitter_pinned_tweet_text
    -- ,vector(pinned_tweet_text) as twitter_pinned_tweet_text_vector
    ,pinned_tweet_created_at as twitter_pinned_tweet_created_at
    ,date_diff('day', pinned_tweet_created_at, now()) as twitter_pinned_tweet_day_old
    ,now() as updated_dttm
from {{ ref("raw_twitter_pinned_tweets") }}
order by updated_dttm desc
limit 1 by pinned_tweet_id

Code examples, such as models or profile settings

dbt and/or ClickHouse server logs

12:50:07.068025 [debug] [Thread-1 ]: dbt_clickhouse adapter: On model.clickhouse.dim_twitter_pinned_tweet: /* {"app": "dbt", "dbt_version": "1.4.9", "profile_name": "clickhouse", "target_name": "****", "node_id": "model.clickhouse.dim_twitter_pinned_tweet"} */

drop table if exists dimension.dim_twitter_pinned_tweet__dbt_backup 

ON CLUSTER "{cluster}" ...

12:50:07.254117 [debug] [Thread-1 ]: dbt_clickhouse adapter: SQL status: OK in 0.19 seconds
12:50:07.338073 [debug] [Thread-1 ]: dbt_clickhouse adapter: On model.clickhouse.dim_twitter_pinned_tweet: /* {"app": "dbt", "dbt_version": "1.4.9", "profile_name": "clickhouse", "target_name": "****", "node_id": "model.clickhouse.dim_twitter_pinned_tweet"} */

    create table dimension.dim_twitter_pinned_tweet__dbt_backup
    engine = MergeTree()
    order by (twitter_pinned_tweet_id)
    empty
    as (

select
    pinned_tweet_id as twitter_pinned_tweet_id
    ,pinned_tweet_text as twitter_pinned_tweet_text
    -- ,vector(pinned_tweet_text) as twitter_pinned_tweet_text_vector
    ,pinned_tweet_created_at as twitter_pinned_tweet_created_at
    ,date_diff('day', pinned_tweet_created_at, now()) as twitter_pinned_tweet_day_old
    ,now() as updated_dttm
from raw.raw_twitter_pinned_tweets
order by updated_dttm desc
limit 1 by pinned_tweet_id
)
...
12:50:07.429639 [debug] [Thread-1 ]: dbt_clickhouse adapter: SQL status: OK in 0.09 seconds
12:50:07.460572 [debug] [Thread-1 ]: dbt_clickhouse adapter: On model.clickhouse.dim_twitter_pinned_tweet: /* {"app": "dbt", "dbt_version": "1.4.9", "profile_name": "clickhouse", "target_name": "****", "node_id": "model.clickhouse.dim_twitter_pinned_tweet"} */

select name, type from system.columns where table = 'dim_twitter_pinned_tweet__dbt_backup'

  and database = 'dimension'

order by position

...
12:50:07.529162 [debug] [Thread-1 ]: dbt_clickhouse adapter: SQL status: OK in 0.07 seconds
12:50:07.549063 [debug] [Thread-1 ]: Writing runtime sql for node "model.clickhouse.dim_twitter_pinned_tweet"
12:50:07.550210 [debug] [Thread-1 ]: dbt_clickhouse adapter: On model.clickhouse.dim_twitter_pinned_tweet: /* {"app": "dbt", "dbt_version": "1.4.9", "profile_name": "clickhouse", "target_name": "****", "node_id": "model.clickhouse.dim_twitter_pinned_tweet"} */

    insert into dimension.dim_twitter_pinned_tweet__dbt_backup ("twitter_pinned_tweet_id", "twitter_pinned_tweet_text", "twitter_pinned_tweet_created_at", "twitter_pinned_tweet_day_old", "updated_dttm")

select
    pinned_tweet_id as twitter_pinned_tweet_id
    ,pinned_tweet_text as twitter_pinned_tweet_text
    -- ,vector(pinned_tweet_text) as twitter_pinned_tweet_text_vector
    ,pinned_tweet_created_at as twitter_pinned_tweet_created_at
    ,date_diff('day', pinned_tweet_created_at, now()) as twitter_pinned_tweet_day_old
    ,now() as updated_dttm
from raw.raw_twitter_pinned_tweets
order by updated_dttm desc
limit 1 by pinned_tweet_id
...
12:50:07.686856 [debug] [Thread-1 ]: dbt_clickhouse adapter: SQL status: OK in 0.14 seconds
12:50:07.706430 [debug] [Thread-1 ]: dbt_clickhouse adapter: On model.clickhouse.dim_twitter_pinned_tweet: /* {"app": "dbt", "dbt_version": "1.4.9", "profile_name": "clickhouse", "target_name": "****", "node_id": "model.clickhouse.dim_twitter_pinned_tweet"} */
EXCHANGE TABLES dimension.dim_twitter_pinned_tweet__dbt_backup AND dimension.dim_twitter_pinned_tweet

ON CLUSTER "{cluster}" 

...
12:50:07.902549 [debug] [Thread-1 ]: dbt_clickhouse adapter: Error running SQL: /* {"app": "dbt", "dbt_version": "1.4.9", "profile_name": "clickhouse", "target_name": "****", "node_id": "model.clickhouse.dim_twitter_pinned_tweet"} */
EXCHANGE TABLES dimension.dim_twitter_pinned_tweet__dbt_backup AND dimension.dim_twitter_pinned_tweet

ON CLUSTER "{cluster}" 

12:50:07.903325 [debug] [Thread-1 ]: Timing info for model.clickhouse.dim_twitter_pinned_tweet (execute): 2023-11-08 12:50:06.967194 => 2023-11-08 12:50:07.903201
12:50:07.907147 [debug] [Thread-1 ]: Database Error in model dim_twitter_pinned_tweet (models/dimension/twitter/dim_twitter_pinned_tweet.sql)
  Code: 60. DB::Exception: There was an error on [10.10.1.217:9000]: Code: 60. DB::Exception: Table dimension.dim_twitter_pinned_tweet__dbt_backup doesn't exist. (UNKNOWN_TABLE) (version 23.10.1.1976 (official build)). Stack trace:

  1. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000cdd11b7 in /usr/bin/clickhouse
  2. DB::DDLQueryStatusSource::generate() @ 0x0000000012bd0e75 in /usr/bin/clickhouse
  3. DB::ISource::tryGenerate() @ 0x0000000013afadf5 in /usr/bin/clickhouse
  4. DB::ISource::work() @ 0x0000000013afa927 in /usr/bin/clickhouse
  5. DB::ExecutionThreadContext::executeTask() @ 0x0000000013b123ba in /usr/bin/clickhouse
  6. DB::PipelineExecutor::executeStepImpl(unsigned long, std::atomic*) @ 0x0000000013b09130 in /usr/bin/clickhouse
  7. DB::PipelineExecutor::execute(unsigned long, bool) @ 0x0000000013b083c0 in /usr/bin/clickhouse
  8. void std::function::policy_invoker<void ()>::__call_impl<std::function::****_alloc_func<ThreadFromGlobalPoolImpl::ThreadFromGlobalPoolImpl<DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0>(DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&&)::'lambda'(), void ()>>(std::function::policy_storage const) @ 0x0000000013b15d8f in /usr/bin/clickhouse
  9. void* std::thread_proxy[abi:v15000]<std::tuple<std::unique_ptr<std::thread_struct, std::****_delete>, void ThreadPoolImpl::scheduleImpl(std::function<void ()>, Priority, std::optional, bool)::'lambda0'()>>(void) @ 0x000000000cebc6a7 in /usr/bin/clickhouse
  10. ? @ 0x00007f32a049f802 in ?
  11. ? @ 0x00007f32a043f450 in ?

compiled Code at target/run/clickhouse/models/dimension/twitter/dim_twitter_pinned_tweet.sql

Configuration

Environment

ClickHouse server

genzgd commented 1 year ago

@gfunc @zli06160 -- any possibility one of you could take a look at this? Thanks!

gfunc commented 1 year ago

I will take a look. I think it is a problem with the exchange macro; it should not add the on cluster clause for the table materialization.

Hi @ikeniborn, are you expecting the table to be created on cluster or not? If you are trying to use a ClickHouse cluster, you should use the distributed_table materialization.

gfunc commented 1 year ago

I can reproduce the problem now. It seems to be a compatibility issue.

The model dimension.dim_twitter_pinned_tweet already exists on the cluster, but with the latest dbt-clickhouse 1.4.9 the creation of the __dbt_backup table has no on cluster clause.
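The incompatibility can be condensed from the logs above into two statements (a sketch using the table names from this issue, not the adapter's exact output):

```sql
-- 1.4.9 creates the backup table only on the coordinating node (no ON CLUSTER):
create table dimension.dim_twitter_pinned_tweet__dbt_backup
engine = MergeTree() order by (twitter_pinned_tweet_id) empty as (select ...);

-- ...while the swap still runs cluster-wide, so replicas that never received
-- the backup table fail with Code: 60 (UNKNOWN_TABLE):
EXCHANGE TABLES dimension.dim_twitter_pinned_tweet__dbt_backup
    AND dimension.dim_twitter_pinned_tweet ON CLUSTER "{cluster}";
```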

I suggest a full refresh with the distributed_table materialization if possible; meanwhile, I will prepare a fix.
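For anyone needing the workaround before a fix lands, a full refresh of just this model would look something like the following (hypothetical invocation; the model name is taken from this issue):

```shell
dbt run --full-refresh --select dim_twitter_pinned_tweet
```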

gfunc commented 1 year ago

In #206 my solution to this problem is to provide a detailed message that reflects the error and to make full-refresh workable in this situation.

I am not sure this is the best way, so any comments and suggestions are welcome.

ikeniborn commented 1 year ago

I will take a look. I think it is a problem with the exchange macro; it should not add the on cluster clause for the table materialization.

Hi @ikeniborn, are you expecting the table to be created on cluster or not? If you are trying to use a ClickHouse cluster, you should use the distributed_table materialization.

Hi @gfunc, I reinstalled dbt-clickhouse 1.4.8, where everything works without error. I have 1 shard and 2 replicas in the cluster. I don't want to use distributed tables for now; I only want to create tables on the cluster. When the cluster has more than 1 shard, I will refactor all models to distributed_table. I will wait for a fix for this error. Thank you.