ClickHouse / dbt-clickhouse


Model query_settings not honored for incremental models with contracts #277

Closed: rjoelnorgren closed this issue 6 months ago

rjoelnorgren commented 7 months ago

Describe the bug

When contracts are configured for a model, the clickhouse__insert_into macro does not interpolate query_settings configured for that model.

Steps to reproduce

  1. Create a model with contracts enforced
  2. Configure query_settings on the model (a minimal sketch follows this list)
  3. Run the model
  4. Observe that the configured settings are missing from system.query_log.Settings for the query inserting into the model.
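
As an illustration of steps 1-2, query_settings can also be set inline in the model's .sql file. The snippet below is only a sketch (model, column, and setting names are illustrative and mirror the reproduction later in this thread); the contract itself, with enforced: true and column data_types, still lives in schema.yml:

    {{
        config(
            materialized='incremental',
            order_by='rand_trip_id',
            unique_key='rand_trip_id',
            query_settings={'allow_experimental_analyzer': 0}
        )
    }}

    -- illustrative model body
    select 1 as rand_trip_id, 'test' as key2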

Expected behaviour

The query settings are used for the model.
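
Concretely, the expectation is that every INSERT dbt generates for the model ends with the configured SETTINGS clause, roughly like this (schema, table, and column names are illustrative and mirror the reproduction later in this thread):

     insert into `my_schema`.`my_model__dbt_new_data` ("rand_trip_id", "key2")
        -- Use a subquery to get columns in the right order
        SELECT "rand_trip_id", "key2" FROM ( select 1 as rand_trip_id, 'test' as key2 )
        SETTINGS allow_experimental_analyzer=0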

BentsiLeviav commented 6 months ago

Hi @rjoelnorgren

Thanks for opening this issue. Following your steps to reproduce, I was actually able to see the settings. This is my setup:

schema.yml file:

version: 2

models:
  - name: test
    description: test issue 227 + cr 228
    config:
      materialized: incremental
      query_settings: {"allow_experimental_analyzer":0}
      order_by: rand_trip_id
      unique_key: rand_trip_id
      on_schema_change: fail
      contract:
        enforced: true
    columns:
      - name: rand_trip_id
        data_type: UInt8
      - name: key2
        data_type: string

test.sql (the model):

select 1 as rand_trip_id, 'test' as key2

The generated insert into query:

     insert into `default`.`test__dbt_tmp` ("rand_trip_id", "key2")
        select "rand_trip_id", "key2"
        from `default`.`test__dbt_new_data`

            -- settings_section
            SETTINGS  allow_experimental_analyzer=0

After running this, I queried system.query_log using this query:

select Settings from system.query_log
         where query like '%dbt%' and query_kind = 'Insert' and query like '%allow_experimental_analyzer%'
order by query_start_time desc

And got the following line:

{'distributed_foreground_insert': '1', 'http_zlib_compression_level': '3', 'send_progress_in_http_headers': '1', 'http_headers_progress_interval_ms': '120000', 'join_use_nulls': '1', 'allow_experimental_analyzer': '0', 'mutations_sync': '2', 'allow_nondeterministic_mutations': '1'}

My ClickHouse version: 23.10.6.60
My dbt-clickhouse version: 1.7.11
My Python version: 3.9.6

Could you provide more information about your setup and scenario? For the setup described above, everything works as expected.

rjoelnorgren commented 6 months ago

@BentsiLeviav thank you for taking the time to review the issue.

Could you review the query strings from your run to confirm which statements the config.query_settings are applied to? From your example, I noticed that the settings were only included in the query that combines the new and old data, but not in the query that generates the new data. See the reproduction below.

What I found is that in clickhouse__insert_into, adapter.get_model_query_settings is only called in the else branch, where has_contract is false. In https://github.com/ClickHouse/dbt-clickhouse/issues/277 I address that by moving the call outside of the if block to ensure it is always part of the query.
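
To illustrate, here is a simplified sketch of that branch structure with the fix applied. This is not the macro's actual source; the signature and column handling are assumptions. The point is that the settings call sits after the if/else, so contract-enforced models also get the SETTINGS clause:

    {% macro clickhouse__insert_into(target_relation, sql, has_contract) %}
      {#- assumed/simplified: resolve the destination column list from the target relation -#}
      {%- set dest_columns = adapter.get_columns_in_relation(target_relation) -%}
      {%- set dest_cols_csv = dest_columns | map(attribute='quoted') | join(', ') -%}
      insert into {{ target_relation }} ({{ dest_cols_csv }})
      {%- if has_contract %}
        -- Use a subquery to get columns in the right order
        SELECT {{ dest_cols_csv }} FROM ( {{ sql }} )
      {%- else %}
        {{ sql }}
      {%- endif %}
      {#- moved out of the else branch: append the model's query_settings in both cases -#}
      {{ adapter.get_model_query_settings(model) }}
    {% endmacro %}

With that change, both the "new data" insert and the "tmp table" inserts shown in the reproduction below would carry SETTINGS allow_experimental_analyzer=0.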

SELECT
    event_time,
    query_id,
    query,
    Settings
FROM system.query_log
WHERE (query LIKE '%"node_id": "model%test"%') AND (event_time > (now() - toIntervalMinute(120))) AND (query_kind = 'Insert') AND (type = 'QueryFinish')
ORDER BY event_time DESC

Adds new data to test__dbt_new_data, allow_experimental_analyzer is unset.

event_time: 2024-05-13 15:35:16
query_id:   0d33fa12-aaf6-41a6-93c0-f9bfba472518
query:      /* {"app": "dbt", "dbt_version": "1.7.12", "profile_name": "jaffle_shop", "target_name": "dev", "node_id": "model.jaffle_shop.test"} */

        insert into `jaffle_shop`.`test__dbt_new_data`
        ("rand_trip_id", "key2")-- Use a subquery to get columns in the right order
          SELECT "rand_trip_id", "key2" FROM ( select 1 as rand_trip_id, 'test' as key2 )

Settings:   {'distributed_foreground_insert':'1','mutations_sync':'2','allow_nondeterministic_mutations':'1'}

Adds old data to test__dbt_tmp, allow_experimental_analyzer = 0

event_time: 2024-05-13 15:35:16
query_id:   81e35dfb-ffc3-4beb-bf69-01c2dcae703d
query:      /* {"app": "dbt", "dbt_version": "1.7.12", "profile_name": "jaffle_shop", "target_name": "dev", "node_id": "model.jaffle_shop.test"} */

        insert into `jaffle_shop`.`test__dbt_tmp` ("rand_trip_id", "key2")
        select "rand_trip_id", "key2"
        from `jaffle_shop`.`test`
          where (rand_trip_id) not in (
            select rand_trip_id
            from `jaffle_shop`.`test__dbt_new_data`
          )

            -- settings_section
            SETTINGS  allow_experimental_analyzer=0

Settings:   {'distributed_foreground_insert':'1','allow_experimental_analyzer':'0','mutations_sync':'2','allow_nondeterministic_mutations':'1'}

Adds new data to test__dbt_tmp, allow_experimental_analyzer = 0

event_time: 2024-05-13 15:35:16
query_id:   e867dd57-99f3-4beb-9014-da7dc23b593b
query:      /* {"app": "dbt", "dbt_version": "1.7.12", "profile_name": "jaffle_shop", "target_name": "dev", "node_id": "model.jaffle_shop.test"} */

     insert into `jaffle_shop`.`test__dbt_tmp` ("rand_trip_id", "key2")
        select "rand_trip_id", "key2"
        from `jaffle_shop`.`test__dbt_new_data`

            -- settings_section
            SETTINGS  allow_experimental_analyzer=0

Settings:   {'distributed_foreground_insert':'1','allow_experimental_analyzer':'0','mutations_sync':'2','allow_nondeterministic_mutations':'1'}

rjoelnorgren commented 6 months ago

@BentsiLeviav, sorry to follow up. For completeness, this is with:

dbt-clickhouse==1.7.5
dbt-core==1.7.12

and ClickHouse server 24.3.1