Closed rjoelnorgren closed 6 months ago
Hi @rjoelnorgren
Thanks for opening this issue. I actually managed to see the settings when following your steps to reproduce instructions. This is my setup:
schema.yml file:
version: 2
models:
- name: test
description: test issue 227 + cr 228
config:
materialized: incremental
query_settings: {"allow_experimental_analyzer":0}
order_by: rand_trip_id
unique_key: rand_trip_id
on_schema_change: fail
contract:
enforced: true
columns:
- name: rand_trip_id
data_type: UInt8
- name: key2
data_type: string
test.sql (the model):
select 1 as rand_trip_id, 'test' as key2
The generated insert into query:
insert into `default`.`test__dbt_tmp` ("rand_trip_id", "key2")
select "rand_trip_id", "key2"
from `default`.`test__dbt_new_data`
-- settings_section
SETTINGS allow_experimental_analyzer=0
After running this, I queried system.query_log
using this query:
select Settings from system.query_log
where query like '%dbt%' and query_kind = 'Insert' and query like '%allow_experimental_analyzer%'
order by query_start_time desc
And got the following line:
{'distributed_foreground_insert': '1', 'http_zlib_compression_level': '3', 'send_progress_in_http_headers': '1', 'http_headers_progress_interval_ms': '120000', 'join_use_nulls': '1', 'allow_experimental_analyzer': '0', 'mutations_sync': '2', 'allow_nondeterministic_mutations': '1'}
My ClickHouse version is: 23.10.6.60
My ClickHouse DBT version is: 1.7.11
Python version: 3.9.6
Could you provide more information regarding your setup and scenario? Because for the described setup - everything works well.
@BentsiLeviav thank you for taking the time to review the issue.
Could you possibly review the query string itself from your case to confirm which statements the config.query_settings are being applied to? From your example what I noticed was that the settings were only included in the query which combined the new + old data, but not the query which generated the new data. See reproduction below.
What I found is that in clickhouse__insert_into, adapter.get_model_query_settings is only called in the else case where has_contract is false. In https://github.com/ClickHouse/dbt-clickhouse/issues/277 I address that by move the call outside of the if block to ensure its always part of the query.
SELECT
event_time,
query_id,
query,
Settings
FROM system.query_log
WHERE (query LIKE '%"node_id": "model%test"%') AND (event_time > (now() - toIntervalMinute(120))) AND (query_kind = 'Insert') AND (type = 'QueryFinish')
ORDER BY event_time DESC
Adds new data to test__dbt_new_data, allow_experimental_analyzer is unset.
event_time: 2024-05-13 15:35:16
query_id: 0d33fa12-aaf6-41a6-93c0-f9bfba472518
query: /* {"app": "dbt", "dbt_version": "1.7.12", "profile_name": "jaffle_shop", "target_name": "dev", "node_id": "model.jaffle_shop.test"} */
insert into `jaffle_shop`.`test__dbt_new_data`
("rand_trip_id", "key2")-- Use a subquery to get columns in the right order
SELECT "rand_trip_id", "key2" FROM ( select 1 as rand_trip_id, 'test' as key2 )
Settings: {'distributed_foreground_insert':'1','mutations_sync':'2','allow_nondeterministic_mutations':'1'}
Adds old data to test__dbt_tmp, allow_experimental_analyzer = 0
event_time: 2024-05-13 15:35:16
query_id: 81e35dfb-ffc3-4beb-bf69-01c2dcae703d
query: /* {"app": "dbt", "dbt_version": "1.7.12", "profile_name": "jaffle_shop", "target_name": "dev", "node_id": "model.jaffle_shop.test"} */
insert into `jaffle_shop`.`test__dbt_tmp` ("rand_trip_id", "key2")
select "rand_trip_id", "key2"
from `jaffle_shop`.`test`
where (rand_trip_id) not in (
select rand_trip_id
from `jaffle_shop`.`test__dbt_new_data`
)
-- settings_section
SETTINGS allow_experimental_analyzer=0
Settings: {'distributed_foreground_insert':'1','allow_experimental_analyzer':'0','mutations_sync':'2','allow_nondeterministic_mutations':'1'}
Adds new data to test_dbt_tmp, allow_experimental_analyzer = 0
event_time: 2024-05-13 15:35:16
query_id: e867dd57-99f3-4beb-9014-da7dc23b593b
query: /* {"app": "dbt", "dbt_version": "1.7.12", "profile_name": "jaffle_shop", "target_name": "dev", "node_id": "model.jaffle_shop.test"} */
insert into `jaffle_shop`.`test__dbt_tmp` ("rand_trip_id", "key2")
select "rand_trip_id", "key2"
from `jaffle_shop`.`test__dbt_new_data`
-- settings_section
SETTINGS allow_experimental_analyzer=0
Settings: {'distributed_foreground_insert':'1','allow_experimental_analyzer':'0','mutations_sync':'2','allow_nondeterministic_mutations':'1'}
@BentsiLeviav, sorry to follow up this is with
dbt-clickhouse==1.7.5 dbt-core==1.7.12
and clickhouse server==24.3.1
Describe the bug
When contracts are configured for a model, the clickhouse__insert_into macro does not interpolate query_settings configured for that model.
Steps to reproduce
Expected behaviour
The query settings are used for the model.