[Closed] GonnaMakeYouCry closed this issue 11 months ago
@GonnaMakeYouCry do you use a cluster or a single-server setup? I tried to reproduce this with a single server and it works fine for me.
According to the code, `*__dbt_tmp` tables are created only with the `distributed_table` materialization. Maybe you can share your full debug log from dbt (`dbt -d run -s yourmodel`) so we can understand why it is not working for you.
@simpl1g
We use the Yandex Cloud Managed Service for ClickHouse.
Here is the log after running `dbt -d run -s yourmodel`:
============================== 2023-11-20 07:16:20.020996 | e3382018-9d72-4754-870d-b3f1ffe168d1 ==============================
07:16:20.020996 [info ] [MainThread]: Running with dbt=1.4.9
07:16:20.023435 [debug] [MainThread]: running dbt with arguments {'debug': True, 'write_json': True, 'use_colors': True, 'printer_width': 80, 'version_check': True, 'partial_parse': True, 'static_parser': True, 'profiles_dir': '/home/airflow/.dbt', 'send_anonymous_usage_stats': True, 'quiet': False, 'no_print': False, 'cache_selected_only': False, 'select': ['./models/core/actual_project_data.sql'], 'which': 'run', 'rpc_method': 'run', 'indirect_selection': 'eager'}
07:16:20.024209 [debug] [MainThread]: Tracking: tracking
07:16:20.025864 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fdaedeb7b10>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fdaedeb7e50>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fdaedeb7e10>]}
07:16:20.046441 [debug] [MainThread]: checksum: 07ac7676dbf91b944d5fe41bac352fb2dda62b33015cbd1e52faa2a2c71f6d73, vars: {}, profile: None, target: None, version: 1.4.9
07:16:20.081888 [debug] [MainThread]: Partial parsing enabled: 0 files deleted, 0 files added, 0 files changed.
07:16:20.082698 [debug] [MainThread]: Partial parsing enabled, no changes found, skipping parsing
07:16:20.084464 [warn ] [MainThread]: [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources. There are 1 unused configuration paths:
models.iyno_data_marts.example
07:16:20.096085 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'load_project', 'label': 'e3382018-9d72-4754-870d-b3f1ffe168d1', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fdaec137450>]}
07:16:20.107754 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label': 'e3382018-9d72-4754-870d-b3f1ffe168d1', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fdaede88a50>]}
07:16:20.108687 [info ] [MainThread]: Found 3 models, 4 tests, 0 snapshots, 0 analyses, 327 macros, 0 operations, 0 seed files, 11 sources, 0 exposures, 0 metrics
07:16:20.109326 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': 'e3382018-9d72-4754-870d-b3f1ffe168d1', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fdaede88f90>]}
07:16:20.112247 [info ] [MainThread]:
07:16:20.114632 [debug] [MainThread]: Acquiring new clickhouse connection 'master'
07:16:20.116698 [debug] [ThreadPool]: Acquiring new clickhouse connection 'list_schemas'
07:16:20.134418 [debug] [ThreadPool]: Opening a new connection, currently in state init
07:16:20.673483 [debug] [ThreadPool]: dbt_clickhouse adapter: On list_schemas: /* {"app": "dbt", "dbt_version": "1.4.9", "profile_name": "clickhouse_beta", "target_name": "beta", "connection_name": "list_schemas"} */
select name from system.databases
...
07:16:20.676928 [debug] [ThreadPool]: dbt_clickhouse adapter: SQL status: OK in 0.00 seconds
07:16:20.682062 [debug] [ThreadPool]: Acquiring new clickhouse connection 'listNone'
07:16:20.682941 [debug] [ThreadPool]: Re-using an available connection from the pool (formerly listNone)
07:16:20.694269 [debug] [ThreadPool]: dbt_clickhouse adapter: On listNone: /* {"app": "dbt", "dbt_version": "1.4.9", "profile_name": "clickhouse_beta", "target_name": "beta", "connection_name": "listNone"} */
select t.name as name, t.database as schema, if(engine not in ('MaterializedView', 'View'), 'table', 'view') as type, db.engine as db_engine,0 as is_on_cluster from system.tables as t join system.databases as db on t.database = db.name where schema = ''
...
07:16:20.699395 [debug] [ThreadPool]: dbt_clickhouse adapter: SQL status: OK in 0.00 seconds
07:16:20.702432 [debug] [ThreadPool]: Acquiring new clickhouse connection 'list_None_stage'
07:16:20.703303 [debug] [ThreadPool]: Re-using an available connection from the pool (formerly list_None_stage)
07:16:20.706970 [debug] [ThreadPool]: dbt_clickhouse adapter: On list_None_stage: /* {"app": "dbt", "dbt_version": "1.4.9", "profile_name": "clickhouse_beta", "target_name": "beta", "connection_name": "list_None_stage"} */
select t.name as name, t.database as schema, if(engine not in ('MaterializedView', 'View'), 'table', 'view') as type, db.engine as db_engine,0 as is_on_cluster from system.tables as t join system.databases as db on t.database = db.name where schema = 'stage'
...
07:16:20.713810 [debug] [ThreadPool]: dbt_clickhouse adapter: SQL status: OK in 0.01 seconds
07:16:20.716903 [debug] [ThreadPool]: Acquiring new clickhouse connection 'list_None_core'
07:16:20.718790 [debug] [ThreadPool]: Re-using an available connection from the pool (formerly list_None_core)
07:16:20.722207 [debug] [ThreadPool]: dbt_clickhouse adapter: On list_None_core: /* {"app": "dbt", "dbt_version": "1.4.9", "profile_name": "clickhouse_beta", "target_name": "beta", "connection_name": "list_None_core"} */
select t.name as name, t.database as schema, if(engine not in ('MaterializedView', 'View'), 'table', 'view') as type, db.engine as db_engine,0 as is_on_cluster from system.tables as t join system.databases as db on t.database = db.name where schema = 'core'
...
07:16:20.727107 [debug] [ThreadPool]: dbt_clickhouse adapter: SQL status: OK in 0.00 seconds
07:16:20.730361 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': 'e3382018-9d72-4754-870d-b3f1ffe168d1', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fdaec336190>]}
07:16:20.731707 [info ] [MainThread]: Concurrency: 1 threads (target='beta')
07:16:20.732359 [info ] [MainThread]:
07:16:20.739489 [debug] [Thread-1 ]: Began running node model.iyno_data_marts.actual_project_data
07:16:20.740364 [info ] [Thread-1 ]: 1 of 1 START sql incremental model core.actual_project_data .................... [RUN]
07:16:20.741440 [debug] [Thread-1 ]: Acquiring new clickhouse connection 'model.iyno_data_marts.actual_project_data'
07:16:20.742012 [debug] [Thread-1 ]: Re-using an available connection from the pool (formerly model.iyno_data_marts.actual_project_data)
07:16:20.742719 [debug] [Thread-1 ]: Began compiling node model.iyno_data_marts.actual_project_data
07:16:20.772348 [debug] [Thread-1 ]: Writing injected SQL for node "model.iyno_data_marts.actual_project_data"
07:16:20.773405 [debug] [Thread-1 ]: Timing info for model.iyno_data_marts.actual_project_data (compile): 2023-11-20 07:16:20.743301 => 2023-11-20 07:16:20.773289
07:16:20.774168 [debug] [Thread-1 ]: Began executing node model.iyno_data_marts.actual_project_data
07:16:20.853308 [debug] [Thread-1 ]: dbt_clickhouse adapter: On model.iyno_data_marts.actual_project_data: /* {"app": "dbt", "dbt_version": "1.4.9", "profile_name": "clickhouse_beta", "target_name": "beta", "node_id": "model.iyno_data_marts.actual_project_data"} */
drop table if exists core.actual_project_data__dbt_tmp
...
07:16:20.858216 [debug] [Thread-1 ]: dbt_clickhouse adapter: SQL status: OK in 0.00 seconds
07:16:20.889448 [debug] [Thread-1 ]: dbt_clickhouse adapter: On model.iyno_data_marts.actual_project_data: /* {"app": "dbt", "dbt_version": "1.4.9", "profile_name": "clickhouse_beta", "target_name": "beta", "node_id": "model.iyno_data_marts.actual_project_data"} */
drop table if exists core.actual_project_data__dbt_new_data
...
07:16:20.893359 [debug] [Thread-1 ]: dbt_clickhouse adapter: SQL status: OK in 0.00 seconds
07:16:20.933993 [debug] [Thread-1 ]: dbt_clickhouse adapter: On model.iyno_data_marts.actual_project_data: /* {"app": "dbt", "dbt_version": "1.4.9", "profile_name": "clickhouse_beta", "target_name": "beta", "node_id": "model.iyno_data_marts.actual_project_data"} */
create table core.actual_project_data__dbt_new_data
engine = MergeTree() order by (projectId)
partition by (projectId)
SETTINGS allow_nullable_key=1
empty
as (
WITH project_active_periods AS
(
SELECT
pg_prod_repl.element_periods.projectId as "projectId"
,pg_prod_repl.element_periods.id as "periodId"
FROM core.projects_in_work
JOIN pg_prod_repl.projects
ON core.projects_in_work.projectId = pg_prod_repl.projects.id
JOIN pg_prod_repl.element_periods
ON core.projects_in_work.projectId = pg_prod_repl.element_periods.projectId
WHERE True
AND pg_prod_repl.element_periods.activeFrom < NOW()
AND pg_prod_repl.element_periods.activeTo IS NULL
AND pg_prod_repl.projects.deleted = 0
),
project_active_models AS
(
SELECT
core.projects_in_work.projectId AS "projectId"
,pg_prod_repl.models.id AS "modelId"
FROM core.projects_in_work
JOIN pg_prod_repl.projects
ON core.projects_in_work.projectId = pg_prod_repl.projects.id
JOIN pg_prod_repl.models
ON core.projects_in_work.projectId = pg_prod_repl.models.projectId
JOIN pg_prod_repl.model_pdata
ON pg_prod_repl.models.id = pg_prod_repl.model_pdata.modelId
JOIN pg_prod_repl.model_periods
ON pg_prod_repl.model_pdata.periodId = pg_prod_repl.model_periods.id
WHERE True
AND pg_prod_repl.model_periods.activeTo IS NULL
AND pg_prod_repl.model_periods.activeFrom < NOW()
AND pg_prod_repl.projects.deleted = 0
),
actual_project_data AS
(
SELECT
project_active_periods.projectId AS projectId
,project_active_periods.periodId AS periodId
,project_active_models.modelId AS modelId
-- assumeNotNull(project_active_periods.projectId) AS projectId
-- ,assumeNotNull(project_active_periods.periodId) AS periodId
-- ,assumeNotNull(project_active_models.modelId) AS modelId
,date_add(HOUR, 3, NOW()) AS "SYS_MOMENT"
FROM project_active_periods
JOIN project_active_models
ON project_active_periods."projectId" = project_active_models."projectId"
)
SELECT * FROM actual_project_data
-- this filter will only be applied on an incremental run where modelId NOT IN (select modelId from core.actual_project_data)
)
...
07:16:21.232101 [debug] [Thread-1 ]: dbt_clickhouse adapter: SQL status: OK in 0.30 seconds
07:16:21.245551 [debug] [Thread-1 ]: dbt_clickhouse adapter: On model.iyno_data_marts.actual_project_data: /* {"app": "dbt", "dbt_version": "1.4.9", "profile_name": "clickhouse_beta", "target_name": "beta", "node_id": "model.iyno_data_marts.actual_project_data"} */
select name, type from system.columns where table = 'actual_project_data__dbt_new_data'
and database = 'core'
order by position
...
07:16:21.249791 [debug] [Thread-1 ]: dbt_clickhouse adapter: SQL status: OK in 0.00 seconds
07:16:21.255539 [debug] [Thread-1 ]: dbt_clickhouse adapter: On model.iyno_data_marts.actual_project_data: /* {"app": "dbt", "dbt_version": "1.4.9", "profile_name": "clickhouse_beta", "target_name": "beta", "node_id": "model.iyno_data_marts.actual_project_data"} */
insert into core.actual_project_data__dbt_new_data ("projectId", "periodId", "modelId", "SYS_MOMENT")
WITH project_active_periods AS
(
SELECT
pg_prod_repl.element_periods.projectId as "projectId"
,pg_prod_repl.element_periods.id as "periodId"
FROM core.projects_in_work
JOIN pg_prod_repl.projects
ON core.projects_in_work.projectId = pg_prod_repl.projects.id
JOIN pg_prod_repl.element_periods
ON core.projects_in_work.projectId = pg_prod_repl.element_periods.projectId
WHERE True
AND pg_prod_repl.element_periods.activeFrom < NOW()
AND pg_prod_repl.element_periods.activeTo IS NULL
AND pg_prod_repl.projects.deleted = 0
),
project_active_models AS
(
SELECT
core.projects_in_work.projectId AS "projectId"
,pg_prod_repl.models.id AS "modelId"
FROM core.projects_in_work
JOIN pg_prod_repl.projects
ON core.projects_in_work.projectId = pg_prod_repl.projects.id
JOIN pg_prod_repl.models
ON core.projects_in_work.projectId = pg_prod_repl.models.projectId
JOIN pg_prod_repl.model_pdata
ON pg_prod_repl.models.id = pg_prod_repl.model_pdata.modelId
JOIN pg_prod_repl.model_periods
ON pg_prod_repl.model_pdata.periodId = pg_prod_repl.model_periods.id
WHERE True
AND pg_prod_repl.model_periods.activeTo IS NULL
AND pg_prod_repl.model_periods.activeFrom < NOW()
AND pg_prod_repl.projects.deleted = 0
),
actual_project_data AS
(
SELECT
project_active_periods.projectId AS projectId
,project_active_periods.periodId AS periodId
,project_active_models.modelId AS modelId
-- assumeNotNull(project_active_periods.projectId) AS projectId
-- ,assumeNotNull(project_active_periods.periodId) AS periodId
-- ,assumeNotNull(project_active_models.modelId) AS modelId
,date_add(HOUR, 3, NOW()) AS "SYS_MOMENT"
FROM project_active_periods
JOIN project_active_models
ON project_active_periods."projectId" = project_active_models."projectId"
)
SELECT * FROM actual_project_data
-- this filter will only be applied on an incremental run where modelId NOT IN (select modelId from core.actual_project_data)
...
07:16:22.074887 [debug] [Thread-1 ]: dbt_clickhouse adapter: SQL status: OK in 0.82 seconds
07:16:22.077426 [debug] [Thread-1 ]: Writing runtime sql for node "model.iyno_data_marts.actual_project_data"
07:16:22.078670 [debug] [Thread-1 ]: dbt_clickhouse adapter: On model.iyno_data_marts.actual_project_data: /* {"app": "dbt", "dbt_version": "1.4.9", "profile_name": "clickhouse_beta", "target_name": "beta", "node_id": "model.iyno_data_marts.actual_project_data"} */
create table core.actual_project_data__dbt_tmp as core.actual_project_data__dbt_new_data
...
07:16:22.097520 [debug] [Thread-1 ]: dbt_clickhouse adapter: SQL status: OK in 0.02 seconds
07:16:22.102323 [debug] [Thread-1 ]: dbt_clickhouse adapter: On model.iyno_data_marts.actual_project_data: /* {"app": "dbt", "dbt_version": "1.4.9", "profile_name": "clickhouse_beta", "target_name": "beta", "node_id": "model.iyno_data_marts.actual_project_data"} */
select name, type from system.columns where table = 'actual_project_data'
and database = 'core'
order by position
...
07:16:22.106706 [debug] [Thread-1 ]: dbt_clickhouse adapter: SQL status: OK in 0.00 seconds
07:16:22.109486 [debug] [Thread-1 ]: dbt_clickhouse adapter: On model.iyno_data_marts.actual_project_data: /* {"app": "dbt", "dbt_version": "1.4.9", "profile_name": "clickhouse_beta", "target_name": "beta", "node_id": "model.iyno_data_marts.actual_project_data"} */
insert into core.actual_project_data__dbt_tmp ("projectId", "periodId", "modelId", "SYS_MOMENT")
select "projectId", "periodId", "modelId", "SYS_MOMENT"
from core.actual_project_data
where (modelId) not in (
select modelId
from core.actual_project_data__dbt_new_data
)
SETTINGS allow_nullable_key=1
...
07:16:22.113751 [debug] [Thread-1 ]: dbt_clickhouse adapter: Error running SQL: /* {"app": "dbt", "dbt_version": "1.4.9", "profile_name": "clickhouse_beta", "target_name": "beta", "node_id": "model.iyno_data_marts.actual_project_data"} */
insert into core.actual_project_data__dbt_tmp ("projectId", "periodId", "modelId", "SYS_MOMENT")
select "projectId", "periodId", "modelId", "SYS_MOMENT"
from core.actual_project_data
where (modelId) not in (
select modelId
from core.actual_project_data__dbt_new_data
)
SETTINGS allow_nullable_key=1
07:16:22.114606 [debug] [Thread-1 ]: Timing info for model.iyno_data_marts.actual_project_data (execute): 2023-11-20 07:16:20.774855 => 2023-11-20 07:16:22.114509
07:16:22.117265 [debug] [Thread-1 ]: Database Error in model actual_project_data (models/core/actual_project_data.sql)
  :HTTPDriver for http://65vl5ksg.mdb.yandexcloud.net:8123 returned response code 404)
  Code: 115. DB::Exception: Setting allow_nullable_key is neither a builtin setting nor started with the prefix 'custom_' registered for user-defined settings. (UNKNOWN_SETTING) (version 23.3.15.29 (official build))
  compiled Code at target/run/iyno_data_marts/models/core/actual_project_data.sql
07:16:22.118169 [debug] [Thread-1 ]: Sending event: {'category': 'dbt', 'action': 'run_model', 'label': 'e3382018-9d72-4754-870d-b3f1ffe168d1', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fdacf762c50>]}
07:16:22.119424 [error] [Thread-1 ]: 1 of 1 ERROR creating sql incremental model core.actual_project_data ........... [ERROR in 1.38s]
07:16:22.122063 [debug] [Thread-1 ]: Finished running node model.iyno_data_marts.actual_project_data
07:16:22.149095 [debug] [MainThread]: Acquiring new clickhouse connection 'master'
07:16:22.150340 [debug] [MainThread]: Connection 'master' was properly closed.
07:16:22.151008 [debug] [MainThread]: Connection 'model.iyno_data_marts.actual_project_data' was left open.
07:16:22.151685 [debug] [MainThread]: On model.iyno_data_marts.actual_project_data: Close
07:16:22.152411 [info ] [MainThread]:
07:16:22.153139 [info ] [MainThread]: Finished running 1 incremental model in 0 hours 0 minutes and 2.04 seconds (2.04s).
07:16:22.153897 [debug] [MainThread]: Command end result
07:16:22.222691 [info ] [MainThread]:
07:16:22.223473 [info ] [MainThread]: Completed with 1 error and 0 warnings:
07:16:22.224136 [info ] [MainThread]:
07:16:22.224747 [error] [MainThread]: Database Error in model actual_project_data (models/core/actual_project_data.sql)
07:16:22.225336 [error] [MainThread]: :HTTPDriver for http://5vl5ksg.mdb.yandexcloud.net:8123 returned response code 404)
07:16:22.225902 [error] [MainThread]: Code: 115. DB::Exception: Setting allow_nullable_key is neither a builtin setting nor started with the prefix 'custom_' registered for user-defined settings. (UNKNOWN_SETTING) (version 23.3.15.29 (official build))
07:16:22.226560 [error] [MainThread]: compiled Code at target/run/iyno_data_marts/models/core/actual_project_data.sql
07:16:22.227209 [info ] [MainThread]:
07:16:22.227799 [info ] [MainThread]: Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
07:16:22.228522 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fdacf872450>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fdaeb2127d0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fdacf8d5e50>]}
07:16:22.229202 [debug] [MainThread]: Flushing usage events
I believe this is a bug in the "legacy" incremental strategy: settings that are supposed to be DDL (table) settings are incorrectly appended to the INSERT statements as well. I believe this will be fixed in the next patch release.
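To illustrate the distinction (a minimal sketch against a plain ClickHouse server; `example_table` is a made-up name, not from this issue): `allow_nullable_key` is a MergeTree table-level setting, so it is valid in a `CREATE TABLE ... SETTINGS` clause, while the `SETTINGS` clause of an `INSERT ... SELECT` accepts only query-level settings, which is why the server rejects it with `Code: 115 (UNKNOWN_SETTING)`:

```sql
-- Valid: allow_nullable_key is a MergeTree *table* setting, so it belongs in DDL.
CREATE TABLE example_table
(
    projectId Nullable(UInt64),
    modelId   Nullable(UInt64)
)
ENGINE = MergeTree()
ORDER BY projectId
SETTINGS allow_nullable_key = 1;

-- Invalid: the SETTINGS clause of an INSERT takes *query*-level settings,
-- so ClickHouse fails with Code: 115 (UNKNOWN_SETTING), as in the log above.
INSERT INTO example_table (projectId, modelId)
SELECT 1, 2
SETTINGS allow_nullable_key = 1;
```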
Describe the bug
Incorrect behavior of an incremental model when the allow_nullable_key setting is used.
Steps to reproduce
My dbt model implements an ETL process that reads data from PostgreSQL (I use the PostgreSQL engine) and loads it into ClickHouse. The source table in Postgres has Nullable columns which I use as the unique key and partition key (in practice, there are no NULL values there).
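For context, the source setup described above can be sketched roughly like this (all names and credentials are illustrative, not the reporter's actual schema): a PostgreSQL-engine table in ClickHouse exposes nullable Postgres columns as `Nullable(...)` types, which is what forces `allow_nullable_key = 1` on the downstream MergeTree model:

```sql
-- Illustrative sketch: nullable Postgres columns surface as Nullable(...) in ClickHouse.
CREATE TABLE pg_source
(
    id        Nullable(UInt64),   -- nullable in Postgres, hence Nullable here
    projectId Nullable(UInt64)
)
ENGINE = PostgreSQL('pg-host:5432', 'source_db', 'source_table', 'user', 'password');

-- Keying a MergeTree target on such a column (ORDER BY / PARTITION BY)
-- then requires SETTINGS allow_nullable_key = 1 on the target table.
```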
Expected behaviour
Re-running the model should not lead to errors
Code examples, such as models or profile settings
Model config:
{{ config(
    order_by='projectId',
    engine='MergeTree()',
    partition_by='projectId',
    materialized='incremental',
    settings={'allow_nullable_key': 1},
    unique_key='modelId',
    schema='core'
) }}
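Until the fix lands, one possible workaround (a sketch only, echoing the `assumeNotNull` calls already commented out in the model above) is to strip nullability from the key columns in the model's final SELECT, so the target table does not need `allow_nullable_key` at all:

```sql
-- Workaround sketch: make the key columns non-Nullable in the model itself.
SELECT
    assumeNotNull(projectId) AS projectId,
    assumeNotNull(periodId)  AS periodId,
    assumeNotNull(modelId)   AS modelId,
    date_add(HOUR, 3, NOW()) AS "SYS_MOMENT"
FROM actual_project_data
```

`assumeNotNull` seems safe here because, as noted above, the source columns contain no NULL values in practice; with non-Nullable key columns, the `settings = {'allow_nullable_key': 1}` entry can be dropped from the model config entirely.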
dbt and/or ClickHouse server logs
I used this query to find where the error happened: select query from system.query_log where exception like '%115. DB::Exception%'
And the query result:
/* {"app": "dbt", "dbt_version": "1.4.9", "profile_name": "clickhouse_beta", "target_name": "beta", "node_id": "model.iyno_data_marts.actual_project_data"} */
Configuration
Core:
latest: 1.7.1
Your version of dbt-core is out of date! You can find instructions for upgrading here: https://docs.getdbt.com/docs/installation
Plugins:
ClickHouse server