Open ikeniborn opened 1 year ago
That is standard behavior for dbt parameters. You could call additional macros to initialize your parameter following the workaround described here, https://stackoverflow.com/questions/73363627/pass-a-macro-as-a-parameter-jinja-dbt, but that's a pretty ugly solution.
For your use case I think you should probably just add the additional predicate to the is_incremental()
clause in your model directly instead of trying to use the incremental_predicates function.
is_incremental() get data from dbt_new_data and when delete from target table filterinf from all table from full scan. If i used incremental_predicates i filtering target table from partition. It is more optimize.
with out incremental_predicates delete from fact.fct_token_data_agg_hour where (token_id, toYYYYMMDDhhmmss(toStartOfHour(dttm))) in (select token_id, toYYYYMMDDhhmmss(toStartOfHour(dttm)) from fact.fct_token_data_agg_hour__dbt_new_data)
with incremental_predicates delete from fact.fct_token_data_agg_hour where (token_id, toYYYYMMDDhhmmss(toStartOfHour(dttm))) in ( select token_id, YYYYMMDDhhmmss(toStartOfHour(dttm)) from fact.fct_token_data_agg_hour__dbt_new_data) and toYYYYMMDD(toStartOfMonth(dttm)) = toYYYYMMDD(toStartOfMonth(today())) --> this filter from fct_token_data_agg_hour
Okay, I understand -- can you use a CTE in the query and have the predicate reference the CTE alias?
No, that probably won't work either because the DELETE does have any access to the CTE. This could probably be implemented in the macro to evaluate the incremental_predicate expression before using it in the DELETE FROM. Reopening to investigate that.
i test my select with other data period and if i used date in unique_key that aiget filter partition when delete from target table. But if exclude date from unique_key i want filter partition from incremental_predicate. example with incremental_predicates delete from fact.fct_token_data_agg_hour where (token_id) in ( select token_id) from fact.fct_token_data_agg_hour__dbt_new_data) and toYYYYMMDD(toStartOfMonth(dttm)) = toYYYYMMDD(toStartOfMonth(today())) --> this filter from fct_token_data_agg_hour
@ikeniborn isn't it possible to pass all variables in config without wrapping in {{}}
? because you are already in jinja context. As far as understand get_incremental_period
is your custom macros? I can't test it. But I guess this code should work
incremental_strategy='delete+insert',
incremental_predicates=[
["toYYYYMMDD(toStartOfMonth(dttm)) in ", get_incremental_period(var("incremental_period"),var("incremental_shift"),"month")]|join("")
],
order_by =("token_id", "toYYYYMMDDhhmmss(toStartOfHour(dttm))"),
I also recommend reading this https://github.com/ClickHouse/ClickHouse/issues/39870 before using "lightweight" deletes
There have been some important updates to lightweight deletes since that issue was filed (although they are still experimental). Performance should be better, but any feedback we get from dbt users would be appreciated.
@ikeniborn isn't it possible to pass all variables in config without wrapping in
{{}}
? because you are already in jinja context. As far as understandget_incremental_period
is your custom macros? I can't test it. But I guess this code should workincremental_strategy='delete+insert', incremental_predicates=[ ["toYYYYMMDD(toStartOfMonth(dttm)) in ", get_incremental_period(var("incremental_period"),var("incremental_shift"),"month")]|join("") ], order_by =("token_id", "toYYYYMMDDhhmmss(toStartOfHour(dttm))"),
I test, It's working. Thanks.
About perfomance. delete more slowly then drop partition. About 3-5 slowly. For day i used partition, but when i want update hour a used delete, because i cant used partition from hour. I get limit for partition.
Describe the bug
I want use incremental_predicates and macros for dinamic period for filtering data for delete row. But i get error, because incremental_predicates not parse jinja template.
Expected behaviour
Code examples, such as models or profile settings
dbt and/or ClickHouse server logs
[0m17:10:42.587516 [debug] [Thread-1 ]: dbt_clickhouse adapter: Error running SQL: / {"app": "dbt", "dbt_version": "1.4.1", "profile_name": "clickhouse_local", "target_name": "default", "node_id": "model.clickhouse.fct_token_data_agg_hour"} /
[0m17:10:42.592055 [debug] [Thread-1 ]: Timing info for model.clickhouse.fct_token_data_agg_hour (execute): 2023-02-28 17:10:40.285091 => 2023-02-28 17:10:42.591825 [0m17:10:42.603606 [debug] [Thread-1 ]: Database Error in model fct_token_data_agg_hour (models/fact/token/fct_token_data_agg_hour.sql) Code: 62. DB::Exception: Syntax error: failed at position 479 ('{') (line 6, col 54): { get_incremental_period(var("incremental_period"),var("incremental_shift"),"month") }} ; . Expected substitution name (identifier). Stack trace:
Configuration
{{ config( enabled = true, schema = 'fact', tags = ["history_hour", "token"], materialized = "incremental", unique_key = ['token_id','toYYYYMMDDhhmmss(toStartOfHour(dttm))'], incremental_strategy='delete+insert', incremental_predicates=[ "toYYYYMMDD(toStartOfMonth(dttm)) in {{ get_incremental_period(var(\"incremental_period\"),var(\"incremental_shift\"),\"month\") }}" ], order_by =("token_id", "toYYYYMMDDhhmmss(toStartOfHour(dttm))"), engine = "MergeTree()", partition_by = "toYYYYMMDD(toStartOfMonth(dttm))", ) }}
Environment
ClickHouse server
CREATE TABLE
statements for tables involved: