ClickHouse / dbt-clickhouse

The Clickhouse plugin for dbt (data build tool)
Apache License 2.0

incremental_predicates does not parse Jinja templates #143

Open ikeniborn opened 1 year ago

ikeniborn commented 1 year ago

Describe the bug

I want to use incremental_predicates together with a macro that builds a dynamic period for filtering the rows to delete. But I get an error, because incremental_predicates does not parse the Jinja template.

Expected behaviour

Code examples, such as models or profile settings

dbt and/or ClickHouse server logs

17:10:42.587516 [debug] [Thread-1 ]: dbt_clickhouse adapter: Error running SQL: /* {"app": "dbt", "dbt_version": "1.4.1", "profile_name": "clickhouse_local", "target_name": "default", "node_id": "model.clickhouse.fct_token_data_agg_hour"} */

  delete from fact.fct_token_data_agg_hour where (token_id, toYYYYMMDDhhmmss(toStartOfHour(dttm))) in (select token_id, toYYYYMMDDhhmmss(toStartOfHour(dttm))
                                      from fact.fct_token_data_agg_hour__dbt_new_data)

        and toYYYYMMDD(toStartOfMonth(dttm)) in {{ get_incremental_period(var("incremental_period"),var("incremental_shift"),"month") }}
    ;

17:10:42.592055 [debug] [Thread-1 ]: Timing info for model.clickhouse.fct_token_data_agg_hour (execute): 2023-02-28 17:10:40.285091 => 2023-02-28 17:10:42.591825
17:10:42.603606 [debug] [Thread-1 ]: Database Error in model fct_token_data_agg_hour (models/fact/token/fct_token_data_agg_hour.sql) Code: 62. DB::Exception: Syntax error: failed at position 479 ('{') (line 6, col 54): { get_incremental_period(var("incremental_period"),var("incremental_shift"),"month") }} ; . Expected substitution name (identifier). Stack trace:

Configuration

    {{ config(
        enabled = true,
        schema = 'fact',
        tags = ["history_hour", "token"],
        materialized = "incremental",
        unique_key = ['token_id', 'toYYYYMMDDhhmmss(toStartOfHour(dttm))'],
        incremental_strategy = 'delete+insert',
        incremental_predicates = [
            "toYYYYMMDD(toStartOfMonth(dttm)) in {{ get_incremental_period(var(\"incremental_period\"),var(\"incremental_shift\"),\"month\") }}"
        ],
        order_by = ("token_id", "toYYYYMMDDhhmmss(toStartOfHour(dttm))"),
        engine = "MergeTree()",
        partition_by = "toYYYYMMDD(toStartOfMonth(dttm))",
    ) }}
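
The get_incremental_period macro itself is not shown in the issue. A minimal sketch of what such a macro could look like, purely as an assumption for context (the signature mirrors the call above; the body, which handles only the "month" granularity, is invented):

    {# Hypothetical sketch only -- the real macro is not part of this issue.
       Returns a subquery of toYYYYMMDD month keys covering the last
       period_count months, shifted back by period_shift months. #}
    {% macro get_incremental_period(period_count, period_shift, granularity) %}
        (
            select toYYYYMMDD(toStartOfMonth(subtractMonths(today(), number + {{ period_shift }})))
            from numbers({{ period_count }})
        )
    {% endmacro %}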

Environment

ClickHouse server

genzgd commented 1 year ago

That is standard behavior for dbt parameters. You could call additional macros to initialize your parameter following the workaround described here, https://stackoverflow.com/questions/73363627/pass-a-macro-as-a-parameter-jinja-dbt, but that's a pretty ugly solution.

For your use case I think you should probably just add the additional predicate to the is_incremental() clause in your model directly instead of trying to use the incremental_predicates function.
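
For reference, a minimal sketch of that suggestion, assuming a hypothetical upstream model named stg_token_data and made-up columns besides token_id and dttm (only the is_incremental() block matters here):

    select
        token_id,
        dttm,
        sum(amount) as amount  -- hypothetical measure column
    from {{ ref('stg_token_data') }}  -- hypothetical upstream model
    {% if is_incremental() %}
    -- the macro call renders normally here, because the model body is plain Jinja
    where toYYYYMMDD(toStartOfMonth(dttm)) in
        {{ get_incremental_period(var("incremental_period"), var("incremental_shift"), "month") }}
    {% endif %}
    group by token_id, dttm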

ikeniborn commented 1 year ago

is_incremental() only filters the data that goes into __dbt_new_data; when deleting from the target table, the filter runs against the whole table with a full scan. If I use incremental_predicates, the target table is filtered down to a partition, which is much more efficient.

Without incremental_predicates:

    delete from fact.fct_token_data_agg_hour
    where (token_id, toYYYYMMDDhhmmss(toStartOfHour(dttm))) in (
        select token_id, toYYYYMMDDhhmmss(toStartOfHour(dttm))
        from fact.fct_token_data_agg_hour__dbt_new_data
    )

With incremental_predicates:

    delete from fact.fct_token_data_agg_hour
    where (token_id, toYYYYMMDDhhmmss(toStartOfHour(dttm))) in (
        select token_id, toYYYYMMDDhhmmss(toStartOfHour(dttm))
        from fact.fct_token_data_agg_hour__dbt_new_data
    )
    and toYYYYMMDD(toStartOfMonth(dttm)) = toYYYYMMDD(toStartOfMonth(today())) --> this filters fact.fct_token_data_agg_hour itself

genzgd commented 1 year ago

Okay, I understand -- can you use a CTE in the query and have the predicate reference the CTE alias?

genzgd commented 1 year ago

No, that probably won't work either, because the DELETE doesn't have any access to the CTE. This could probably be implemented in the macro by evaluating the incremental_predicates expression before using it in the DELETE FROM. Reopening to investigate that.

ikeniborn commented 1 year ago

I tested my select with a different data period. If the date is included in unique_key, the delete against the target table does filter on the partition. But if I exclude the date from unique_key, I want to filter the partition via incremental_predicates. Example with incremental_predicates:

    delete from fact.fct_token_data_agg_hour
    where (token_id) in (
        select token_id
        from fact.fct_token_data_agg_hour__dbt_new_data
    )
    and toYYYYMMDD(toStartOfMonth(dttm)) = toYYYYMMDD(toStartOfMonth(today())) --> this filters fact.fct_token_data_agg_hour itself

simpl1g commented 1 year ago

@ikeniborn isn't it possible to pass all variables in config without wrapping them in {{ }}? You are already in a Jinja context there. As far as I understand, get_incremental_period is your custom macro? I can't test it, but I guess this code should work:

    incremental_strategy='delete+insert',
    incremental_predicates=[
      ["toYYYYMMDD(toStartOfMonth(dttm)) in ", get_incremental_period(var("incremental_period"),var("incremental_shift"),"month")]|join("")
    ],
    order_by =("token_id", "toYYYYMMDDhhmmss(toStartOfHour(dttm))"),
simpl1g commented 1 year ago

I also recommend reading this https://github.com/ClickHouse/ClickHouse/issues/39870 before using "lightweight" deletes

genzgd commented 1 year ago

There have been some important updates to lightweight deletes since that issue was filed (although they are still experimental). Performance should be better, but any feedback we get from dbt users would be appreciated.
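
For anyone who wants to try them, lightweight deletes were still gated behind a setting in that ClickHouse release line; a minimal example against the table from this issue (the predicate value is illustrative):

    -- enable the experimental feature for the session, then issue a lightweight delete
    SET allow_experimental_lightweight_delete = 1;

    DELETE FROM fact.fct_token_data_agg_hour
    WHERE toYYYYMMDD(toStartOfMonth(dttm)) = toYYYYMMDD(toStartOfMonth(today()));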

ikeniborn commented 1 year ago

I tested the suggested approach and it's working. Thanks.

ikeniborn commented 1 year ago

About performance: delete is slower than drop partition, roughly 3-5 times slower. For daily data I use partition drops, but when I want to update a single hour I use delete, because I can't partition by hour without hitting the limit on the number of partitions.
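
For comparison, a sketch of the two operations on this table, assuming the monthly partition_by from the config above (the partition value and timestamp are illustrative):

    -- Dropping a whole monthly partition is a cheap metadata-level operation.
    ALTER TABLE fact.fct_token_data_agg_hour DROP PARTITION 20230201;

    -- Deleting a single hour must locate and rewrite (or mask) rows inside the partition,
    -- which is why it is noticeably slower than dropping the partition outright.
    DELETE FROM fact.fct_token_data_agg_hour
    WHERE toYYYYMMDDhhmmss(toStartOfHour(dttm)) = 20230228170000;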