getindata / dbt-flink-adapter

Adapter for dbt that executes dbt pipelines on Apache Flink
Apache License 2.0

SET 'execution.runtime-mode' runs on every statement #62

Open zqWu opened 1 year ago

zqWu commented 1 year ago

For example, given this model:

{{ config( 
    materialized='table',
    pre_hook=[
        "SET 'execution.runtime-mode' = 'streaming';",
        "set 'execution.checkpointing.interval'='20s';",
        "set 'execution.checkpointing.mode'='AT_LEAST_ONCE';"
    ]
  ) 
}}

select * from {{ source('my_source', 'topic01') }}

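This is translated into the following statements, with SET 'execution.runtime-mode' = 'batch' inserted before every one of them: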

SET 'execution.runtime-mode' = 'batch'
SET 'execution.runtime-mode' = 'streaming';
SET 'execution.runtime-mode' = 'batch'
set 'execution.checkpointing.interval'='20s';
SET 'execution.runtime-mode' = 'batch'
set 'execution.checkpointing.mode'='AT_LEAST_ONCE';
SET 'execution.runtime-mode' = 'batch'
create  table xxxx_balabala

There are two ways to handle this:

  1. When a statement is already a "set xx=yy", do not automatically add another "SET 'execution.runtime-mode' = 'batch'" before it.

    This does not help when there are many models. For example, if model-1 is batch and model-2 is streaming, creating model-1 sets the mode to batch, and when model-2 is created afterwards there is no guarantee that the current mode is streaming.

  2. Simply leave the mode under user control, e.g. pre_hook = "set xx=xx" in the model's config. I prefer this solution: it keeps everything transparent, although it is somewhat inconvenient.

gliter commented 1 year ago

Ad 1. That would definitely be an improvement. It could even be made to set the mode only when executing the main statement.
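A minimal sketch of that idea, written in Python because the adapter is a Python package. It is not the adapter's actual code: the function name plan_statements, the constant DEFAULT_MODE_STMT, and the rule "skip the default if a pre-hook already sets the runtime mode" are all assumptions made for illustration. Pre-hooks pass through untouched, and the default mode is emitted at most once, right before the main statement.

```python
import re

# Assumed default, taken from the statements shown in the issue.
DEFAULT_MODE_STMT = "SET 'execution.runtime-mode' = 'batch'"

# Matches a SET of the runtime mode regardless of keyword case.
_MODE_RE = re.compile(r"^\s*set\s+'execution\.runtime-mode'", re.IGNORECASE)


def plan_statements(pre_hooks, main_statement, default_mode_stmt=DEFAULT_MODE_STMT):
    """Return the statements to send to Flink, in order (hypothetical helper).

    Pre-hooks are kept as written. The default runtime mode is added only
    once, directly before the main statement, and only when no pre-hook
    has already set 'execution.runtime-mode'.
    """
    planned = list(pre_hooks)
    user_set_mode = any(_MODE_RE.match(stmt) for stmt in planned)
    if not user_set_mode:
        planned.append(default_mode_stmt)
    planned.append(main_statement)
    return planned


hooks = [
    "SET 'execution.runtime-mode' = 'streaming';",
    "set 'execution.checkpointing.interval'='20s';",
    "set 'execution.checkpointing.mode'='AT_LEAST_ONCE';",
]
for stmt in plan_statements(hooks, "create table xxxx_balabala"):
    print(stmt)
```

With the pre-hooks from the issue, the streaming setting is no longer overwritten. A model without any runtime-mode hook still gets the batch default right before its main statement, so the mode does not leak from a previously executed model.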

I would add approach 3: allow the pipeline configuration to be defined in YAML as well.
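As a rough illustration of approach 3, the sketch below turns a mapping of Flink properties into SET statements. The mapping stands in for whatever a YAML file would parse to; the layout of that file, the helper name set_statements, and the variable model_properties are hypothetical and not part of the adapter.

```python
def set_statements(properties):
    """Render one Flink SQL SET statement per configured property.

    `properties` is a plain dict, e.g. the result of parsing a YAML
    mapping. Values are inserted as-is, so this sketch assumes they
    contain no single quotes.
    """
    return [f"SET '{key}' = '{value}'" for key, value in properties.items()]


# Hypothetical per-model settings, as they might look once parsed from YAML.
model_properties = {
    "execution.runtime-mode": "streaming",
    "execution.checkpointing.interval": "20s",
    "execution.checkpointing.mode": "AT_LEAST_ONCE",
}

for stmt in set_statements(model_properties):
    print(stmt)
```

Keeping the settings as data rather than as hand-written pre-hooks would let the adapter decide when to emit them, for example once before the main statement.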

zqWu commented 1 year ago

Great, explicit yet convenient.