arnoN7 / dbt-incremental-stream

dbt package reproducing dbt incremental materialization by leveraging Snowflake streams
MIT License

feature request: allow for sources to have different name/identifier #10

Closed by MarthaScheffler 1 month ago

MarthaScheffler commented 1 month ago

At the moment, the stream package only supports incremental sources when a source table's identifier is the same as its name. It does not handle a definition like this one:

version: 2

sources:
  - name: my_source_name
    database: my_database  
    schema: my_database
    tables:
      - name: my_table
        identifier: identifier_for_my_table

When name and identifier differ, the issue arises during replacement of the input_model: table_name holds the name used in dbt, not the identifier of the actual table in the database.

I tried playing around a bit with the graph variable in the stream_source macro:

{%- macro stream_source(source_name, table_name) -%}
    {% if execute %}
    {% set source_identifier = graph.sources['source.'~project_name~'.'~source_name~'.'~table_name]['identifier'] %}
    {{incr_stream.stream_input(table_name, 'source', source_name=source_name, source_identifier=source_identifier)}}
    {% endif %}
{%- endmacro -%}

but had no luck when replacing some of the table_names in the stream_input macro with the source_identifier.
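
As a rough illustration of the lookup in the macro above, the sketch below mimics it in plain Python. The dictionary is only a hypothetical stand-in for dbt's graph.sources, and the project name my_project is an assumption; the key format follows the 'source.' ~ project_name ~ '.' ~ source_name ~ '.' ~ table_name pattern from the macro.

```python
# Hypothetical stand-in for dbt's `graph.sources`; not real dbt objects.
graph_sources = {
    "source.my_project.my_source_name.my_table": {
        "name": "my_table",
        "identifier": "identifier_for_my_table",
    },
}


def source_identifier(project_name, source_name, table_name):
    """Build the unique id used in the macro and return the table identifier."""
    unique_id = f"source.{project_name}.{source_name}.{table_name}"
    node = graph_sources[unique_id]
    # Fall back to the dbt name when no separate identifier is recorded.
    return node.get("identifier") or node["name"]


print(source_identifier("my_project", "my_source_name", "my_table"))
```

The lookup itself only yields the identifier; whatever later rewrites the compiled SQL still has to use that value instead of table_name.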

(PS: something similar might be interesting for table aliases, but in my case I don't use table aliases, while I use source names a lot.)

arnoN7 commented 1 month ago

Is it the same bug #7 you highlighted before?

MarthaScheffler commented 1 month ago

> Is it the same bug #7 you highlighted before?

No. This one appears when you do not use the actual table names in your dbt project but rename them (identifier = the table in the database, name = the name used in dbt when referencing it). In my case, the source table names are really long (because they contain a lot of source metadata), and I want to keep my dbt project slim. So instead of referencing source('raw', 'POSTGRES_DEV_QARMAINSPECT_PUBLIC_ACCOUNTS_127337958'), I only have to write source('raw', 'accounts'), and dbt points to the correct table.

However, the package code tries to replace the reference name ('accounts') in the compiled code (i.e. raw.POSTGRES_DEV_QARMAINSPECT_PUBLIC_ACCOUNTS_127337958), where it isn't present. It should instead try to replace the identifier.
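
The mismatch can be reproduced outside dbt with a small Python sketch. This is not the package's code: it assumes the replacement behaves like a plain, case-sensitive string substitution on the compiled SQL, and the stream name S_ACCOUNTS is hypothetical.

```python
def replace_input(compiled_sql, target, replacement):
    """Replace one relation reference in compiled SQL (plain string match)."""
    return compiled_sql.replace(target, replacement)


# Compiled SQL contains the identifier, not the short dbt name.
compiled_sql = (
    "select * from raw.POSTGRES_DEV_QARMAINSPECT_PUBLIC_ACCOUNTS_127337958"
)
name = "accounts"
identifier = "POSTGRES_DEV_QARMAINSPECT_PUBLIC_ACCOUNTS_127337958"
stream = "raw.S_ACCOUNTS"  # hypothetical stream relation

# Replacing by dbt name finds nothing, so the SQL comes back unchanged.
by_name = replace_input(compiled_sql, f"raw.{name}", stream)

# Replacing by identifier hits the relation that is actually present.
by_identifier = replace_input(compiled_sql, f"raw.{identifier}", stream)

print(by_name == compiled_sql)
print(by_identifier)
```

Under these assumptions, the name-based replacement silently leaves the query pointing at the base table, while the identifier-based one swaps in the stream.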

The bit of code I cited (setting source_identifier) pulls the correct identifier when only the name is given. However, I didn't manage to get the replacement working correctly, as something else was off as well.

arnoN7 commented 1 month ago

It should work. Can you test it?

MarthaScheffler commented 1 month ago

Thank you, this works!