dbt-labs / dbt-external-tables

dbt macros to stage external sources
https://hub.getdbt.com/dbt-labs/dbt_external_tables/latest/
Apache License 2.0
297 stars 119 forks source link

Snowflake External Tables - Ability to add in custom col_expression #140

Closed dalvarez06 closed 4 months ago

dalvarez06 commented 2 years ago

Describe the feature

dbt-external-tables does a great job of creating external tables into snowflake. However there are times where one might want to make there own expression to create a column.

Describe alternatives you've considered

If we look at the below code snippet we see that the only variable that has access to the expression is col_expression. The issue is that we have no control over what gets passed to the col_expression variable other than the name. The below code is from https://github.com/dbt-labs/dbt-external-tables/blob/main/macros/plugins/snowflake/create_external_table.sql

{%- for column in columns %}
    {%- set column_quoted = adapter.quote(column.name) if column.quote else column.name %}
    {%- set col_expression -%}
        {%- set col_id = 'value:c' ~ loop.index if is_csv else 'value:' ~ column_quoted -%}
        (case when is_null_value({{col_id}}) or lower({{col_id}}) = 'null' then null else {{col_id}} end)
    {%- endset %}
    {{column_quoted}} {{column.data_type}} as ({{col_expression}}::{{column.data_type}})
    {{- ',' if not loop.last -}}
{% endfor %}

Additional context

This feature would just be for Snowflake.

Who will this benefit?

Anyone who wants to do more complex expressions in their external tables.

Proposed fix

Here is a quick fix that I wrote. I'm sure there are plenty of ways of doing this. I tested this out and it worked.

columns:
    - name: user_id
      data_type: varchar
    - name: complex_column
      data_type: varchar
      col_expression: custom expression
{%- for column in columns %}
    {%- set column_quoted = adapter.quote(column.name) if column.quote else column.name %}
    {%- set col_expression -%}
        {%- if column.col_expression -%}
            {{column.col_expression}}
        {%- else -%}
            {%- set col_id = 'value:c' ~ loop.index if is_csv else 'value:' ~ column_quoted -%}
            (case when is_null_value({{col_id}}) or lower({{col_id}}) = 'null' then null else {{col_id}} end)
        {%- endif -%}
    {%- endset %}
    {{column_quoted}} {{column.data_type}} as ({{col_expression}}::{{column.data_type}})
    {{- ',' if not loop.last -}}
{% endfor %}
nic-sharesies commented 2 years ago

+1 on this! I'm about to fork dbt-external-tables because I want a custom column expression to pull the replicated_date out of our parquet files using the Snowpipe option.

nochimo commented 1 year ago

+1 This would help a lot some use cases we have. Any news on that?

adam-campbell-mfe commented 10 months ago

I would greatly support this feature, there has been some discussion on other threads that the external table importing should just do the ET and not T (which should be performed by models). I mostly agree with that idea, however the only way to get snowflake external table metadata into your dbt defined external table i.e. METADATA$FILENAME is via an expression in a partition. Which might not be the way I want to setup my partition for that table.

Another frustration is if want to store dates in a column, but the AUTO detect feature fails due to some non-standard date type I am forced to import the column as string and convert to date down the line. This could potentially lead to a range of issues.

At present the ability to create snowflake external tables using dbt is too limiting for my use.