dbt-labs / dbt-external-tables

dbt macros to stage external sources
https://hub.getdbt.com/dbt-labs/dbt_external_tables/latest/
Apache License 2.0
294 stars 119 forks source link

Snowflake manual refresh doesn't work without partitions #169

Closed mikkosulonen closed 7 months ago

mikkosulonen commented 1 year ago

Describe the bug

Snowflake manual refresh doesn't work without partitions

Steps to reproduce

  1. Define external tables with manual refresh without partitions.
  2. Run dbt run-operation stage_external_sources

A snippet with names redacted from one sources.yml

version: 2

sources:
  - name: source_system
    description: ""
    freshness:
      warn_after: {count: 24, period: hour}
      error_after: {count: 72, period: hour}
    loaded_at_field: transfer_date
    schema: ext_stg
    tables:
      - name: your_external_table
        description: ""
        meta:
          is_delta: false
        external:
          location: "@ext_stg.datalake_source_system/landing/your_external_table/"
          file_format: ext_stg.csv_semicolon_skipheader_quotes_escape_slash
        columns:
          - name: ...
          - ...

Expected results

06:31:39  1 of 21 START external source ext_stg.your_external_table
06:31:39  1 of 21 (1) alter external table your_db.ext_stg.your_external_table refresh;    ...  
06:31:41  1 of 21 (1) SUCCESS 0

Actual results

06:28:04  1 of 21 START external source ext_stg.your_external_table
06:28:05  1 of 21 SKIP

System information

Using snowflake connection "macro_stage_external_sources"
2022-10-05T06:28:04.744464Z: 06:28:04  On macro_stage_external_sources: /* {"app": "dbt", "dbt_version": "1.2.2", "profile_name": "user", "target_name": "dbt_cloud_ui", "connection_name": "macro_stage_external_sources"} */
packages:
  - package: dbt-labs/dbt_utils
    version: 0.9.1
  - package: dbt-labs/codegen
    version: 0.8.0
  - package: dbt-labs/dbt_external_tables
    version: 0.8.0
  - package: Snowflake-Labs/dbt_constraints
    version: 0.4.2
  - package: brooklyn-data/dbt_artifacts
    version: 2.1.0

Which database are you using dbt with?

The output of dbt --version:

dbt Cloud 1.2.2

The operating system you're using:

dbt Cloud

The output of python --version:

Additional context

Quick'n'dirty "fix" I tried and verified the issue:

https://github.com/dbt-labs/dbt-external-tables/blob/2a407bf4e99dbd3f479f4d8856312edcef01a675/macros/plugins/snowflake/refresh_external_table.sql#L9

Changing to: {% set manual_refresh = true %}

I'll get the expected results.

The use case we have is that we have an independent archival mechanism in our source datalake. We don't have any partitions, we just want to keep adding the currently available files to the external table metadata and delete the ones that have been archived.

mikkosulonen commented 1 year ago

Thinking out loud here...

https://github.com/dbt-labs/dbt-external-tables/blob/2a407bf4e99dbd3f479f4d8856312edcef01a675/macros/plugins/snowflake/refresh_external_table.sql#L6-L9

Why do we need an external table to have partitions for manual refresh?

With defaults, ie. not configuring either:

{% set manual_refresh = (partitions and not auto_refresh) %}

evaluates to

{% set manual_refresh = (none and not false) %}
{% set manual_refresh = (none and true) %}
{% set manual_refresh = none %}

Looking at the documentation at: https://docs.snowflake.com/en/user-guide/tables-external-intro.html#partitioned-external-tables and based on our use-case, I see no reason why dbt-external-tables is requiring both of them 🤷‍♂️

If my thinking is correct, a fix would be as simple as:

{% set manual_refresh = (partitions or not auto_refresh) %}

But that still leaves the question of why having partitions would require manual refresh. Manual partitions do require manual refreshing https://docs.snowflake.com/en/user-guide/tables-external-intro.html#partitions-added-manually, but automatic partitioning does not require manual refresh https://docs.snowflake.com/en/user-guide/tables-external-intro.html#partitions-added-automatically

github-actions[bot] commented 1 year ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions[bot] commented 7 months ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions[bot] commented 7 months ago

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.