dbt-labs / dbt-external-tables

dbt macros to stage external sources
https://hub.getdbt.com/dbt-labs/dbt_external_tables/latest/
Apache License 2.0
299 stars 120 forks source link

Add metadata_file_last_modified for snowpiped tables #239

Closed Catisyf closed 5 months ago

Catisyf commented 11 months ago

Description & motivation

resolves: #276

dbt currently supports the following metadata columns for snowpiped table.

    #   `metadata_filename`: the file from which this row was loaded
    #   `metadata_file_row_number`: the numbered row this was in that file
    #   `_dbt_copied_at`: the current_timestamp when this row was loaded (backfilled or piped)

This PR adds a new metadata column metadata_file_last_modified (metadata$file_last_modified), which returns the last modified timestamp of the staged file the current row belongs to. This metadata is useful for a use case where we'd like to filter staged files from an S3 bucket after a certain cutoff date, and we can't easily infer such date from the file name.

This is a simple modification and I have tested the changes in our dbt project. It works as expected.

Checklist

jtmcn commented 8 months ago

Please do add this functionality. Although I'd suggest replacing the current _dbt_copied_at field which uses current_timestamp since Snowflake recommends against using that function in a Snowpipe.

https://community.snowflake.com/s/article/Incorrect-timestamp-in-Snowpipe-while-using-current-timestampfunction

https://docs.snowflake.com/en/user-guide/data-load-snowpipe-ts#load-times-inserted-using-current-timestamp-earlier-than-load-time-values-in-copy-history-view

dataders commented 5 months ago

@jtmcn see #281 for your suggestion. I'd welcome PR for this!