dbt v0.15.0 added support for an external property within sources that can include information about location, partitions, and other database-specific properties.
This package provides macros to create, replace, and refresh external tables using the metadata provided in your .yml file source definitions.

To install the package, follow the instructions at hub.getdbt.com on how to modify your packages.yml and run dbt deps.
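A typical packages.yml entry looks something like the sketch below; the version range shown is illustrative only, so check hub.getdbt.com for the current release:

packages:
  - package: dbt-labs/dbt_external_tables
    version: [">=0.8.0", "<0.9.0"]  # illustrative range; pin to the latest release listed on hub.getdbt.com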
The stage_external_sources macro is the primary point of entry when using this package. It has two operational modes: standard and "full refresh."
# iterate through all source nodes, create if missing, refresh metadata
$ dbt run-operation stage_external_sources
# iterate through all source nodes, create or replace (+ refresh if necessary)
$ dbt run-operation stage_external_sources --vars "ext_full_refresh: true"
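If you always want full-refresh behavior, the same variable can be set in dbt_project.yml instead of being passed on the command line each time. This is a sketch that assumes the macro resolves ext_full_refresh through dbt's var() lookup, which covers both project vars and --vars:

# dbt_project.yml (sketch: sets the variable project-wide instead of via --vars)
vars:
  ext_full_refresh: true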
The stage_external_sources macro accepts a limited node selection syntax, similar to snapshotting source freshness:
# stage all Snowplow and Logs external sources:
$ dbt run-operation stage_external_sources --args "select: snowplow logs"
# stage a particular external source table:
$ dbt run-operation stage_external_sources --args "select: snowplow.event"
The macros assume that you have already created the external scaffolding your database requires (for example, an external stage, external schema, external data source, or storage bucket) and that you have the permissions needed to create external tables with it. Each external source is then defined in a .yml spec:
version: 2

sources:
  - name: snowplow
    tables:
      - name: event
        description: >
          This source table is actually a set of files in external storage.
          The dbt-external-tables package provides handy macros for getting
          those files queryable, just in time for modeling.
        external:
          location: # required: S3 file path, GCS file path, Snowflake stage, Synapse data source
          ... # database-specific properties of external table
          partitions: # optional
            - name: collector_date
              data_type: date
              ... # database-specific properties

        # Specify ALL column names + datatypes.
        # Column order must match for CSVs, column names must match for other formats.
        # Some databases support schema inference.

        columns:
          - name: app_id
            data_type: varchar(255)
            description: "Application ID"
          - name: platform
            data_type: varchar(255)
            description: "Platform"
          ...
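As a concrete illustration, on Snowflake the database-specific properties under external might include a named stage, a file format, and an auto-refresh flag. This is only a sketch: the stage and file-format names below are hypothetical, and the full set of supported properties for each database is documented in sample_sources (see below).

external:
  location: "@raw.snowplow.snowplow_stage"            # hypothetical external stage
  file_format: "{{ target.schema }}.my_json_format"   # hypothetical named file format
  auto_refresh: false                                  # optional flag on Snowflake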
The stage_external_sources macro will use this YAML config to compile and execute the appropriate create, refresh, and/or drop commands:
19:01:48 + 1 of 1 START external source spectrum.my_partitioned_tbl
19:01:48 + 1 of 1 (1) drop table if exists "db"."spectrum"."my_partitioned_tbl"
19:01:48 + 1 of 1 (1) DROP TABLE
19:01:48 + 1 of 1 (2) create external table "db"."spectrum"."my_partitioned_tbl"...
19:01:48 + 1 of 1 (2) CREATE EXTERNAL TABLE
19:01:48 + 1 of 1 (3) alter table "db"."spectrum"."my_partitioned_tbl"...
19:01:49 + 1 of 1 (3) ALTER EXTERNAL TABLE
The package repository also includes:
- sample_sources: detailed example source specs, with annotations, for each database's implementation
- sample_analysis: a "dry run" version of the compiled DDL/DML that stage_external_sources runs as an operation
- tested specs: source spec variations that are confirmed to work on each database, via integration tests

If you encounter issues using this package or have questions, please check the open issues, as there's a chance it's a known limitation or work in progress. If not, please open a new issue.
Additional contributions to this package are very welcome! Please create issues or open PRs against master. Check out this post on the best workflow for contributing to a package.