dbt-labs / dbt-external-tables

dbt macros to stage external sources
https://hub.getdbt.com/dbt-labs/dbt_external_tables/latest/
Apache License 2.0
297 stars 119 forks source link

Include external tables in model #130

Closed brabster closed 2 years ago

brabster commented 2 years ago

Describe the feature

I'd like to configure an external table as part of my model

Describe alternatives you've considered

I've configured it as a source, which works OK - but when I'm doing development in my own schema I can't have my own version of the external table as well.

Additional context

I'm working with BigQuery at the moment. Not sure if I'm missing something about the thought process here, it seems projecting an external table into my model makes more sense in the models section of my schema.yml so that it just runs when I dbt run and it picks up the schema/dataset I'm working in.

Who will this benefit?

My client and I - unless you can explain why it doesn't make sense as a model, as I say maybe I'm just misthinking something.

aPeterHeise commented 2 years ago

Hi Brabster,

can you give more background & expectations please? The workflow in DBT is typically that you use sources for data that is loaded by 3rd party systems into your database; and models to reference such ONLY (via refs).

DBT external-tables could be seen as such a 3rd party tool.

So if you want external table support in your model, this would break the above logic.

jtcohen6 commented 2 years ago

Sorry for the delayed response! You're both right — external tables are a funny in-between. dbt can run the create external table statement for you, into a schema of your choice, which feels a lot like a model. At the same time, the thing that's being created is really just a naive pointer to data loaded by a 3rd party system; it doesn't (shouldn't IMO) contain any opinionated transformation/"business" logic, as a model does.

There's a lengthy conversation about this in the very first issue in the repository (#1); it's lengthy, but it will provide all the context behind the initial decision to treat external tables as sources (metadata + operation macro) instead of models (custom materialization + dbt run).

To be clear, the approach taken in this package is an opinionated one, and it doesn't in any way preempt you from taking a different tact. Many folks have created a custom materialization for external tables so that they will execute during dbt run.

As far as isolating external tables during development, you can use conditional expressions in your table definition like:

sources:
  - name: my_external_src
    schema: "{{ target.schema if target.name == 'dev' else 'my_external_src' }}"

I'm going to close this issue, since it's well-trod ground for this package.