dbt-labs / dbt-external-tables

dbt macros to stage external sources
https://hub.getdbt.com/dbt-labs/dbt_external_tables/latest/
Apache License 2.0
297 stars 119 forks source link

Snowflake stage_external_sources fails if target in sources.yml doesn't match default target in profiles.yml #142

Closed samLozier closed 1 year ago

samLozier commented 2 years ago

Describe the bug

It seems to be impossible to override the default schema that's set in profiles.yml when attempting to create external sources. The usual generate_schema_name macro also does not seem to be working, so the +schema syntax in yml doesn't parse properly

Steps to reproduce

#sources.yml

version: 2
sources:
- name: STATIC_DATA
  database: ANALYTICS_DEV_MART
  schema: "{{ target.schema }}_EXTERNAL" # modifying the default schema
  loader: S3
  loaded_at_field: RUN_DATE
  tables:
#profiles.yml
...
schema: EXAMPLE_PROJECT
...
23:14:55  1 of 4 (1) create or replace external table ANALYTICS_DEV_MART.EXAMPLE_PROJECT_EXTERNAL.EXAMPLETABLE...  
23:14:55  Encountered an error while running operation: Database Error
  090106 (22000): Cannot perform STAGE GET. This session does not have a current schema. Call 'USE SCHEMA', or use a qualified name.

Modifying the sources.yml to use the default schema lets the project run successfully, but then the tables appear when I don't expect them.

version: 2
sources:
- name: STATIC_DATA
  database: ANALYTICS_DEV_MART
  schema: "{{ target.schema }}" # using default schema
  loader: S3
  loaded_at_field: RUN_DATE
  tables:

Expected results

I want a way to specify a target schema for externally staged tables loaded using this package. Barring that, I want an understandable error that helps me figure out how to fix the issue.

Actual results

see above

Screenshots and log output

System information

The contents of your packages.yml file: packages:

Which database are you using dbt with?

The output of dbt --version:

1.0.7

The operating system you're using: osx dev/ linux prod The output of python --version: 3.9.12

samLozier commented 2 years ago

ok, so I think I figured this out and it has to do with some assumptions baked into this package that weren't initially apparent to me.

This package does not create or update the external stage schema, unlike the pattern that's followed in dbt_project.yml. This means that the schema needs to be created, permissions granted, and then the stage created all outside of the normal dbt workflow. I did this with a custom macro + the run-operation command. I also replicated the custom schema logic that we use in the rest of my project in this macro.

This would be fine, but is further complicated by the fact that sources.yml does not appear to parse custom macros. This made it impossible to use the same custom schema logic in sources.yml/external sources as I was using else where in the project.

IMO none of these issues are inherently a bug, but they create a situation where the design patters and mental models established in the rest of dbt don't apply to this package. I would personally like it if I could simply provide an s3 url + some credentials and this package would take care of setting up the schemas and stages for me, but there may be some good reasons why this functionality was not adopted.

jeremyyeo commented 2 years ago

Hey @samLozier likely this is because this is meant to be part of the "Sources" context (https://docs.getdbt.com/docs/building-a-dbt-project/using-sources).

Sources in dbt are meant to "already exist" and selected from. If you had a setup like:

# models/sources.yml
version: 2
sources:
  - name: schema_foo   # this is equivalent to schema if the schema property.
    tables:
      - name: table_foo
...
# ~/.dbt/profiles.yml
...
  schema: analytics
...
-- models/bar.sql
select 1 from user_id

dbt doesn't actually create your schema schema_foo for you - it already expects it to exist. It will however create schema analytics that model bar is going to be written to if it didn't exist.

In that sense, it is consistent with the wider dbt functionality: schemas of models that dbt proper materializes ARE created if they don't exist but schemas of sources are not created if they don't exist because dbt expects them to already exist (it is the prerogative of the end user to ensure "raw source" data exists).

Looks like this is the same-ish issue right here https://github.com/dbt-labs/dbt-external-tables/issues/100 though I think @samLozier is asking for something further than a simple create schema ... but additionally, a create stage ... too (which brings with it much more complexity https://docs.snowflake.com/en/sql-reference/sql/create-stage.html#id2).

I believe also, this package is more about creating external tables from stages more so that creating stages themselves.

samLozier commented 2 years ago

Thanks @jeremyyeo that makes sense. I think my main issue was that I was viewing this as an extension of seeds rather than an extension of sources. I would appreciate an easy way to scale up to large static data sources that's more efficient than dbt seed, but can understand how that might be outside the scope of this package.

github-actions[bot] commented 1 year ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions[bot] commented 1 year ago

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.