Closed samLozier closed 1 year ago
ok, so I think I figured this out and it has to do with some assumptions baked into this package that weren't initially apparent to me.
This package does not create or update the external stage schema, unlike the pattern that's followed in dbt_project.yml. This means that the schema needs to be created, permissions granted, and then the stage created all outside of the normal dbt workflow. I did this with a custom macro + the run-operation command. I also replicated the custom schema logic that we use in the rest of my project in this macro.
This would be fine, but is further complicated by the fact that sources.yml does not appear to parse custom macros. This made it impossible to use the same custom schema logic in sources.yml/external sources as I was using else where in the project.
IMO none of these issues are inherently a bug, but they create a situation where the design patters and mental models established in the rest of dbt don't apply to this package. I would personally like it if I could simply provide an s3 url + some credentials and this package would take care of setting up the schemas and stages for me, but there may be some good reasons why this functionality was not adopted.
Hey @samLozier likely this is because this is meant to be part of the "Sources" context (https://docs.getdbt.com/docs/building-a-dbt-project/using-sources).
Sources in dbt are meant to "already exist" and selected from. If you had a setup like:
# models/sources.yml
version: 2
sources:
- name: schema_foo # this is equivalent to schema if the schema property.
tables:
- name: table_foo
...
# ~/.dbt/profiles.yml
...
schema: analytics
...
-- models/bar.sql
select 1 from user_id
dbt doesn't actually create your schema schema_foo
for you - it already expects it to exist. It will however create schema analytics
that model bar
is going to be written to if it didn't exist.
In that sense, it is consistent with the wider dbt functionality: schemas of models that dbt proper materializes ARE created if they don't exist but schemas of sources are not created if they don't exist because dbt expects them to already exist (it is the prerogative of the end user to ensure "raw source" data exists).
Looks like this is the same-ish issue right here https://github.com/dbt-labs/dbt-external-tables/issues/100 though I think @samLozier is asking for something further than a simple create schema ...
but additionally, a create stage ...
too (which brings with it much more complexity https://docs.snowflake.com/en/sql-reference/sql/create-stage.html#id2).
I believe also, this package is more about creating external tables from stages more so that creating stages themselves.
Thanks @jeremyyeo that makes sense. I think my main issue was that I was viewing this as an extension of seeds rather than an extension of sources. I would appreciate an easy way to scale up to large static data sources that's more efficient than dbt seed, but can understand how that might be outside the scope of this package.
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.
Describe the bug
It seems to be impossible to override the default schema that's set in profiles.yml when attempting to create external sources. The usual generate_schema_name macro also does not seem to be working, so the +schema syntax in yml doesn't parse properly
Steps to reproduce
Modifying the sources.yml to use the default schema lets the project run successfully, but then the tables appear when I don't expect them.
Expected results
I want a way to specify a target schema for externally staged tables loaded using this package. Barring that, I want an understandable error that helps me figure out how to fix the issue.
Actual results
see above
Screenshots and log output
System information
The contents of your
packages.yml
file: packages:Which database are you using dbt with?
The output of
dbt --version
:The operating system you're using: osx dev/ linux prod The output of
python --version
: 3.9.12