dbt-labs / dbt-external-tables

dbt macros to stage external sources
https://hub.getdbt.com/dbt-labs/dbt_external_tables/latest/
Apache License 2.0

Support for database and schema set from yaml #157

Closed · jdwalder closed 1 year ago

jdwalder commented 2 years ago

Describe the feature

In Snowflake, when `database` and `schema` are set in the yaml, the code correctly attempts to create the external table in that database and schema; however, the file_format `name` and the external stage still have to be fully qualified. This defeats the ability to use `env_var` to set the database and schema per environment ('dev', 'qa', 'prod'). The feature request is to eliminate the need for fully qualified references to the stage and file_format by resolving them against the 'current_node' database and schema.
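For concreteness, a source spec along these lines (all names here are hypothetical) shows the pattern: the database and schema come from `env_var`, but the stage and file_format must still be hard-coded with fully qualified names:

```yaml
# hypothetical example: database/schema vary by environment via env_var,
# but the stage and file_format names below must be fully qualified
version: 2

sources:
  - name: raw_events
    database: "{{ env_var('SNOWFLAKE_DATABASE') }}"  # e.g. dev / qa / prod
    schema: "{{ env_var('SNOWFLAKE_SCHEMA') }}"
    tables:
      - name: events
        external:
          location: "@ANALYTICS_DEV.RAW.EVENTS_STAGE"      # fully qualified stage
          file_format: "ANALYTICS_DEV.RAW.CSV_FILE_FORMAT" # fully qualified format
```

If `location` and `file_format` could instead be bare names resolved against the node's database and schema, the same yaml would work across environments.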

Describe alternatives you've considered

I have attempted using `type` instead of `name` for the file_format, since the latter initially failed for lack of access/permission to a file_format that doesn't exist in `target.database` and `target.schema` when it ran through the `is_csv` code. That only moved the problem to the external stage reference, which was not found when the `create or replace external table` statement executed. The reference to the external table itself was correct and fully qualified, but because the session isn't "using" that database.schema at execution time, the unqualified external stage can't be resolved.
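Roughly, the failure looks like this (a sketch with hypothetical names, not the package's exact generated DDL): the table is fully qualified, but the bare stage name is resolved against the session's current database and schema, which is not the one named in the yaml:

```sql
-- sketch with hypothetical names, not the package's exact generated DDL
create or replace external table ANALYTICS_DEV.RAW.EVENTS (
    event_id varchar as (value:c1::varchar)
)
location = @events_stage      -- unqualified: Snowflake resolves this against the
                              -- session's current schema, so it is not found
file_format = (type = csv)
auto_refresh = false;
```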

Additional context

To my knowledge this is specific to Snowflake.

Who will this benefit?

This would benefit anyone trying to use this package in a true multi-environment deployment (dev/test/prod).

samLozier commented 1 year ago

I also raised this issue in the past and was told (as best I can remember) that the package maintainers had strong opinions about the schema/stage information being managed outside of dbt. I personally found this problematic in my team's workflow and ended up creating my own macros to create and manage those layers dynamically. I would also support what you've proposed.

I think it's a matter of perspective, but I'd viewed the use case for this package as the equivalent of `dbt seed` for larger and more dynamic datasets, so I came in assuming it would run in a similar manner. IMO it's fine that it doesn't, but the docs could be improved to call this out.

jdwalder commented 1 year ago

That is interesting, because the package certainly does take the database and schema of the 'current_node' to decide where the external table is created. It makes zero sense for the stage and file format to live in a different database and schema than the one where the external table is being created.

If you are interested, I created a fork that solves the issue exactly as I requested. I just never submitted a pull request for it since none of the maintainers have responded.
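For illustration, the general shape of the fix could look something like this (a sketch of the approach, not the fork's actual code): when the yaml gives a bare `@stage` reference, qualify it with the source node's database and schema before rendering the DDL:

```sql
{# sketch of the general approach, not the fork's actual code: if the yaml  #}
{# gives a bare '@stage', qualify it with the source node's database/schema #}
{% macro fully_qualified_location(source_node) %}
    {%- set loc = source_node.external.location -%}
    {%- if '.' in loc -%}
        {{ loc }}
    {%- else -%}
        @{{ source_node.database }}.{{ source_node.schema }}.{{ loc | replace('@', '') }}
    {%- endif -%}
{% endmacro %}
```

The same treatment would apply to the file_format when it is given by `name` rather than `type`.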

github-actions[bot] commented 1 year ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions[bot] commented 1 year ago

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.