dbt-labs / dbt-external-tables

dbt macros to stage external sources
https://hub.getdbt.com/dbt-labs/dbt_external_tables/latest/
Apache License 2.0
305 stars 122 forks source link

Failure to create External Tables across Multiple Databases within Snowflake #89

Closed radeleye closed 3 years ago

radeleye commented 3 years ago

Describe the bug

We are attempting to create external tables within multiple databases yet when we run the dbt run-operation stage_external_sources command, the package attempts to create all external tables from stages within the database specified in our profiles.yml file and errors out when we attempt to create external tables in other databases because the stages specified in the sources file do not exist in that specific database.

It seems the package assumes all stages are within in the database specified in our profiles.yml file.

It also completely ignores any configurations within our dbt_project.yml file which specifies the database to have the external tables created in.

We have attempted to use arguments and tags to specify which database to have the external tables created but those options do not seem to work.

Steps to reproduce

models/orchestration/folder_for_db_one/schema_1

sources: 
  - name: stage_table
    database: db_one
    schema: raw_schema_1
    loaded_at_field: src_file_name_date
    tables:
      - name: ext_table
        external:
          location: '@s3_location'
          file_format: >
            (type = csv field_delimiter=',' SKIP_HEADER = 1 FIE

models/orchestration/folder_for_db_two/schema_2

sources: 
  - name: stage_table
    database: db_two
    schema: raw_schema_2
    loaded_at_field: src_file_name_date
    tables:
      - name: ext_table
        external:
          location: '@s3_location'
          file_format: >
            (type = csv field_delimiter=',' SKIP_HEADER = 1 FIE
models:
  folder_for_db_one:
    +database: DB_ONE
  folder_for_db_two:
    +database: DB_TWO

database: DB_ONE

Expected results

Success messages for all external tables configured to be created in databases other than the one found in our profiles.yml file.

Actual results

the external tables under the db from our profiles.yml are built successfully:

11:02:38 + 1 of 53 START external source raw_schema_1.ext_table
11:02:39 + 1 of 53 (1) create or replace external table db_one.t...  
11:02:40 + 1 of 53 (1) SUCCESS 1

Once external tables found in the second db's configuration is reached, an error is thrown

Encountered an error while running operation: Database Error
  002003 (02000): SQL compilation error:
  Stage 'DB_TWO.SCHEMA_2.DB_ONE_SCHEMA_STAGE' does not exist or not authorized.

Screenshots and log output

System information

The contents of your packages.yml file:


  - package: fishtown-analytics/dbt_external_tables
    version: 0.6.2
# <see https://hub.getdbt.com/fishtown-analytics/dbt_external_tables/latest/> for more details 

Which database are you using dbt with?

The output of dbt --version:

Running with dbt=0.19.1

The operating system you're using:

The output of python --version:

Additional context

jtcohen6 commented 3 years ago

Thanks for the detailed writeup @radeleye!

It seems the package assumes all stages are within in the database specified in our profiles.yml file.

The package doesn't do anything with stages, except include the provided external.location in its create external table and create snowpipe DDL statements. As noted in the README, this package assumes that you have already created any Snowflake external stages you need. If those stages exist in databases different from target.database, then you can fully qualify the name of the stage in the location yaml property:

external:
  location: '@db_two.schema_name.stage_name'

It also completely ignores any configurations within our dbt_project.yml file which specifies the database to have the external tables created in.

This is tricky: the database source property is different from the database config of models, which is what you're configuring in dbt_project.yml. I get that this is confusing—resource property vs. node config—it's a distinction without a difference, really, and it's something we'd like to resolve ahead of releasing dbt v1.0 (https://github.com/fishtown-analytics/dbt/issues/2401). In the meantime, if you want to change the database of the external table the package will create, you need to do it within each source's properties.

I'm going to close this issue for now, as I don't think we need (or can make) a code change in the package to account for the points you've raised. If you think there's something specific that the package could be doing differently, however, I'm all ears.