dbt-labs / dbt-external-tables

dbt macros to stage external sources
https://hub.getdbt.com/dbt-labs/dbt_external_tables/latest/
Apache License 2.0

Too many quotes in generated command #127

Closed SteadyGiant closed 1 year ago

SteadyGiant commented 2 years ago

Edit: Perhaps I should instead submit this issue to dbt's Redshift plugin, since dropif.sql seems to generate the DROP TABLE command and there's no way to control quoting there. Or perhaps it belongs in dbt-core.

Describe the bug

When I run dbt run-operation stage_external_sources, I get the error message:

Encountered an error while running operation: Database Error cross-database reference to database "sources" is not supported

because dbt tries running this command:

drop table if exists "sources"."src_pendo"."src_accounts" cascade

when it should be:

drop table if exists "sources.src_pendo.src_accounts" cascade

Indeed, when I run the latter SQL command in the Redshift Query Editor against my warehouse there's no error, whereas running the former command reproduces the same error.
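As far as I can tell, the three separately quoted parts come from dbt's Relation object: each of database, schema, and identifier is quoted on its own when the relation is rendered, so database: sources becomes a standalone quoted identifier. A minimal sketch of how a dropif-style macro would produce the statement above (the macro name and structure here are illustrative, not necessarily the package's exact code):

{% macro dropif_sketch(node) %}
    {# source() resolves to a Relation whose database, schema, and identifier #}
    {# are each quoted separately: "sources"."src_pendo"."src_accounts"       #}
    {% set ddl %}
        drop table if exists {{ source(node.source_name, node.name) }} cascade
    {% endset %}
    {{ return(ddl) }}
{% endmacro %}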

Steps to reproduce

I'm running all CLI commands in PowerShell.

My models/staging/pendo/src_pendo.yml file (following this example) looks like this (one column for brevity):

version: 2

sources:
  - name: s3_pendo
    database: sources
    schema: src_pendo
    loader: S3
    loaded_at_field: _sdc_batched_at
    tables:
      - name: src_accounts
        external:
          location: <s3 path>
          row_format: serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
          table_properties: "('skip.header.line.count'='1')"
          stored_as: parquet
        columns:
          - name: account_id
            data_type: varchar
            tests:
              - not_null
              - unique

<s3 path> contains CSV files.

Expected results

If I'm understanding correctly, running dbt run-operation stage_external_sources will create the external schema and table defined in src_pendo.yml in Redshift Spectrum, then expose the data in the CSV files at <s3 path> through that table. Let me know if this isn't the intended use case.
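For reference, this is my mental model of the DDL the macro runs against Redshift Spectrum; the schema, catalog database, IAM role, and S3 path below are placeholders, and the exact statements the package emits may differ:

create external schema if not exists src_pendo
from data catalog
database 'spectrum_db'                                        -- placeholder Glue catalog database
iam_role 'arn:aws:iam::123456789012:role/my-spectrum-role'    -- placeholder role
create external database if not exists;

create external table src_pendo.src_accounts (
    account_id varchar
)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
stored as textfile          -- textfile is the usual pairing with the CSV serde; the yml above says parquet
location 's3://my-bucket/pendo/accounts/'                     -- placeholder for <s3 path>
table properties ('skip.header.line.count'='1');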

Actual results

See bug description.

Screenshots and log output

Full (PowerShell) output after running the macro:

[screenshot of PowerShell output]

System information

The contents of your packages.yml file:

packages:
  - package: dbt-labs/dbt_external_tables
    version: 0.8.0

Which database are you using dbt with? Redshift (with Redshift Spectrum for external tables)

The output of dbt --version:

installed version: 1.0.1
   latest version: 1.0.1

Up to date!

Plugins:
  - postgres: 1.0.1
  - redshift: 1.0.0

The operating system you're using: Windows 10

The output of python --version: Python 3.9.6

Additional context

If this is truly a bug then I'll try submitting a PR.

jtcohen6 commented 2 years ago

@everettttt Sorry for the delayed response here! It sounds like there are a few pieces to this, so I want to retrace our steps.

You're seeing this error:

cross-database reference to database "sources" is not supported

Because dbt is operating on a table that it believes to exist at "sources"."src_pendo"."src_accounts"—that is, in a database named "sources", which is a different name from the database you're connecting to in Redshift. This is because you've defined the source with:

version: 2

sources:
  - name: s3_pendo
    database: sources  # this is a different database name!
    schema: src_pendo

Are you using the RA3 node type with Redshift? If so, you can query data that lives in other databases, but I believe you still won't be able to create objects (including external tables) in those other databases.
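If the src_pendo external schema is meant to live in the database you're already connected to, the simplest fix is probably to drop the database key, so dbt resolves the source within your target database. Roughly (trimmed to the relevant keys):

version: 2

sources:
  - name: s3_pendo
    # no `database:` override here, so dbt uses the database from your profile's target
    schema: src_pendo
    loader: S3

With that change the rendered relation becomes a two-part name in the current database, and the cross-database error should go away.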

github-actions[bot] commented 1 year ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions[bot] commented 1 year ago

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.