dbt-labs / dbt-external-tables

dbt macros to stage external sources
https://hub.getdbt.com/dbt-labs/dbt_external_tables/latest/
Apache License 2.0

Corrupt BigQuery DDL statement for external tables #144

Closed: achrusciel closed this issue 2 years ago

achrusciel commented 2 years ago

Describe the bug

I have a bucket in region us-west1 with a subfolder 02_ynab, where I store Hive-partitioned parquet data in folders of the form server_knowledge=15547. I want to access this data through an external table in BigQuery, but the generated DDL statement is invalid:

create or replace external table `ynab-dwh-dev-01`.`stage`.`ynab_report_test_03`

        with partition columns
        options (
            uris = ['gs://ynab-dwh-dev-bucket-01/02_ynab/*'], format = 'parquet', hive_partition_uri_prefix = 'gs://ynab-dwh-dev-bucket-01/02_ynab/', partitions = [{'name': 'server_knowledge', 'data_type': 'int64'}])

Steps to reproduce

Create a file bigquery_stage.yml with the following contents:

version: 2

sources:
  - name: stage
    database: ynab-dwh-dev-01
    loader: gcloud storage
    tables:
      - name: ynab_report_test_03
        external:
          location: 'gs://ynab-dwh-dev-bucket-01/02_ynab/*'
          options:
            format: parquet
            hive_partition_uri_prefix: 'gs://ynab-dwh-dev-bucket-01/02_ynab/'
            partitions:               
            - name: server_knowledge
              data_type: int64

Expected results

By contrast, this DDL statement works when run directly in the BigQuery SQL console:

create or replace external table `ynab-dwh-dev-01`.`ynab_dwh_dev_dataset_01`.`ynab_report_test_03`
        with partition columns (server_knowledge INT64)
        options (
            uris = ['gs://ynab-dwh-dev-bucket-01/02_ynab/*'], 
            format = 'parquet', 
            hive_partition_uri_prefix = 'gs://ynab-dwh-dev-bucket-01/02_ynab/')

Actual results

14:33:51.317742 [info ] [MainThread]: 1 of 1 START external source stage.ynab_report_test_03
14:33:51.324393 [debug] [MainThread]: On "macro_stage_external_sources": cache miss for schema "{self.database}.{self.schema}", this is inefficient
14:33:51.324668 [debug] [MainThread]: Opening a new connection, currently in state init
14:33:51.726519 [info ] [MainThread]: 1 of 1 (1) create or replace external table `ynab-dwh-dev-01`.`stage`.`ynab_report_test_03`...  
14:33:51.733982 [debug] [MainThread]: On macro_stage_external_sources: /* {"app": "dbt", "dbt_version": "1.1.1", "profile_name": "dbt_bigquery_dwh", "target_name": "dev", "connection_name": "macro_stage_external_sources"} */

    create or replace external table `ynab-dwh-dev-01`.`stage`.`ynab_report_test_03`

        with partition columns
        options (
            uris = ['gs://ynab-dwh-dev-bucket-01/02_ynab/*'], format = 'parquet', hive_partition_uri_prefix = 'gs://ynab-dwh-dev-bucket-01/02_ynab/', partitions = [{'name': 'server_knowledge', 'data_type': 'int64'}])

14:33:52.038909 [debug] [MainThread]: BigQuery adapter: Retry attempt 1 of 1 after error: BadRequest('Braced constructors are not supported at [8:165]')
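
The BadRequest above points at the partitions entry inside OPTIONS: because partitions sat under options in the YAML, the macro passed it through as an ordinary option value, and BigQuery's DDL grammar rejects the braced struct literal there ("Braced constructors are not supported"). Partition columns belong in the WITH PARTITION COLUMNS clause instead. A hand-written contrast of the two forms (illustration only, not actual macro output):

    -- rejected: braced constructor inside OPTIONS
    options (
        uris = ['gs://ynab-dwh-dev-bucket-01/02_ynab/*'],
        partitions = [{'name': 'server_knowledge', 'data_type': 'int64'}])

    -- accepted: partition column declared in its own clause
    with partition columns (server_knowledge int64)
    options (
        uris = ['gs://ynab-dwh-dev-bucket-01/02_ynab/*'])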

System information

The contents of your packages.yml file:

packages:
  - package: Datavault-UK/dbtvault
    version: 0.8.3
  - package: dbt-labs/dbt_external_tables
    version: 0.8.0

Which database are you using dbt with? BigQuery

The output of dbt --version:

Core:
  - installed: 1.1.1
  - latest:    1.1.1 - Up to date!

Plugins:
  - bigquery: 1.1.0 - Up to date!

The operating system you're using: Docker Ubuntu

FROM gitpod/workspace-python-3.10

RUN sudo apt-get update && sudo apt-get install default-jre -y

COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

The output of python --version: Python 3.10.5

Additional context

requirements.txt

pyspark==3.3.0
dbt-core==1.1.1
dbt-bigquery==1.1.0
achrusciel commented 2 years ago

My fault: the YAML file was misconfigured. It should look like this:

version: 2

sources:
  - name: stage
    database: ynab-dwh-dev-01
    tables:
      - name: ynab_report_test_03
        external:
          location: 'gs://ynab-dwh-dev-bucket-01/02_ynab/*'
          options:
            format: parquet
            hive_partition_uri_prefix: 'gs://ynab-dwh-dev-bucket-01/02_ynab/'
          partitions:               
          - name: server_knowledge
            data_type: int64

--> partitions and options must be at the same level, both nested directly under external. In the broken config, partitions was indented under options, which is why it ended up serialized inside the OPTIONS clause. The corrected rendering is sketched below.
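
With partitions moved up a level, the macro should render them into the WITH PARTITION COLUMNS clause rather than into OPTIONS. A sketch of the DDL this corrected config should produce (assuming the package renders explicit column types from the partitions list, matching the statement that works in the console):

    create or replace external table `ynab-dwh-dev-01`.`stage`.`ynab_report_test_03`
        with partition columns (server_knowledge int64)
        options (
            uris = ['gs://ynab-dwh-dev-bucket-01/02_ynab/*'],
            format = 'parquet',
            hive_partition_uri_prefix = 'gs://ynab-dwh-dev-bucket-01/02_ynab/')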