dbt-labs / dbt-external-tables

dbt macros to stage external sources
https://hub.getdbt.com/dbt-labs/dbt_external_tables/latest/
Apache License 2.0

Spectrum `STORED AS PARQUET` does not output expected DDL #194

Closed mattppal closed 1 month ago

mattppal commented 1 year ago

Describe the bug

When an external table in Redshift Spectrum is defined with stored_as: PARQUET, dbt-external-tables does not generate the expected DDL, so the resulting external table is unreadable.

Steps to reproduce

Config:

version: 2
sources:
  - name: spectrum
    schema: spectrum
    loader: S3
    loaded_at_field: loaded_at
    tables:
      - name: abc
        external:
          location: ...
          stored_as: PARQUET

Expected results

SHOW EXTERNAL TABLE spectrum.abc

Should yield

CREATE EXTERNAL TABLE spectrum.abc (
    ...
)
PARTITIONED BY ( .. )
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'xyz';

Since this is what is output when I run:

CREATE EXTERNAL TABLE spectrum.abc (
    ...
)
PARTITIONED BY ( .. )
STORED AS PARQUET
LOCATION 'xyz';

Actual results

For the table created by dbt-external-tables from the config above, SHOW EXTERNAL TABLE spectrum.abc instead returns:

CREATE EXTERNAL TABLE spectrum.abc (
    ...
)
PARTITIONED BY ( .. )
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'xyz';

System information

packages:
  - package: dbt-labs/codegen
    version: 0.9.0
  - package: dbt-labs/redshift
    version: 0.8.0
  - package: dbt-labs/dbt_utils
    version: 1.0.0
  - package: dbt-labs/metrics
    version: 1.4.1
  - package: dbt-labs/dbt_external_tables
    version: 0.8.3

Which database are you using dbt with?

redshift

The output of dbt --version:

Core:
  - installed: 1.4.5
  - latest:    1.4.5 - Up to date!

Plugins:
  - redshift: 1.4.0 - Up to date!
  - postgres: 1.4.5 - Up to date!

The operating system you're using:

The output of python --version: Python 3.9.0

Additional context

github-actions[bot] commented 7 months ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

mattppal commented 7 months ago

Bump

padbk commented 7 months ago

row_format: serde 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
file_format: parquet

The above works for me. No need for stored_as.
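For readers landing here, the suggestion above applied to the source definition from the original report might look like the sketch below. It assumes that both keys sit under external: in place of stored_as, which the comment implies but does not show. The location stays elided, as in the report. One plausible reading of the thread is that the package does not act on stored_as for Redshift, so the table falls back to the default text SerDe. This is not confirmed by a maintainer here.

```yaml
version: 2
sources:
  - name: spectrum
    schema: spectrum
    loader: S3
    loaded_at_field: loaded_at
    tables:
      - name: abc
        external:
          location: ...   # elided in the original report
          # Assumption: these two keys replace `stored_as: PARQUET`,
          # following padbk's comment.
          row_format: serde 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
          file_format: parquet
```

After re-staging the source, running SHOW EXTERNAL TABLE spectrum.abc and checking that the SerDe is ParquetHiveSerDe rather than LazySimpleSerDe would be one way to confirm the change took effect.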

github-actions[bot] commented 1 month ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions[bot] commented 1 month ago

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.