dbt-labs / dbt-external-tables

dbt macros to stage external sources
https://hub.getdbt.com/dbt-labs/dbt_external_tables/latest/
Apache License 2.0
288 stars 116 forks source link

Snowflake external table inference doesn't work with CSV format #254

Closed lmastalerz closed 3 months ago

lmastalerz commented 5 months ago

Describe the bug

When creating Snowflake external table using inference I need to provide a file format. In order to read column names I need to set PARSE_HEADER = TRUE in the format (and presumably SKIP_HEADER = 0). The problem is that when I’m trying to query external table created with a format with PARSE_HEADER = TRUE it throws:

“Invalid file format “PARSE_HEADER” is only allowed for CSV INFER_SCHEMA and MATCH_BY_COLUMN_NAME”

If I set PARSE_HEADER = FALSE, column names are not being recognised.

Steps to reproduce

  1. Set
    infer_schema: true 
    file_format: <name of the format with PARSE_HEADER = TRUE>
  2. Execute
    dbt run-operation stage_external_sources
  3. Execute in Snowflake
    SELECT * FROM <name of external table>

Expected results

Both columns inferred and table can be queried.

Actual results

Invalid file format “PARSE_HEADER” is only allowed for CSV INFER_SCHEMA and MATCH_BY_COLUMN_NAME

Screenshots and log output

System information

packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1
  - package: dbt-labs/dbt_external_tables
    version: 0.8.7

Which database are you using dbt with?

The output of dbt --version:

Core:
  - installed: 1.5.1
  - latest:    1.7.8 - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - snowflake: 1.5.1 - Update available!
  - postgres:  1.5.1 - Update available!

MacOS Python 3.10.9

Additional context

Potentially can be solved by following logic similar to this (one file format for inference, one for actual table). Perhaps two parameters are needed: inference_file_format and file_format

lmastalerz commented 5 months ago

Small update, regardless of what we do here, it will be still broken because of this issue: https://community.snowflake.com/s/article/Creating-an-external-table-using-INFER-SCHEMA-for-CSV-is-populating-data-only-in-the-value-column

dataders commented 3 months ago

@lmastalerz thanks for opening and updating @lmastalerz! I'm going to close this issue here as "won't do" as it looks like an issue on the Snowflake side