Closed codigo-ergo-sum closed 3 years ago
@codigo-ergo-sum Thanks for the detailed write-up. I've thought about this a bit more, and I've come to feel that we're really at the mercy of Snowflake keywords here. Since you are loading a CSV file, and you're matching columns by ordinal position rather than name (as in JSON/parquet/etc), your first and best choice is to pick a different column name :)
But we can still find a way to make this work. The create external table
statement succeeds if we quote the keyword:
create or replace external table RAW_DEV.our_source.employee_start_card(
FID_ID VARCHAR as (nullif(value:c1,'')::VARCHAR),
CLIENT_ID VARCHAR as (nullif(value:c2,'')::VARCHAR),
"UNION" VARCHAR as (nullif(value:c3,'')::VARCHAR)
)
location = @raw_dev.our_source.employee_stg
auto_refresh = True
file_format = raw_dev.our_source.pipe_delimited_csv_fmt
That puts us at the mercy of Snowflake's casing + quoting behavior. You'd always need to query it as:
select "UNION" from raw_dev.our_source.employee_start_card
This, by contrast, would raise an error:
select "union" from raw_dev.our_source.employee_start_card
That's something we'll just have to live with, I think. This is why we don't quote identifiers in Snowflake unless we absolutely must. (Vs. Redshift, where we can just quote everything.)
As far as defining that quoting behavior in the dbt .yml
file, I think you could manage this with whichever of the following you mind the least:
- name: '"UNION"'
data_type: VARCHAR
- name: "\"UNION"\"
data_type: VARCHAR
- name: UNION
data_type: VARCHAR
quote: true
Setting this property is what we recommend to get schema tests working with a quoted column (docs). I can open a PR to make the external tables package take account of quote: true
, too.
Describe the bug
We have incoming data with various column names that we want to stage using dbt-external-tables. One of the column names in the incoming data is "UNION". It is union because the business meaning is "the name of the labor union to which the employee belongs." When we create a .yml file referencing this column and run
dbt run-operation stage_external_sources --vars 'ext_full_refresh: true'
then we get this error message:Example .yml file (truncated to remove other sensitive names):
External table creation statement generated from this (that errors out) - edited to again remove sensitive columns:
We know that it is an issue with the keyword
UNION
because if we changeUNION
toUNION_
in the .yml file, everything runs fine.We've also tried enclosing the column name in single quotes, double quotes, and backticks without luck.
Steps to reproduce
See as above
Expected results
Keywords like "UNION" would not interfere with external table configuration.
Actual results
Errors as above
Screenshots and log output
As above
System information
The contents of your
packages.yml
file:Which database are you using dbt with?
The output of
dbt --version
:The operating system you're using: Mac OS X
The output of
python --version
: Python 2.7.16