Closed by b-per 1 year ago
The test failure on Redshift doesn't seem to be related to this change.
Hey @b-per - thanks for this :)
Given the nature of the change to one of the "core" macros of this package, I think it would make sense to add some tests to check the syntax?
We can probably do that by adding a few more sources to https://github.com/dbt-labs/dbt-external-tables/blob/main/integration_tests/models/plugins/snowflake/snowflake_external.yml
I've converted the partitioned CSV files from s3://dbt-external-tables-testing/csv/, which are the following files:
s3://dbt-external-tables-testing/csv/section=a/people_a.csv
s3://dbt-external-tables-testing/csv/section=b/people_b.csv
s3://dbt-external-tables-testing/csv/section=c/people_c.csv
s3://dbt-external-tables-testing/csv/section=d/people_d.csv
Using:

```python
from pyarrow import csv, parquet

# Read the CSV (pyarrow infers column types) and write it out as parquet
people = csv.read_csv("people_a.csv")
parquet.write_table(people, "people_a.parquet")
```
and have uploaded them to S3, so they're available at:
s3://dbt-external-tables-testing/parquet/section=a/people_a.parquet
s3://dbt-external-tables-testing/parquet/section=b/people_b.parquet
s3://dbt-external-tables-testing/parquet/section=c/people_c.parquet
s3://dbt-external-tables-testing/parquet/section=d/people_d.parquet
@jeremyyeo Done. Let me know what you think!
I fixed the tests for all adapters. Redshift is still failing but this is not related to my change.
Description & motivation
Fixes #139 and #191, allowing the package to infer columns for parquet files on Snowflake.
This Snowflake page in the docs mentions code to create an external table using the schema of a parquet file. But with that approach, I couldn't get the DDL working when defining partitions. Since partitions are a huge saver for external tables, that didn't look ideal.
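For reference, the pattern documented by Snowflake looks roughly like the sketch below; the stage and file format names are placeholders, and this is the variant I couldn't get to work alongside partition definitions:

```sql
-- Sketch of the documented approach (stage and format names are placeholders)
CREATE EXTERNAL TABLE people
  USING TEMPLATE (
    SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
    FROM TABLE(
      INFER_SCHEMA(
        LOCATION => '@my_stage/parquet/',
        FILE_FORMAT => 'my_parquet_format'
      )
    )
  )
  LOCATION = @my_stage/parquet/
  FILE_FORMAT = (TYPE = PARQUET)
  AUTO_REFRESH = false;
```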
What I have done instead is call `INFER_SCHEMA` in a `run_query()` and then loop through the returned columns. The DDL executed is actually the same as if all the columns were maintained manually.

Example of source
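A hedged sketch of what such a source definition might look like; apart from the `infer_schema` key itself, the names, stage path, and partition expression below are illustrative, not the package's exact spec:

```yaml
version: 2

sources:
  - name: snowflake_external
    tables:
      - name: people_parquet_inferred      # hypothetical table name
        external:
          location: "@my_stage/parquet"    # hypothetical stage path
          file_format: "( type = parquet )"
          infer_schema: true               # new key: columns come from INFER_SCHEMA
          partitions:
            - name: section
              data_type: varchar
              expression: "split_part(metadata$filename, '/', 2)"  # illustrative
        columns:                           # optional: kept for documentation only
          - name: id
            description: "Unique id"
```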
A new key, `infer_schema`, is introduced. We can still list `columns` for documentation purposes, but if `infer_schema = true` we don't use them in the DDL.

Checklist
There are currently no details about inferring schemas in the README and no tests with parquet data on Snowflake.