Describe the feature
I would like to be able to define an external table (via a source.yml) with an infer_schema attribute, defaulting to false. When set to true, it would generate SQL in the following format, as documented for Snowflake: https://docs.snowflake.com/en/sql-reference/functions/infer_schema#examples
CREATE OR REPLACE EXTERNAL TABLE DB.schema.external_table
  USING TEMPLATE (
    SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
    FROM TABLE(
      INFER_SCHEMA(
        LOCATION => '@my_S3_Stage',
        FILE_FORMAT => 'MY_PARQUET_File_Format',
        IGNORE_CASE => TRUE -- another optional INFER_SCHEMA parameter
      )
    )
  )
  LOCATION = @my_S3_Stage
  FILE_FORMAT = MY_PARQUET_File_Format
  AUTO_REFRESH = true;
I would anticipate that with this new attribute set to true, you would not have to define any columns in the source definition.
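A hypothetical source.yml might look like the following, building on the external: spec used by the dbt-external-tables package. The infer_schema attribute is the proposed addition and does not exist today; the stage and file format names are placeholders carried over from the SQL above:

sources:
  - name: my_external_source
    schema: schema
    tables:
      - name: external_table
        external:
          location: "@my_S3_Stage"              # existing attribute
          file_format: MY_PARQUET_File_Format   # existing attribute
          auto_refresh: true                    # existing attribute
          infer_schema: true                    # proposed attribute (hypothetical), default false
        # no columns: block needed when infer_schema is true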
Describe alternatives you've considered
Defining the above SQL in a model for each source table; this can then be built as either an external or persisted table.
Additional context
Specific to Snowflake, and to certain file formats: at the time of writing, the supported file formats are Apache Parquet, Apache Avro, and ORC.
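For reference, the underlying Snowflake function can also be queried directly to preview the columns it would detect. A minimal sketch, assuming the stage and named file format from the example above already exist:

SELECT *
FROM TABLE(
  INFER_SCHEMA(
    LOCATION => '@my_S3_Stage',
    FILE_FORMAT => 'MY_PARQUET_File_Format'
  )
);
-- Returns one row per detected column, including its name and inferred type,
-- which is what USING TEMPLATE consumes via ARRAY_AGG(OBJECT_CONSTRUCT(*)).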
Who will this benefit?
This means schema changes (new columns) in externally sourced data can be handled easily within the dbt codebase. For example, schema changes become automatically accessible to downstream models: when a downstream incremental model does a select * from {{ ref('ext_table') }}, the new columns are then handled by the on_schema_change: append_new_columns model config.
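A minimal sketch of such a downstream incremental model, assuming the external table is exposed via a model named ext_table and that a loaded_at timestamp column (hypothetical) is available for the incremental filter:

{{
  config(
    materialized = 'incremental',
    on_schema_change = 'append_new_columns'
  )
}}

select *
from {{ ref('ext_table') }}

{% if is_incremental() %}
  -- loaded_at is a hypothetical load-timestamp column used as the incremental filter
  where loaded_at > (select max(loaded_at) from {{ this }})
{% endif %}

With on_schema_change set to append_new_columns, dbt adds any newly appearing source columns to the incremental target rather than failing the run, so an inferred schema change flows through without manual column maintenance.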