Closed adammarples closed 1 year ago
Hi, thanks for the PR! I think I understand the intent here (save time on typing a schema definition and tests), but I think the current dbt-ish way to accomplish this is to define each column in a schema.yml file and then add the appropriate type tests, which we already have with expect_column_values_to_be_of_type
and expect_column_values_to_be_in_type_list
.
If we were to go down this route, I would prefer not to use a Python dict, but use YAML to define structure since that's the commonly used approach in this repo and other dbt package repos.
Hi @clausherther, thanks for the reply. I have modified the mapping to be pure YAML, like this.
sources:
- name: my_source
tables:
- name: my_table
tests:
- dbt_expectations.expect_table_column_names_and_dtypes:
mapping:
name: VARCHAR
birthday: DATE
age_in_years: INTEGER
Appreciate what you are saying about the dbt way of doing things. The intent here is really to try and provide a full check on changing source schemas without the yaml getting too verbose. It's not so much to save time but to save long yaml blocks and mental overhead in reading and maintaining them. The current way seems to be a combination of the following:
sources:
- name: my_source
tables:
- name: my_table
tests:
- dbt_expectations.expect_table_column_count_to_equal:
value: 3
- dbt_expectations.expect_table_columns_to_match_set:
column_list: ["name", "birthday", "age_in_years"]
columns:
- name: name
tests:
- dbt_expectations.expect_column_to_exist
- dbt_expectations.expect_column_values_to_be_of_type:
column_type: varchar
- name: birthday
tests:
- dbt_expectations.expect_column_to_exist
- dbt_expectations.expect_column_values_to_be_of_type:
column_type: date
- name: age_in_years
tests:
- dbt_expectations.expect_column_to_exist
- dbt_expectations.expect_column_values_to_be_of_type:
column_type: integer
Completely up to you, I will leave this here
This PR creates a schema test which will directly compare the
column name : data type
mapping of the relation with a given python dictionary.After comparing upper cased column names and data types, it will run an EXCEPT/UNION difference type query which will return any differences found between the two mappings and returns rows containing the difference and a
found_in
column explaining the errorThis is intended to be a complete coverage for schema changes in place more verbose workarounds like this https://madisonmae.substack.com/p/tutorial-using-dbt-to-test-for-schema