Currently, the compare_all_columns macro does a poor job of distinguishing between missing data and null values.
We must check to see if the primary key exists in both of the two tables to determine whether the value in that table is null or missing.
Steps to reproduce
Incorrect handling of missing rows leading to wrong perfect_match value
PK 12345 exists in Table A but not in Table B
first_name for that row in Table A is NULL
Incorrect comparison of null to not null leading to wrong conflicting_valuesvalue
Compare any not-null to null value and the conflicting_values value was false when it should be true
Note: Should it really be a conflicting_value if one of the columns is not null and the other is null? [Debate about what a null value really is redacted.] Yes! Because it's not a perfect match.
Expected results
Incorrect handling of missing rows leading to wrong perfect_match value
This should instead be categorized as perfect_match = false
Incorrect comparison of null to not null leading to wrong conflicting_valuesvalue
This should instead be categorized as conflicting_values = true
Core:
- installed: 1.4.1
- latest: 1.4.5 - Update available!
Your version of dbt-core is out of date!
You can find instructions for upgrading here:
https://docs.getdbt.com/docs/installation
Plugins:
- snowflake: 1.4.1 - Update available!
At least one plugin is out of date or incompatible with dbt-core.
You can find instructions for upgrading here:
https://docs.getdbt.com/docs/installation
Describe the bug
Currently, the
compare_all_columns
macro does a poor job of distinguishing between missing data and null values.We must check to see if the primary key exists in both of the two tables to determine whether the value in that table is null or missing.
Steps to reproduce
Incorrect handling of missing rows leading to wrong
perfect_match
valuefirst_name
for that row in Table A isNULL
Incorrect comparison of
null
tonot null
leading to wrongconflicting_values
valueconflicting_values
value wasfalse
when it should betrue
Note: Should it really be aconflicting_value
if one of the columns is not null and the other is null? [Debate about what a null value really is redacted.] Yes! Because it's not a perfect match.Expected results
Incorrect handling of missing rows leading to wrong
perfect_match
valueperfect_match = false
Incorrect comparison of
null
tonot null
leading to wrongconflicting_values
valueconflicting_values = true
Actual results
The opposite of what's expected.
Screenshots and log output
System information
The contents of your
packages.yml
file:Which database are you using dbt with?
The output of
dbt --version
:The operating system you're using:
The output of
python --version
:Additional context
Are you interested in contributing the fix?
Yes!