Currently, the compare_all_columns macro does a poor job of distinguishing between missing data and null values.
We must check to see if the primary key exists in both of the two tables to determine whether the value in that table is null or missing.
With this update, the code checks that a primary key exists in both tables before declaring a perfect_match. This was already accounted for in other logical checks (e.g., null_in_a, null_in_b).
There are also errors in the conflicting_values logic leading to incorrectly counting conflicting_values as false when comparing a null to a not null value when the PKs are present in both tables.
Checklist
[x] I have verified that these changes work locally
[x] I have updated the README.md (if applicable)
[x] I have added tests & descriptions to my models (and macros if applicable)
Description & motivation
Currently, the
compare_all_columns
macro does a poor job of distinguishing between missing data and null values.We must check to see if the primary key exists in both of the two tables to determine whether the value in that table is null or missing.
With this update, the code checks that a primary key exists in both tables before declaring a
perfect_match
. This was already accounted for in other logical checks (e.g.,null_in_a
,null_in_b
).There are also errors in the
conflicting_values
logic leading to incorrectly countingconflicting_values
as false when comparing anull
to anot null
value when the PKs are present in both tables.Checklist