`compare_all_columns` has a tough time with null values

leoebfolsom commented 1 year ago

Describe the bug

Currently, the compare_all_columns macro does a poor job of distinguishing between missing data and null values.

We must check to see if the primary key exists in both of the two tables to determine whether the value in that table is null or missing.

Steps to reproduce

Incorrect handling of missing rows leading to wrong perfect_match value

PK 12345 exists in Table A but not in Table B
first_name for that row in Table A is NULL

Incorrect comparison of null to not null leading to wrong conflicting_valuesvalue

Compare any not-null to null value and the conflicting_values value was false when it should be true Note: Should it really be a conflicting_value if one of the columns is not null and the other is null? [Debate about what a null value really is redacted.] Yes! Because it's not a perfect match.

Expected results

Incorrect handling of missing rows leading to wrong perfect_match value

This should instead be categorized as perfect_match = false

Incorrect comparison of null to not null leading to wrong conflicting_valuesvalue

This should instead be categorized as conflicting_values = true

Actual results

The opposite of what's expected.

Screenshots and log output

System information

The contents of your packages.yml file:

packages:
  - package: dbt-labs/audit_helper
    version: 0.7.0

Which database are you using dbt with?

[ ] postgres
[ ] redshift
[ ] bigquery
[x] snowflake
[ ] other (specify: ____)

The output of dbt --version:

Core:
  - installed: 1.4.1
  - latest:    1.4.5 - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - snowflake: 1.4.1 - Update available!

  At least one plugin is out of date or incompatible with dbt-core.
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

The operating system you're using:

The output of python --version:

Python 3.9.6

Additional context

Are you interested in contributing the fix?

Yes!

leoebfolsom commented 1 year ago

I'm not sure if there's a more official way of connecting this issue to the PR I opened, but here it is! [IN PROGRESS] https://github.com/dbt-labs/dbt-audit-helper/pull/67

leoebfolsom commented 1 year ago

Fixed by:

dbt-labs / dbt-audit-helper