ccao-data / data-architecture

Codebase for CCAO data infrastructure construction and management
https://ccao-data.github.io/data-architecture/

Refactor `row_values_match_after_join` and `res_class_matches_pardat` for join flexibility #360

Closed: jeancochrane closed this pull request 3 months ago

jeancochrane commented 3 months ago

We would like to use the `row_values_match_after_join` and `res_class_matches_pardat` generic tests to implement https://github.com/ccao-data/ptaxsim/issues/31, but as currently written these generics are too rigid to support the joins those tests require. In particular, neither generic supports base models whose `parid` and `taxyr` columns go by different names (e.g., `pin` and `year`), and `row_values_match_after_join` does not support left or outer joins.

This PR tweaks the two generics so that they support base models whose column naming schemes do not include `parid` and `taxyr`, and it adds a `join_type` parameter to `row_values_match_after_join`. The changes are backwards-compatible for `res_class_matches_pardat` but breaking for `row_values_match_after_join`; the breakage is acceptable because the latter test is not yet used anywhere in the codebase.
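To make the two changes concrete, here is a minimal Python sketch of the check a test like `row_values_match_after_join` performs: join a base model to another model on explicitly named key columns, then report rows whose compared values differ. This is an illustration only, not the dbt implementation (the real generics are SQL/Jinja). The function name `rows_not_matching_after_join`, its parameters, and the sample data are hypothetical, and treating rows without a join partner as failures under left/outer joins is an assumption of this sketch rather than a description of what the merged generic does.

```python
from typing import Any, Dict, List, Literal, Optional, Sequence, Tuple

# Hypothetical row representation: one dict per table row.
Row = Dict[str, Any]
JoinType = Literal["inner", "left", "outer"]


def rows_not_matching_after_join(
    model: List[Row],
    external: List[Row],
    model_keys: Sequence[str],
    external_keys: Sequence[str],
    column: str,
    external_column: str,
    join_type: JoinType = "inner",
) -> List[Tuple[Optional[Row], Optional[Row]]]:
    """Return (model_row, external_row) pairs that would fail the check.

    Join keys are passed explicitly, so the base model can name them
    e.g. ("pin", "year") while the external model uses ("parid", "taxyr").
    """
    if len(model_keys) != len(external_keys):
        raise ValueError("model_keys and external_keys must pair up")
    if join_type not in ("inner", "left", "outer"):
        raise ValueError(f"unsupported join_type: {join_type!r}")

    # Index external rows by their join key (assumes keys are unique).
    ext_index = {tuple(r[k] for k in external_keys): r for r in external}
    matched = set()
    failures: List[Tuple[Optional[Row], Optional[Row]]] = []

    for row in model:
        key = tuple(row[k] for k in model_keys)
        other = ext_index.get(key)
        if other is None:
            # Inner joins drop unmatched model rows; left and outer joins keep
            # them, and this sketch counts a missing partner as a failure.
            if join_type in ("left", "outer"):
                failures.append((row, None))
            continue
        matched.add(key)
        if row[column] != other[external_column]:
            failures.append((row, other))

    if join_type == "outer":
        # A full outer join also keeps external rows with no model partner.
        for key, other in ext_index.items():
            if key not in matched:
                failures.append((None, other))

    return failures
```

In this sketch, joining a base model keyed on `pin`/`year` to `pardat` on `parid`/`taxyr` with an inner join silently ignores PINs missing from either side; `join_type="left"` surfaces base-model PINs absent from `pardat`, and `join_type="outer"` additionally surfaces `pardat` parcels absent from the base model.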

While implementing these changes, I also beefed up the docs for these two generics (along with the `format_additional_select_columns` macro, which the generic docs reference) to make them easier for a dbt novice to use. I also wrote tests that use the new forms of these generics to verify the changes, but I'm leaving them out of this PR so that whoever implements https://github.com/ccao-data/ptaxsim/issues/31 can work them out as an exercise.