ccao-data / data-architecture

Codebase for CCAO data infrastructure construction and management
https://ccao-data.github.io/data-architecture/
5 stars 3 forks source link

Add stage-level classes to `default.vw_pin_history` #338

Closed wrridgeway closed 3 months ago

wrridgeway commented 3 months ago

default.vw_pin_history currently has one class column, but values for three stages. Since class can change between stages, this makes it impossible to know whether a stage's values should be excluded from a ratio study, if SOPs were to dictate such exclusions.

wrridgeway commented 3 months ago

Currently non-stage specific class from pardat is still in vw_pin_history. It's probably unnecessary since it only disagrees with board-stage class 3 times in all of our data. These instances are likely errors.

select
    pin, year, class, board_class
from "z_ci_337-add-stage-level-classes-to-defaultvw-pin-history_default".vw_pin_history
where class != board_class
pin year class board_class
18074080240000 2022 209 100
18044020300000 2022 278 100
15261090070000 2023 2 EX

Worth noting here that a class value of '2' is the product of our current class code cleaning not allowing 'OA2' into our views. This is a new class code that seems to have appeared in 2023.

wrridgeway commented 3 months ago

ARBITRARY seems to work just fine:

select count(*)
from "z_ci_337-add-stage-level-classes-to-defaultvw-pin-history_default".vw_pin_value new
left join default.vw_pin_value old
on new.pin = old.pin and new.year = old.year
where new.mailed_bldg != old.mailed_bldg
    OR new.mailed_land != old.mailed_land
    OR new.mailed_tot != old.mailed_tot
    OR new.certified_bldg != old.certified_bldg
    OR new.certified_land != old.certified_land
    OR new.certified_tot != old.certified_tot
    OR new.board_bldg != old.board_bldg
    OR new.board_land != old.board_land
    OR new.board_tot != old.board_tot

returns 0

wrridgeway commented 3 months ago

Let's see some dbt tests in here

I've added some for vw_pin_value. For vw_pin_history it's a little difficult since it's primarily a product of vw_pin_history. I spoke with @jeancochrane about implementing a test like:

select *
from default.vw_pin_value as value
left join default.vw_pin_history as history
    on value.pin = history.pin
    and cast(value.year as int) = cast(history.year as int) - 1
where value.mailed_tot != history.oneyr_pri_mailed_tot
    or value.certified_tot != history.oneyr_pri_certified_tot
    or value.board_tot != history.oneyr_pri_board_tot

but it doesn't really fit into our current generics. She offered to update row_values_match_after_join to make it able to handle something like this, or I could write a specific test that doesn't depend on a generic. Thoughts @dfsnow ?

dfsnow commented 3 months ago

Let's see some dbt tests in here

I've added some for vw_pin_value. For vw_pin_history it's a little difficult since it's primarily a product of vw_pin_history. I spoke with @jeancochrane about implementing a test like:

select *
from default.vw_pin_value as value
left join default.vw_pin_history as history
    on value.pin = history.pin
    and cast(value.year as int) = cast(history.year as int) - 1
where value.mailed_tot != history.oneyr_pri_mailed_tot
    or value.certified_tot != history.oneyr_pri_certified_tot
    or value.board_tot != history.oneyr_pri_board_tot

but it doesn't really fit into our current generics. She offered to update row_values_match_after_join to make it able to handle something like this, or I could write a specific test that doesn't depend on a generic. Thoughts @dfsnow ?

Eh, I think what you've added so far is probably enough.