Closed wrridgeway closed 1 month ago
I ran the ingest stage of the condo model pipeline with both model.vw_pin_condo_input
(old) and z_ci_152_determine_parking_spacecommon_area_flag_hierarchy_model.vw_pin_condo_input
(new).
# ingest new and old training data
> new <- read_parquet("input_new/training_data.parquet") %>%
    filter(meta_modeling_group == "NONLIVABLE")
> old <- read_parquet("input/training_data.parquet") %>%
    filter(meta_modeling_group == "NONLIVABLE")
# filter only sales from training data that are no longer considered common area
> changed <- old %>%
    filter(!(meta_pin %in% new$meta_pin))
> nrow(changed)
[1] 9
> summary(changed$meta_sale_price)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 100000  163000  315523  305444  350000  723500
Given their sale prices (median around $315K), it was probably a mistake to ever treat these as common areas. Either way, the change only affects 9 rows of the training data.
# ingest new and old assessment data
> new <- read_parquet("input_new/assessment_data.parquet") %>%
    filter(meta_modeling_group == "NONLIVABLE")
> old <- read_parquet("input/assessment_data.parquet") %>%
    filter(meta_modeling_group == "NONLIVABLE")
# keep only units from the assessment data that are no longer considered NONLIVABLE
> changed <- old %>%
    filter(!(meta_pin %in% new$meta_pin))
> nrow(changed)
[1] 223
> length(unique(changed$meta_pin10))
[1] 51
So we've got 223 units from 51 different buildings that were previously considered NONLIVABLE and are now treated as normal condo units for assessment. 131 of these 223 units have a non-null value for char_bedroom, which probably should have given us cause for concern in the past either about the characteristics for these units or their status as common areas.
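The session above doesn't show how the char_bedroom count was obtained. A minimal sketch of one way to get it, using a toy stand-in for `changed` (the values below are made up for illustration and are not the real assessment data; only the column names come from this thread):

```r
library(dplyr)

# Toy stand-in for `changed`; values are invented for illustration only
changed <- tibble(
  meta_pin     = c("a", "b", "c", "d"),
  meta_pin10   = c("x", "x", "y", "z"),
  char_bedroom = c(2, NA, 1, NA)
)

# Units that carry a bedroom count despite having been flagged NONLIVABLE
n_with_bedrooms <- sum(!is.na(changed$char_bedroom))

# The same units counted per building (10-digit PIN)
by_building <- changed %>%
  filter(!is.na(char_bedroom)) %>%
  count(meta_pin10, name = "n_units")
```

Run against the real `changed` data frame, `n_with_bedrooms` would presumably correspond to the 131 figure, and `by_building` would show whether the problem is concentrated in a few buildings.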
A very small number of parcels is affected by this change.
131 of these 223 units have a non-null value for char_bedroom, which probably should have given us cause for concern in the past either about the characteristics for these units or their status as common areas.
@wrridgeway Can we add a dbt data test that throws an error if a parking space or common area also has characteristics?
Done, but it's not particularly encouraging.
I was going to handle this within the modeling pipeline, but it needs to be addressed here as well.
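For the modeling-pipeline side, a guard along these lines might work. This is only a sketch under assumptions: `check_nonlivable_chars` is a hypothetical helper name, the default column list contains only char_bedroom because that is the only characteristic mentioned in this thread, and it is not the dbt test itself.

```r
library(dplyr)

# Hypothetical helper: error out if any NONLIVABLE unit has a non-null
# value in one of the characteristic columns listed in `char_cols`
check_nonlivable_chars <- function(df, char_cols = c("char_bedroom")) {
  offenders <- df %>%
    filter(meta_modeling_group == "NONLIVABLE") %>%
    filter(if_any(all_of(char_cols), ~ !is.na(.x)))

  if (nrow(offenders) > 0) {
    stop(sprintf(
      "%d NONLIVABLE unit(s) have non-null characteristics",
      nrow(offenders)
    ))
  }

  # Return the input invisibly so the check can sit inside a pipe
  invisible(df)
}
```

It could be dropped into the ingest stage right after the data is read, e.g. `read_parquet("input/assessment_data.parquet") %>% check_nonlivable_chars()`, so the run fails early instead of silently modeling mislabeled units.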