UI-Research / mobility-from-poverty

https://ui-research.github.io/mobility-from-poverty/

Iss202 - Housing Affordability #310

Closed rogin123 closed 3 months ago

rogin123 commented 3 months ago

create new branch for Rob to review

rpitingolo commented 3 months ago

OK so we have a problem where the metric for affordability at 80% AMI is lower than at 50% AMI in a substantial number of counties. I ran a check with this code:

# Flag rows where the share at a lower AMI threshold exceeds the share at a higher one
check <- available_2022_overall %>%
  mutate(flag_30_gt_50 = if_else(share_affordable_available_30_ami > share_affordable_available_50_ami, 1, 0),
         flag_50_gt_80 = if_else(share_affordable_available_50_ami > share_affordable_available_80_ami, 1, 0))

The flag_50_gt_80 cases make up the majority of rows, so that doesn't pass the smell test.
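For reference, a minimal sketch of how the flags could be summarized into shares, so the size of the problem is a single number per flag. The `toy` data frame and its values are made up for illustration, the 80% column is assumed to be named share_affordable_available_80_ami, and the summary column names are hypothetical.

```r
library(dplyr)

# Toy stand-in for available_2022_overall; the values are invented for illustration.
toy <- tibble(
  share_affordable_available_30_ami = c(0.20, 0.50, 0.10, NA),
  share_affordable_available_50_ami = c(0.40, 0.30, 0.30, 0.60),
  share_affordable_available_80_ami = c(0.70, 0.60, 0.20, 0.90)  # assumed column name
)

flag_summary <- toy %>%
  mutate(
    # if_else() returns NA when either share is missing
    flag_30_gt_50 = if_else(share_affordable_available_30_ami > share_affordable_available_50_ami, 1, 0),
    flag_50_gt_80 = if_else(share_affordable_available_50_ami > share_affordable_available_80_ami, 1, 0)
  ) %>%
  summarize(
    n_rows = n(),
    # Share of non-missing rows that are flagged
    share_flag_30_gt_50 = mean(flag_30_gt_50, na.rm = TRUE),
    share_flag_50_gt_80 = mean(flag_50_gt_80, na.rm = TRUE),
    # Rows where the flag could not be computed
    n_missing_30_gt_50 = sum(is.na(flag_30_gt_50)),
    n_missing_50_gt_80 = sum(is.na(flag_50_gt_80))
  )

print(flag_summary)
```

Reporting the missing counts separately keeps NA rows from silently shrinking or inflating the flagged share.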

I tried to trace this back through the code and couldn't find the issue, but when I checked the data itself I think I found it there. I can't run the code straight through because it breaks on this line:

households_2022 <- read_csv(here::here("02_housing/data/temp/households_2022_county.csv"))

I am able to run it using the households_2022_county.csv from the root folder (not the 02_housing subfolder), but I think that file is outdated and is the reason we eventually wind up with the issue I originally started with.
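One way to confirm whether two copies of a CSV actually differ is to compare checksums. This is only a sketch: `same_file_contents` is a hypothetical helper, and the root-folder path is assumed from the description above.

```r
# Hypothetical helper: TRUE only if both files exist and have identical contents.
same_file_contents <- function(path_a, path_b) {
  # tools::md5sum() returns NA for files that cannot be read
  sums <- tools::md5sum(c(path_a, path_b))
  !anyNA(sums) && sums[[1]] == sums[[2]]
}

# Assumed locations of the two copies, relative to the project root.
root_copy   <- "households_2022_county.csv"
subdir_copy <- "02_housing/data/temp/households_2022_county.csv"

if (!same_file_contents(root_copy, subdir_copy)) {
  message("The two copies differ, or at least one of them is missing.")
}
```

A matching checksum would mean the root copy is only a duplicate; a mismatch would mean one of the two is stale.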

Since these scripts rely heavily on reading and writing CSVs we need to make sure all the CSVs are in the correct place and any old or outdated CSVs are deleted. I should be able to press "run all chunks" and it runs through without breaking.
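A small pre-flight check at the top of a script could catch a missing input before anything else runs. The sketch below uses base R only; `find_missing_files` is a hypothetical helper, and the list of required files would need to be filled in with the inputs each script really reads.

```r
# Hypothetical helper: returns the entries of `paths` that do not exist under `root`.
find_missing_files <- function(paths, root = ".") {
  full_paths <- file.path(root, paths)
  paths[!file.exists(full_paths)]
}

# Inputs this script expects, relative to the project root.
# Only the file named in this thread is listed; others would be added here.
required_csvs <- c(
  "02_housing/data/temp/households_2022_county.csv"
)

missing_csvs <- find_missing_files(required_csvs)
if (length(missing_csvs) > 0) {
  message("Missing input files:\n", paste(" -", missing_csvs, collapse = "\n"))
}
```

Swapping `message()` for `stop()` would halt the run immediately instead of failing later at the `read_csv()` call.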

@rogin123 can you please make sure the correct CSVs are in the correct places, delete any outdated ones, and make sure all commits are pushed? Can you also run the QC code I pasted above on both the place and county data to make sure it looks OK? A small number of cases in the place data are flagged, but we can check those separately as they might be legitimate.

@cdsolari once this is resolved I can re-check the metrics.

Thanks!

rpitingolo commented 3 months ago

I re-ran all of the scripts from the 0_ and it seems to have fixed some of the issues I previously had. When I ran that QC code, flag_30_gt_50 and flag_50_gt_80 are flagged in less than 2% of cases on the county file (including missing) and less than 0.5% on the place file. The county file has a lot of NA values, but I think that is the nature of the county data.

@cdsolari I pushed the most up-to-date CSVs (from this morning) to this branch.

@rogin123 stand by for next steps.

rpitingolo commented 3 months ago

I've spot-checked both the place and county CSVs and don't see any more major issues, so I am going to approve this now.