UI-Research / mobility-from-poverty

https://ui-research.github.io/mobility-from-poverty/

Iss224 -- home values #284

malcalakovalski closed 3 months ago

malcalakovalski commented 4 months ago

Ratio of the share of a community’s housing wealth held by a racial or ethnic group to the share of households of the same group
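As a rough illustration of the definition above, the ratio can be sketched in a few lines of Python. The function name, its arguments, and the numbers are made up for this example; the metric in this PR is built from weighted microdata, which is not reproduced here.

```python
def housing_wealth_ratio(group_home_value, total_home_value,
                         group_households, total_households):
    """Return a group's share of housing wealth divided by its share of households.

    Returns None when a share cannot be computed (zero or missing denominator).
    """
    if not total_home_value or not total_households or not group_households:
        return None
    wealth_share = group_home_value / total_home_value
    household_share = group_households / total_households
    return wealth_share / household_share


# Hypothetical community: the group holds 25% of housing wealth
# but makes up 50% of households.
ratio = housing_wealth_ratio(25_000_000, 100_000_000, 500, 1_000)
print(ratio)
```

A ratio below 1 means the group holds a smaller share of housing wealth than its share of households; a ratio of 1 means the two shares match.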

Note: I get 3,146 counties in 2022 for this metric instead of 3,143. Extra reviewer attention on the crosswalk section would be extremely helpful to debug this.
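One way to narrow down a count mismatch like this is to compare the GEOIDs in the output against an expected list and look at what is extra, missing, or duplicated. The sketch below is only an assumption about how that check could look; the helper name and the toy GEOIDs are hypothetical and do not come from the repository.

```python
from collections import Counter


def compare_geoids(expected, actual):
    """Report GEOIDs that are unexpected, missing, or duplicated in `actual`."""
    expected_set, actual_set = set(expected), set(actual)
    counts = Counter(actual)
    return {
        "unexpected": sorted(actual_set - expected_set),
        "missing": sorted(expected_set - actual_set),
        "duplicated": sorted(g for g, n in counts.items() if n > 1),
    }


# Toy GEOIDs, not real county codes.
expected_counties = ["00001", "00003", "00005"]
output_counties = ["00001", "00003", "00005", "00007", "00003"]

report = compare_geoids(expected_counties, output_counties)
print(report)
```

Running the same comparison separately for each year would show whether the extra rows come from a single crosswalk period.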

kmartinchek commented 3 months ago

I've completed a second review. Because IPUMS is down, I cannot run the full file, in particular to verify the changes made to the microdata-pull code since my last review. I have requested some additional changes on CT counties in 2022, final formatting of files, data quality flags, and "missing" ratio values.

kmartinchek commented 3 months ago

The final adjustments needed are:

  1. There need to be rows for all 9 CT planning regions in the 2022 file and in the longitudinal overall and subgroup county files.
  2. Clarify whether 2014/2015 data will be included for counties, since these years were removed for places.
  3. Clarify the missingness in the 2020 place and county files: should this year be removed/filtered out?

kmartinchek commented 3 months ago

Before I approve, I wanted to confirm the following; we may need @cdsolari or @awunderground to weigh in based on how other metrics are treated:

  1. Did the Valdez-Cordova Census Area in Alaska exist before 2020? Its values for 2014-2019 appear to be NA. If it wasn't a county-equivalent during that period, should it be dropped?
  2. For places: South Fulton City appears to be missing data for 2018, 2019, and 2020, even though it was incorporated in 2017. Is that expected?
  3. When a place/county is missing for a specific year in the subgroup files, it has just one NA row (vs. one each for overall, over 45, and under 45). Is this the expected file format?

Everything else looks good!

cdsolari commented 3 months ago

@kmartinchek @malcalakovalski Ideally, in the subgroup files, there should be placeholder rows for every subgroup, and not just "All." I'll let @awunderground weigh in on the importance of this over finishing though, given our timeline.
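A minimal sketch of what "placeholder rows for every subgroup" could mean in practice, assuming rows are keyed by year, geography, and subgroup. The function, column names, and subgroup labels here are hypothetical and are not the code used in this PR.

```python
from itertools import product


def add_placeholder_rows(rows, subgroups, value_col="ratio"):
    """Return rows expanded so every (year, geoid) has one row per subgroup.

    Combinations absent from `rows` are added with a missing (None) value.
    """
    present = {(r["year"], r["geoid"], r["subgroup"]) for r in rows}
    units = sorted({(r["year"], r["geoid"]) for r in rows})
    filled = list(rows)
    for (year, geoid), subgroup in product(units, subgroups):
        if (year, geoid, subgroup) not in present:
            filled.append({"year": year, "geoid": geoid,
                           "subgroup": subgroup, value_col: None})
    # Sort by year, geography, then the subgroup order given by the caller.
    order = {name: i for i, name in enumerate(subgroups)}
    return sorted(filled, key=lambda r: (r["year"], r["geoid"], order[r["subgroup"]]))


# Hypothetical subgroup labels and values.
subgroups = ["All", "Group A", "Group B"]
rows = [
    {"year": 2022, "geoid": "00001", "subgroup": "All", "ratio": None},
    {"year": 2022, "geoid": "00003", "subgroup": "All", "ratio": 1.0},
    {"year": 2022, "geoid": "00003", "subgroup": "Group A", "ratio": 0.8},
    {"year": 2022, "geoid": "00003", "subgroup": "Group B", "ratio": 1.3},
]
filled = add_placeholder_rows(rows, subgroups)
```

Here the geography with a single NA row for "All" ends up with three rows, one per subgroup, all with a missing value.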

For Alaska: Valdez-Cordova Census Area, Alaska (02-261) split to form Chugach Census Area (02-063) and Copper River Census Area (02-066), effective January 02, 2019, as noted in this documentation. Be careful: the population variable in the county-population.csv crosswalk has NAs for all of AK's counties for some reason. But Valdez-Cordova should indeed cease to exist after 2019, and the other two counties should replace it thereafter.

For South Fulton, GA (13-72122), I see it in our place-populations crosswalk file. The population size is between 97k and 112k. It should technically be available. I also looked up some Census quick tables, and the share of owner-occupied housing units looks to be 69%, so I'd expect to see data for this metric for that place....

kmartinchek commented 3 months ago

Thanks @cdsolari for clarifying. I think the issue is that Valdez-Cordova CA is NA for 2014-2019 (even though it exists), before it disappears. Is that right?

cdsolari commented 3 months ago

@kmartinchek Yes, it does seem weird that they are all NA for Valdez-Cordova. A 2014 NA could be explained if you used the population figure in the calculation, but that doesn't explain why the other years are also NA. Maybe a merging issue? Make sure the code is bysort year.

kmartinchek commented 3 months ago

@cdsolari @malcalakovalski Great, thank you for clarifying! Seems like we should expect values for this county-equivalent for 2015-2018 (if not 2014 and 2019).

cdsolari commented 3 months ago

@kmartinchek @malcalakovalski I think we should expect values for 2014-2019. If you don't have it for 2014, just make sure you don't use the population count from the crosswalk ;). It might be something funny with the merge. THANKS!

malcalakovalski commented 3 months ago

@kmartinchek @cdsolari It looks like Valdez-Cordova CA isn't present in the puma to county crosswalk (geographic-crosswalks/data/crosswalk_puma_to_county.csv). Therefore, we can't calculate metrics on it because there are no weights. However, since it is present in the county population file, it shows up as NA in the final data.

A similar issue occurs in the PUMA-to-place crosswalk. In particular, South Fulton city, GA is only present where the crosswalk period is 2022. @cdsolari how do you suggest we proceed?
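The mechanism described in this comment can be sketched as a left join from the population file onto the computed metric: a geography with no crosswalk weights gets no metric value, but it still appears in the final data as NA. The function and the values below are hypothetical; only the Valdez-Cordova code (02-261) comes from this thread.

```python
def attach_metric(population_rows, metric_by_key):
    """Left join: keep every population row; unmatched rows get None."""
    return [
        {**row, "ratio": metric_by_key.get((row["year"], row["geoid"]))}
        for row in population_rows
    ]


# "02261" is the Valdez-Cordova code cited in this thread;
# "00001" and all values are made up.
population_rows = [
    {"year": 2018, "geoid": "00001"},
    {"year": 2018, "geoid": "02261"},
]
# Metric values exist only for geographies that have crosswalk weights.
metric_by_key = {(2018, "00001"): 0.9}

final = attach_metric(population_rows, metric_by_key)
print(final)
```

Joining on both year and geography keeps a value from one year from being attached to another year's row.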

awunderground commented 3 months ago

This is exactly right. We needed a longitudinal file that represents geographies at a point in time since we need county drop downs that represent current counties. Valdez-Cordova, Alaska should be missing in those earlier years.

So

awunderground commented 3 months ago

My earlier comment was imprecise. @malcalakovalski, thank you for the precise language!

> It looks like Valdez-Cordova CA isn't present in the puma to county crosswalk (geographic-crosswalks/data/crosswalk_puma_to_county.csv). Therefore, we can't calculate metrics on it because there are no weights. However, since it is present in the county population file, it shows up as NA in the final data.

Your approach seems correct. It's perfectly fine for those earlier values to be missing. Just report the 3,142 counties with the NAs.

malcalakovalski commented 3 months ago

Note for posterity: The code I wrote to add one missing row for each subgroup is quite hacky and hard to edit. In the future, I think it will be worth investing some time in refactoring it.