Closed malcalakovalski closed 3 months ago
I've completed a second review. Because IPUMS is down, I cannot run the full file, especially to verify changes to code to pull microdata that were made since my last review. I have requested some additional changes on CT counties in 2022, final formatting of files, data quality flags, and "missing" ratio values.
The final adjustments needed are:
Before I approve, wanted to confirm the following and may need @cdsolari or @awunderground to weigh in based on treatment of other metrics:
Everything else looks good!
@kmartinchek @malcalakovalski Ideally, in the subgroup files, there should be placeholder rows for every subgroup, and not just "All." I'll let @awunderground weigh in on the importance of this over finishing though, given our timeline.
For Alaska, Valdez-Cordova Census Area (02-261) split to form Chugach Census Area (02-063) and Copper River Census Area (02-066), effective January 2, 2019, as noted in this documentation. Be careful: the population variable in the county-population.csv crosswalk is NA for all of Alaska's counties for some reason. Still, Valdez-Cordova should indeed cease to exist after 2019, with the other two counties replacing it thereafter.
For South Fulton, GA (13-72122), I see it in our place-populations crosswalk file. The population size is between 97k and 112k, so it should technically be available. I also looked up some census quick tables, and the share of owner-occupied housing units looks to be 69%, so I'd expect to see data for this metric for that place.
Thanks @cdsolari for clarifying. I think the issue is that Valdez-Cordova CA is NA for 2014-2019 (even though it exists), before it disappears. Is that right?
@kmartinchek Yes, it does seem weird that they are all NA for Valdez-Cordova. Using the population figure in the calculation could explain an NA in 2014, but it doesn't explain why the other years are also NA. Maybe a merging issue? Make sure the code is bysort year.
@cdsolari @malcalakovalski Great, thank you for clarifying! Seems like we should expect values for this county-equivalent for 2015-2018 (if not 2014 and 2019).
@kmartinchek @malcalakovalski I think we should expect values for 2014-2019. If you don't have it for 2014, just make sure you don't use the population count from the crosswalk ;). It might be something funny with the merge. THANKS!
@kmartinchek @cdsolari It looks like Valdez-Cordova CA isn't present in the puma to county crosswalk (geographic-crosswalks/data/crosswalk_puma_to_county.csv). Therefore, we can't calculate metrics on it because there are no weights. However, since it is present in the county population file, it shows up as NA in the final data.
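For anyone reproducing this check, here is a minimal sketch of the comparison described above: list the counties that appear in the population file but have no row in the PUMA-to-county crosswalk. The column names (`statefip`, `county`, `state`) and the inline sample rows are assumptions for illustration, not the real file layouts.

```python
import csv
import io

# Hypothetical miniature extracts; the real files have more rows and
# possibly different column names.
crosswalk_csv = """statefip,county,puma,afact
02,063,00400,0.4
02,066,00400,0.6
13,121,01001,1.0
"""
population_csv = """year,state,county,population
2018,02,261,NA
2018,13,121,1050000
"""


def county_keys(text, state_col, county_col):
    """Return the set of (state, county) FIPS pairs found in a CSV string."""
    reader = csv.DictReader(io.StringIO(text))
    return {(row[state_col], row[county_col]) for row in reader}


crosswalk_counties = county_keys(crosswalk_csv, "statefip", "county")
population_counties = county_keys(population_csv, "state", "county")

# Counties with a population row but no crosswalk weights: these are the
# ones that will surface as NA after the final join.
unweighted = sorted(population_counties - crosswalk_counties)
print(unweighted)
```

In this toy data the only county without weights is 02-261, which matches the behavior reported for Valdez-Cordova.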
A similar issue occurs in the puma to places crosswalk. In particular, South Fulton city, GA is only present in the crosswalk period = 2022. @cdsolari how do you suggest we proceed?
This is exactly right. We needed a longitudinal file that represents geographies at a point in time since we need county drop downs that represent current counties. Valdez-Cordova, Alaska should be missing in those earlier years.
So
My earlier comment was imprecise. @malcalakovalski, thank you for the precise language!
It looks like Valdez-Cordova CA isn't present in the puma to county crosswalk (geographic-crosswalks/data/crosswalk_puma_to_county.csv). Therefore, we can't calculate metrics on it because there are no weights. However, since it is present in the county population file, it shows up as NA in the final data.
Your approach seems correct. It's perfectly fine for those earlier values to be missing. Just report the 3,142 counties with the NAs.
Note for posterity: The code I wrote to add one missing row for each subgroup is quite hacky and hard to edit. In the future, I think it will be worth investing some time in refactoring it.
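As a starting point for that refactor, here is one possible shape for the placeholder logic: cross every (year, geography) pair already in the file with the full subgroup list and add an all-missing row wherever a combination is absent. This is a sketch, not the code in this PR; the field names (`year`, `geoid`, `subgroup`, `ratio`) and the subgroup labels are hypothetical.

```python
from itertools import product


def add_placeholder_rows(rows, subgroups, value_cols):
    """Return rows plus one all-missing row per absent subgroup.

    rows: list of dicts with "year", "geoid", and "subgroup" keys (assumed names).
    subgroups: the full list of subgroup labels every geography should have.
    value_cols: metric columns to fill with None in placeholder rows.
    """
    existing = {(r["year"], r["geoid"], r["subgroup"]) for r in rows}
    geographies = sorted({(r["year"], r["geoid"]) for r in rows})
    out = list(rows)
    for (year, geoid), subgroup in product(geographies, subgroups):
        if (year, geoid, subgroup) not in existing:
            placeholder = {"year": year, "geoid": geoid, "subgroup": subgroup}
            placeholder.update({col: None for col in value_cols})
            out.append(placeholder)
    # Keep subgroups in the order given; unknown labels sort last.
    order = {s: i for i, s in enumerate(subgroups)}
    return sorted(
        out,
        key=lambda r: (r["year"], r["geoid"], order.get(r["subgroup"], len(order))),
    )


# Hypothetical example: only the "All" row exists for one county-year.
rows = [{"year": 2022, "geoid": "02063", "subgroup": "All", "ratio": 1.0}]
filled = add_placeholder_rows(rows, ["All", "Group A", "Group B"], ["ratio"])
for r in filled:
    print(r)
```

Because the subgroup list is passed in as data, adding or renaming a subgroup would only mean changing the list, not the row-building logic.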
Ratio of the share of a community’s housing wealth held by a racial or ethnic group to the share of households of the same group
Note: I get 3,146 counties in 2022 for this metric instead of 3,143. Extra reviewer attention on the crosswalk section would be extremely helpful to debug this.
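One way a reviewer could narrow this down is to diff the metric's county list for the year against the expected list and look only at the extras. The sketch below assumes rows with `year`, `state`, and `county` fields (hypothetical names) and uses a toy expected set; it illustrates the check and does not diagnose the actual discrepancy.

```python
from collections import defaultdict


def counties_by_year(rows):
    """Map each year to the set of (state, county) FIPS pairs present."""
    seen = defaultdict(set)
    for r in rows:
        seen[r["year"]].add((r["state"], r["county"]))
    return seen


def unexpected_counties(rows, expected, year):
    """Counties present in the metric for `year` but absent from `expected`."""
    return sorted(counties_by_year(rows)[year] - expected)


# Toy data: 02-261 is included as an extra because the thread notes it
# should no longer exist after the 2019 split.
metric_rows = [
    {"year": 2022, "state": "02", "county": "063"},
    {"year": 2022, "state": "02", "county": "066"},
    {"year": 2022, "state": "02", "county": "261"},
]
expected_2022 = {("02", "063"), ("02", "066")}

extras = unexpected_counties(metric_rows, expected_2022, 2022)
print(len(counties_by_year(metric_rows)[2022]), extras)
```

Running the same diff in the other direction (expected minus observed) would show whether any counties are missing as well as extra.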