PHSKC-APDE / rads.data

Data for rads and general PHSKC analyses
1 stars 0 forks source link

Potential problems with spatial_zip_to_hra20_geog #37

Open dcolombara opened 1 month ago

dcolombara commented 1 month ago

spatial_zip_to_hra20_geog might have issues.

Kai IM'd me to ask what xwalk he should use to identify ZIP codes in KC. I suggested spatial_zip_to_hra20_pop or spatial_zip_to_hra20_geog.

He wrote back and said there were some zips outside of KC. I thought they would be those that adjoin KC (e.g., 98026 where Edmonds touches Shoreline and gives us a tiny s2t_fraction). However, I took the time to map it out and it looks like there might be some outside of KC (run the code to see them on a map).

It's possible that I screwed up the shapefile processing, but can you can a look and make sure that our ZIP crosswalks are legit? The attached code should run fine on your computer and only take a minute.

library(rads.data)
library(data.table)
library(sf)
library(ggplot2)

kczips <- unique(spatial_zip_to_hra20_geog[]$ZIP)

hra20shape <- st_read("//dphcifs/APDE-CDIP/Shapefiles/HRA/hra_2020_nowater.shp")
plot(hra20shape["id"])

zipshape <- st_read("//dphcifs/APDE-CDIP/Shapefiles_protected/ZIP/adci_wa_zip_raw.shp")
zipshape <- zipshape[zipshape$POSTCOD %in% unique(kczips), ]
zipshape <- st_transform(zipshape, st_crs(hra20shape))
plot(zipshape["POSTCOD"])

# Identify POSTCOD areas that are outside hra20shape
intersection <- st_intersects(zipshape, hra20shape)
outer_zipcodes <- zipshape[lengths(intersection) == 0, ]

# check on zipcodes that appear to be outside of HRAs
setorder(spatial_zip_to_hra20_geog[ZIP %in% outer_zipcodes$POSTCOD, .(ZIP, hra20_name, s2t_fraction)], ZIP)[]

# Plot data
ggplot() +
  geom_sf(data = zipshape, aes(fill = POSTCOD), color = NA) +

  geom_sf_text(data = outer_zipcodes, aes(label = POSTCOD), size = 3, check_overlap = TRUE) +

  geom_sf(data = hra20shape, fill = NA, color = "black", linewidth = 1.5) +

  theme_void() +
  theme(legend.position = "none", plot.title = element_text(hjust = 0.5)) +
  labs(title = "HRA20 Borders with Zip Code Overlay",
       fill = "ZIP Code")
dcaseykc commented 1 month ago

The difference probably comes from using a different ZIP shapefile ("/APDE-CDIP/Shapefiles/ZIP/zipcode.shp" is used instead of Shapefiles_protected/ZIP/adci_wa_zip_raw.shp). We could/should screen out the small geographic intersections, but I don't think it should hurt any computation at the moment.

dcolombara commented 4 weeks ago

Would you be able to update the crosswalk to use the better ZIP code map? It's not urgent, but I don't want to forget about it.

I took this opportunity to play a little with leaflet. I don't love the fact that our crosswalk currently includes ZIP codes that are not even close to KC. I think this is because the other shapefile you used is wonky (as you suggested). For example, it has much bigger ZIP code areas (because it includes the surrounding water, even though no one lives in the middle of Puget Sound).

# load packages ----
library(sf)
library(spatagg)
library(leaflet)

# load shapefiles
newhra = st_read('//dphcifs/APDE-CDIP/Shapefiles/HRA/hra_2020.shp')
newhra = newhra[, c('id', 'name', 'reg_id', 'reg_nm')]
newhra <- st_transform(newhra, 4326)

zip <- st_read("//dphcifs/APDE-CDIP/Shapefiles_protected/ZIP/adci_wa_zip_raw.shp")
names(zip)[names(zip) == 'POSTCOD'] <- 'ZIP'
zip <- st_transform(zip, 4326)

kcshape <- st_read("//Kcitfsrprpgdw01/kclib/Plibrary2/politicl/shapes/polygon/kingco.shp")
kcshape <- st_transform(kcshape, 4326)

# ZIP to HRA crosswalk
z2h_fo = create_xwalk(source = zip, target = newhra, source_id = 'ZIP', target_id = 'id', min_overlap = 0, method = "fractional overlap")

# Subset zip shapefile to zipcodes crosswalked by spatagg using the clean zipcode data
xwalkedZIP <- unique(z2h_fo$source_id)
xwalkedZIPshape <- zip[zip$ZIP %in% xwalkedZIP, ]

# plot shows that we selected the correct zipcodes
leaflet() %>%
  addTiles() %>%  
  addPolygons(data = xwalkedZIPshape, 
              color = "white", 
              weight = 2, 
              fillOpacity = 0) %>%
  addPolygons(data = kcshape,  
              color = "blue", 
              weight = 2,     
              fillOpacity = 0.2,  
              fillColor = "blue") %>%  
  addProviderTiles(providers$Esri.WorldImagery)

# plot showing some zipcodes currently crosswalked in rads.data are entirely outside of King County
currentZIP <- unique(rads.data::spatial_zip_to_hra20_geog[]$ZIP)
currentzipshape <- zip[zip$ZIP %in% currentZIP, ]
leaflet() %>%
  addTiles() %>%  
  addPolygons(data = currentzipshape, 
              color = "white", 
              weight = 2, 
              fillOpacity = 0) %>%
  addPolygons(data = kcshape,  
              color = "blue", 
              weight = 2,     
              fillOpacity = 0.2,  
              fillColor = "blue") %>%  
  addProviderTiles(providers$Esri.WorldImagery)