Added steps to `1_crosswalk_fetch` and `2_crosswalk_munge` to generate the target `2_crosswalk_munge/out/lake_to_state_xwalk.rds.ind`, which is a two-column table of `site_id` and `state` (the state abbreviation) indicating which state each `site_id` is in. A site can be in multiple states, in which case it gets one row per state (so the `site_id` column is not unique). I used a polygon-to-polygon intersection and counted any amount of overlap between a lake polygon and a state polygon as a match. If the `state` column is `NA`, the site was not matched to any state (so it is likely in Canada). There were 217 lakes that crossed multiple states and X lakes that were not matched to any state.
This fixes part of #271 and provides what #276 needs in order to use only MN sites. Tagging @padilla410 and @hcorson-dosch for awareness.
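For reviewers who want to see the matching rule in isolation, here is a minimal sketch of an "any overlap counts" polygon-to-polygon join on toy data. It assumes `sf::st_join()` with `join = st_intersects`, which may not be exactly how the pipeline step is written; the `rect()` helper, the toy polygons, and the state codes `AA` and `BB` are all hypothetical.

```r
library(sf)
library(dplyr)

# Hypothetical helper: build an axis-aligned rectangular polygon
rect <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmax, ymin), c(xmax, ymax), c(xmin, ymax), c(xmin, ymin)
  )))
}

# Two made-up "states" that share a border at x = 10
states <- st_sf(
  state = c("AA", "BB"),
  geometry = st_sfc(rect(0, 0, 10, 10), rect(10, 0, 20, 10))
)

# Three made-up "lakes": inside AA, straddling the border, outside both
lakes <- st_sf(
  site_id = c("lake_1", "lake_2", "lake_3"),
  geometry = st_sfc(rect(1, 1, 2, 2), rect(9, 4, 11, 5), rect(30, 30, 31, 31))
)

# left = TRUE keeps unmatched lakes with state = NA;
# a lake that touches several states gets one row per state
xwalk_toy <- lakes %>%
  st_join(states, join = st_intersects, left = TRUE) %>%
  st_drop_geometry() %>%
  select(site_id, state)
```

In this sketch `lake_2` ends up with two rows (one per state) and `lake_3` ends up with a single `NA` row, which mirrors the structure of the crosswalk described above.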
```r
library(scipiper)
library(tidyverse)

lake_centroids <- readRDS(sc_retrieve('2_crosswalk_munge/out/centroid_lakes_sf.rds.ind'))
xwalk <- readRDS(sc_retrieve('2_crosswalk_munge/out/lake_to_state_xwalk.rds.ind'))

head(xwalk)
# A tibble: 6 x 2
#   site_id         state
#   <chr>           <chr>
# 1 nhdhr_114337087 MN
# 2 nhdhr_81804327  MT
# 3 nhdhr_121544138 MN
# 4 nhdhr_121544472 MN
# 5 nhdhr_121545089 MN
# 6 nhdhr_122546423 SD

sum(duplicated(xwalk$site_id)) # 223 sites (0.27%) that crossed states
sum(is.na(xwalk$state)) # 10,918 sites (13%) that did not intersect any state boundaries

# Map the lake centroids, colored by matched state
lake_centroids %>%
  left_join(xwalk, by = "site_id") %>%
  ggplot(aes(color = state)) + geom_sf()
```
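As a hedged sketch of how the crosswalk might be summarized and filtered downstream, the toy example below counts multi-state lakes per lake rather than per row and pulls out the MN sites. Note that `duplicated()` counts extra rows, so a lake in three states contributes two to that total. The `xwalk_toy` table and its site ids are made up for illustration and are not pipeline output.

```r
library(dplyr)
library(tibble)

# Hypothetical crosswalk with the same two columns as the real one
xwalk_toy <- tribble(
  ~site_id, ~state,
  "lake_1", "MN",
  "lake_2", "MN",
  "lake_2", "WI",
  "lake_3", "MN",
  "lake_3", "ND",
  "lake_3", "SD",
  "lake_4", NA
)

# Extra rows beyond the first for each site_id
n_extra_rows <- sum(duplicated(xwalk_toy$site_id))

# Distinct lakes that appear in more than one state
n_multi_state_lakes <- xwalk_toy %>%
  count(site_id) %>%
  filter(n > 1) %>%
  nrow()

# Lakes that did not match any state
n_unmatched <- sum(is.na(xwalk_toy$state))

# Site ids with at least one MN match; filter() drops the NA rows
mn_sites <- xwalk_toy %>%
  filter(state == "MN") %>%
  distinct(site_id) %>%
  pull(site_id)
```

Using `distinct(site_id)` after the filter keeps each MN lake once even if it also appears under a neighboring state.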