DOI-USGS / lake-temperature-model-prep

Add a lakes to state xwalk #278

Closed: lindsayplatt closed this pull request 2 years ago

lindsayplatt commented 2 years ago

Added steps to `1_crosswalk_fetch` and `2_crosswalk_munge` to generate a new target: a two-column table of `site_id` and `state` (the state abbreviation) indicating which state each `site_id` is in. A site can fall in multiple states, and each state gets its own row, so the values in the `site_id` column are not unique. I used a polygon-to-polygon intersection, counting any amount of overlap between a lake polygon and a state polygon as a match. An `NA` in the `state` column means the site was not matched to any state (so it is likely in Canada). There were 217 lakes that crossed multiple states and X lakes that were not matched to any state.
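The matching rule described above can be sketched with `sf::st_intersects()`. This is a minimal illustration under stated assumptions, not the code from this PR: the square "lakes" and "states", the `site_id` values, and the helper `square()` are all hypothetical, and no CRS is set, so the geometry is treated as planar. One design note: `st_intersects()` also returns `TRUE` for polygons that only touch along a shared border, which is the loosest reading of "any amount of intersection".

```r
library(sf)

# Hypothetical helper: build a closed square polygon (toy stand-in for real boundaries)
square <- function(xmin, ymin, size) {
  st_polygon(list(matrix(
    c(xmin, ymin,
      xmin + size, ymin,
      xmin + size, ymin + size,
      xmin, ymin + size,
      xmin, ymin),
    ncol = 2, byrow = TRUE)))
}

# Two adjacent toy "states" sharing the border x = 10
states <- st_sf(
  state = c("MN", "WI"),
  geometry = st_sfc(square(0, 0, 10), square(10, 0, 10))
)

# Three toy "lakes" with made-up site_ids
lakes <- st_sf(
  site_id = c("lake_a", "lake_b", "lake_c"),
  geometry = st_sfc(
    square(2, 2, 1),   # entirely inside MN
    square(9, 2, 2),   # straddles the MN/WI border
    square(30, 30, 1)  # outside both states
  )
)

# Sparse list: for each lake, the indices of every state polygon it intersects
hits <- st_intersects(lakes, states)

# Expand to one row per (site_id, state) pair; NA when nothing intersects
xwalk <- do.call(rbind, lapply(seq_len(nrow(lakes)), function(i) {
  idx <- hits[[i]]
  data.frame(
    site_id = lakes$site_id[i],
    state = if (length(idx) == 0) NA_character_ else states$state[idx],
    stringsAsFactors = FALSE
  )
}))

xwalk
```

Here `lake_b` ends up with two rows (one per state) and `lake_c` gets a single row with `state = NA`, which is the same long shape the target uses.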

This partially addresses #271 and covers the need in #276 to use only MN sites. Tagging @padilla410 and @hcorson-dosch for awareness.
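As a hedged sketch of how a downstream step (such as the one in #276) might pull MN sites from a crosswalk in this long format, the base-R snippet below uses a made-up toy table; the `site_id` values are hypothetical and this is not code from either PR.

```r
# Toy crosswalk in the same long format as the target (made-up site_ids)
xwalk <- data.frame(
  site_id = c("site_1", "site_2", "site_2", "site_3", "site_4"),
  state   = c("MN",     "MN",     "WI",     NA,       "SD"),
  stringsAsFactors = FALSE
)

# %in% treats NA as "no match"; xwalk$state == "MN" would instead return NA
# for unmatched sites and insert NA entries into the subset
mn_site_ids <- unique(xwalk$site_id[xwalk$state %in% "MN"])
mn_site_ids  # "site_1" "site_2"
```

Because any overlap counts as a match, a border lake like `site_2` is kept even though it also appears under `WI`; `unique()` then drops any repeated IDs.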

```r
library(scipiper)
library(tidyverse)
library(sf) # sf methods for dplyr verbs on sf objects, plus geom_sf()

lake_centroids <- readRDS(sc_retrieve('2_crosswalk_munge/out/centroid_lakes_sf.rds.ind'))
xwalk <- readRDS(sc_retrieve('2_crosswalk_munge/out/lake_to_state_xwalk.rds.ind'))

head(xwalk)
#> # A tibble: 6 x 2
#>   site_id         state
#>   <chr>           <chr>
#> 1 nhdhr_114337087 MN   
#> 2 nhdhr_81804327  MT   
#> 3 nhdhr_121544138 MN   
#> 4 nhdhr_121544472 MN   
#> 5 nhdhr_121545089 MN   
#> 6 nhdhr_122546423 SD   

sum(duplicated(xwalk$site_id)) # 223 extra rows (0.27%) from sites that crossed state lines
sum(is.na(xwalk$state)) # 10,918 sites (13%) that did not intersect any state boundaries

# Map the lake centroids, colored by the state(s) each was matched to
lake_centroids %>% 
  left_join(xwalk, by = "site_id") %>% 
  ggplot(aes(color = state)) + geom_sf()
```
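A note on the `sum(duplicated(xwalk$site_id))` count: `duplicated()` flags every repeat after a site's first row, so it counts extra rows rather than distinct lakes, and a lake spanning three states contributes two. That would be one possible explanation for 223 differing from the 217 lakes in the description, though this is an inference. The toy example below (made-up `site_id` values) contrasts the two counts.

```r
# Toy long-format crosswalk (hypothetical data)
xwalk <- data.frame(
  site_id = c("a", "b", "b", "c", "c", "c", "d"),
  state   = c("MN", "MN", "WI", "MN", "WI", "IA", NA),
  stringsAsFactors = FALSE
)

# Extra rows: "b" adds one, "c" (three states) adds two
extra_rows <- sum(duplicated(xwalk$site_id))  # 3

# Distinct lakes that appear under more than one state
multi_state_lakes <- length(unique(xwalk$site_id[duplicated(xwalk$site_id)]))  # 2

# Unmatched sites appear once each, with state = NA
unmatched <- sum(is.na(xwalk$state))  # 1
```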
