USGS-R / drb-gw-hw-model-prep

Code repo to prepare groundwater and headwater-related datasets for modeling river temperature in the Delaware River Basin

Standardize NHM segment identifiers across catchment attribute targets #53

Closed lekoenig closed 1 year ago

lekoenig commented 2 years ago

The temperature data release includes two segment identifier columns, subsegid and seg_id_nat. There are 459 unique values of subsegid in the DRB and 456 unique values of seg_id_nat. (The difference arises because seg_id_nat values 1437, 1442, and 1485 were split during processing for the temperature project. This step is in the delaware-model-prep repo.)
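One way to confirm which seg_id_nat values account for the difference is to count subsegid values per seg_id_nat in a crosswalk table. The sketch below is only an illustration in base R: the `crosswalk` data frame and the pairing of subsegid values with seg_id_nat values in it are made up, not taken from the data release.

```r
# Toy crosswalk between the two identifier columns.
# ASSUMPTION: the subsegid/seg_id_nat pairings here are invented for illustration.
crosswalk <- data.frame(
  subsegid   = c("1_1", "2_1", "3_1", "3_2"),
  seg_id_nat = c("1435", "1436", "1437", "1437"),
  stringsAsFactors = FALSE
)

# Number of unique values in each identifier column
n_subseg <- length(unique(crosswalk$subsegid))
n_segnat <- length(unique(crosswalk$seg_id_nat))

# Count subsegid values per seg_id_nat; a count above 1 marks a split segment
subseg_per_nat <- table(crosswalk$seg_id_nat)
split_ids <- names(subseg_per_nat)[subseg_per_nat > 1]

print(c(n_subseg = n_subseg, n_segnat = n_segnat))
print(split_ids)
```

Run against the real crosswalk, `split_ids` would be expected to list the three split seg_id_nat values if the explanation above is complete.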

When processing catchment attributes, we're sometimes using one identifier column and sometimes using the other (examples included below). We should decide how many segments we're expecting in the network, and whether we should use subsegid (sometimes referred to in our pipeline as PRMS_segid because of naming conventions used in the inland salinity project) or seg_id_nat to represent unique segments for modeling.

# 1) Here's an example where we use seg_id_nat and therefore end up with 456 segments:
> tar_load(p2_confinement_mcmanamay_filled)
> dim(p2_confinement_mcmanamay_filled)
[1] 456   7
> head(p2_confinement_mcmanamay_filled, 3)
# A tibble: 3 x 7
  seg_id_nat reach_length_km lengthkm_mcmanamay_is_na prop_reach_w_mcmanamay confinement_calc_mcmanamay flag_mcmanamay flag_gaps
  <chr>                <dbl>                    <dbl>                  <dbl>                      <dbl> <chr>          <chr>    
1 1435                  13.6                    0                      1                          12.8  NA             NA       
2 1436                  19.1                    0.518                  0.973                      13.9  NA             NA       
3 1437                  19.6                    0                      1                           8.98 NA             NA  

# 2) Here's an example where we use subsegid/PRMS_segid and therefore end up with 459 segments:
> tar_load(p2_soller_coarse_sediment_reaches_nhm)
> dim(p2_soller_coarse_sediment_reaches_nhm)
[1] 459   5
> head(sf::st_drop_geometry(p2_soller_coarse_sediment_reaches_nhm), 3)
# A tibble: 3 x 4
  PRMS_segid total_reach_buffer_area_km2 cs_area_km2 cs_area_proportion
  <chr>                           [km^2]      [km^2]              <dbl>
1 1_1                               6.94           0                  0
2 10_1                              1.24           0                  0
3 11_1                              1.10           0                  0
lekoenig commented 2 years ago

From Janet:

We have 455 segments with distinct seg_id_nat values in the existing river-dl input files, so let's aggregate to seg_id_nat.
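A minimal sketch of what aggregating a subsegid-level attribute table to seg_id_nat might look like, in base R. Everything here is an assumption for illustration: the `attrs` table, its values, the presence of a seg_id_nat column alongside PRMS_segid, and the choice to sum areas and then recompute the proportion. The real targets may also carry `units` and geometry columns that would need to be handled first.

```r
# Toy attribute table at the subsegid (PRMS_segid) level.
# ASSUMPTION: values and the PRMS_segid-to-seg_id_nat pairing are invented.
attrs <- data.frame(
  PRMS_segid = c("1_1", "2_1", "3_1", "3_2"),
  seg_id_nat = c("1435", "1436", "1437", "1437"),
  total_reach_buffer_area_km2 = c(6.0, 2.0, 3.0, 1.0),
  cs_area_km2 = c(0.0, 1.0, 1.5, 0.5),
  stringsAsFactors = FALSE
)

# Sum the area columns within each seg_id_nat so split segments collapse to one row
attrs_by_nat <- aggregate(
  cbind(total_reach_buffer_area_km2, cs_area_km2) ~ seg_id_nat,
  data = attrs,
  FUN = sum
)

# Recompute the proportion from the summed areas rather than averaging proportions
attrs_by_nat$cs_area_proportion <-
  attrs_by_nat$cs_area_km2 / attrs_by_nat$total_reach_buffer_area_km2

print(attrs_by_nat)
```

Summing areas before dividing weights each sub-segment by its buffer area; length-based attributes such as reach_length_km could be summed the same way, while other attributes may need a different rule.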