USGS-R / drb-gw-hw-model-prep

Code repo to prepare groundwater and headwater-related datasets for modeling river temperature in the Delaware River Basin

Compile channel confinement data #41

Closed. lekoenig closed this issue 1 year ago.

lekoenig commented 2 years ago

Options include the FACET dataset (Hopkins et al. 2020) which is DRB-specific, and the McManamay and DeRolph national dataset which is indexed by COMID.

lekoenig commented 1 year ago

The geomorphometric dataset from FACET is appealing because it very likely provides more accurate values for channel width, floodplain width, and thus, channel confinement. The dataset was derived from 3-m DEMs, whereas river widths in the McManamay and DeRolph dataset are generated from random forest models.

However, there are a couple of potential issues with FACET that might make it more difficult to use here:

janetrbarclay commented 1 year ago

Hmm, that's a tough one. I like the idea of using the DEM data, but river-dl needs to have an input value for each segment, so missing data becomes a problem. If we use the FACET data we would need to figure out how to estimate the missing values (not sure if just the mean is reasonable; probably not). I suspect we wouldn't have much issue with the low end of catchment size at the NHM resolution, but we definitely would at the upper end. Do we have a dataset that has drainage area for each reach? Seems like something we should have, but I'm not sure where.


lekoenig commented 1 year ago

Here's the distribution of watershed areas for the NHM segments (taking the totdasqkm column from the NHD value-added attributes for those COMIDs located at the bottom of each NHM segment):

> tar_load(p1_drb_comids_down)
> tar_load(p1_nhd_reaches)
> 
> p1_nhd_reaches %>%
+     filter(comid %in% p1_drb_comids_down$COMID) %>%
+     pull(totdasqkm) %>%
+     quantile()
        0%        25%        50%        75%       100% 
    0.4950   109.2933   228.0258   766.3649 30744.3168 
>

It looks like ~15% of the flowlines would be outside of the watershed area range given in the FACET metadata, so yeah we would need to impute channel confinement values for those segments:

> p1_nhd_reaches %>%
+   filter(comid %in% p1_drb_comids_down$COMID) %>%
+   filter(totdasqkm <3 | totdasqkm > 3000) %>%
+   sf::st_drop_geometry() %>%
+   summarize(proportion_out_of_bounds = nrow(.)/nrow(p1_drb_comids_down))
# A tibble: 1 x 1
  proportion_out_of_bounds
                     <dbl>
1                    0.155
> 
janetrbarclay commented 1 year ago

Is there any correlation between channel confinement and watershed size? I might imagine that in the DRB small catchments tend to be more confined and large ones less so.



lekoenig commented 1 year ago

Is there any correlation between channel confinement and watershed size?

That expectation does seem reasonable, although there seems to be enough variability in the DRB that it might be difficult to impute confinement based on watershed size alone(?):

plot1

plot2
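For reference, a rough sketch of how a plot like the ones above could be put together, assuming FACET-derived confinement has been joined to the NHD reaches (facet_confinement and its column names are hypothetical placeholders, not actual targets in this pipeline):

library(dplyr)
library(ggplot2)

# join hypothetical COMID-level confinement values to the DRB outlet reaches
plot_dat <- p1_nhd_reaches %>%
  sf::st_drop_geometry() %>%
  filter(comid %in% p1_drb_comids_down$COMID) %>%
  left_join(facet_confinement, by = c("comid" = "COMID"))

# confinement vs. total drainage area, faceted by stream order
ggplot(plot_dat, aes(x = totdasqkm, y = confinement)) +
  geom_point(alpha = 0.4) +
  scale_x_log10() +
  facet_wrap(~ streamorde) +
  labs(x = "Total drainage area (km2)",
       y = "Confinement (floodplain width / channel width)")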

janetrbarclay commented 1 year ago

Is the lower plot by stream order? From the scatter plot it seems there isn't much variability within the big watersheds, right? Seems like we might be able to use a mean value across the big rivers for those? Not sure what to do with the small ones. How many do we have below the 3 km2 threshold?

lekoenig commented 1 year ago

Oops, yes - the lower plot is separated by stream order. There are 2 NHM segments below the 3 km2 threshold, so perhaps those could also be imputed with some mean value.

> p1_nhd_reaches %>%
+   filter(comid %in% p1_drb_comids_down$COMID) %>%
+   filter(totdasqkm <3) %>%
+   sf::st_drop_geometry() %>% 
+   pull(totdasqkm)
[1] 2.6199 0.4950
>
janetrbarclay commented 1 year ago

We could use some mean value, or since it's only 2 segments, we could pull the value from the nearest downstream reach or something like that (assuming some spatial correlation in the values).
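Not pipeline code, but a minimal sketch of that downstream-fill idea, assuming a hypothetical confinement table with seg_id_nat, confinement, and a to_seg column giving each segment's downstream neighbor:

library(dplyr)

# fill missing confinement values with the value from the downstream segment
impute_from_downstream <- function(confinement_df) {
  confinement_df %>%
    left_join(confinement_df %>%
                select(seg_id_nat, confinement_ds = confinement),
              by = c("to_seg" = "seg_id_nat")) %>%
    mutate(confinement = coalesce(confinement, confinement_ds)) %>%
    select(-confinement_ds)
}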



lekoenig commented 1 year ago

The McManamay and DeRolph dataset includes Confinement as a categorical value, but we can also calculate floodplain width/channel width using the other columns in their dataset. Here's a comparison of the calculated confinement values between the two datasets (without any additional spatial aggregation on our part). This isn't a one-to-one comparison, but rather just shows the range of values within the reaches assigned to a given stream order:

comparison_plot
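For context, the "calculated" McManamay confinement above is just the ratio of floodplain width to channel width; a minimal sketch, with mcmanamay_dat and the width column names as placeholders rather than the actual attribute names in the McManamay and DeRolph tables:

library(dplyr)

# placeholder names: floodplain_width_m and channel_width_m stand in for the
# relevant McManamay & DeRolph attributes, indexed by COMID
confinement_mcmanamay <- mcmanamay_dat %>%
  mutate(confinement_calc = floodplain_width_m / channel_width_m) %>%
  select(COMID, confinement_calc)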

The McManamay dataset includes several high outliers that are driven by very small values for width (which were estimated using an RF algorithm at the national scale). Those high values are not likely to show up in the NHM-aggregated data. Below are the same McManamay estimates once I've aggregated the values from NHDPlusv2 to NHM:

mcmanamay_boxplot

If I can join the FACET values to the NHM flowlines, I can directly compare these values.

lekoenig commented 1 year ago

In terms of aggregating the FACET values to the NHM flowlines, one idea would just be to replicate the workflow from a recent paper from this group:

Because FACET generates a higher resolution stream network than NHDPlusV2, the FACET stream segment with the largest Shreve magnitude within each NHDPlusV2 catchment was selected and then statistically summarized for each geomorphometric metric as the mean of the 5th to 95th percentile values (to exclude anomalous outliers) for the following attributes: streambank height, stream channel width, average streambank angle, and stream slope and sinuosity; total active floodplain width was summarized as the mean in each reach.

So instead of creating a crosswalk between the segment flowlines, just do a spatial join between the FACET flowlines and the NHDPlusv2 catchments that make up the NHM segments, and summarize floodplain width and channel width as above (a rough sketch of that workflow is below).
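Something like the following, where facet_flowlines, nhd_catchments, and the attribute names (shreve_mag, chan_width_m, fp_width_m) are placeholders for the real FACET and catchment layers:

library(dplyr)
library(sf)

# mean of the 5th-95th percentile values, to exclude anomalous outliers
p5_95_mean <- function(x) {
  bounds <- quantile(x, probs = c(0.05, 0.95), na.rm = TRUE)
  mean(x[x >= bounds[1] & x <= bounds[2]], na.rm = TRUE)
}

facet_by_comid <- facet_flowlines %>%
  # tag each FACET segment with the NHDPlusv2 catchment (FEATUREID/COMID) it falls in
  st_join(nhd_catchments["FEATUREID"]) %>%
  st_drop_geometry() %>%
  group_by(FEATUREID) %>%
  # keep the FACET segment(s) with the largest Shreve magnitude in each catchment
  filter(shreve_mag == max(shreve_mag, na.rm = TRUE)) %>%
  summarize(channel_width_m    = p5_95_mean(chan_width_m),
            floodplain_width_m = mean(fp_width_m, na.rm = TRUE),
            confinement_facet  = floodplain_width_m / channel_width_m,
            .groups = "drop")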

janetrbarclay commented 1 year ago

From that first figure it would seem the two datasets provide pretty similar results (at least similar ranges) for 4th order and above, but diverge a good bit below that. Any thoughts on why the two datasets diverge in the small streams?

To make sure I have this straight, McManamay is national in scope and uses an RF for the stream reach attributes, whereas FACET is specific to the DRB and uses a DEM for channel width and a regression for floodplain width?

I think it's interesting that McManamay has more variation overall than FACET. I think I might have expected the opposite (that the national dataset and the RF would smooth out some of the outliers).

janetrbarclay commented 1 year ago

the FACET stream segment with the largest Shreve magnitude

what's a Shreve magnitude?

lekoenig commented 1 year ago

To make sure I have this straight, McManamay is national in scope and uses a RF for the stream reach attributes whereas FACET is specific to the DRB, uses a DEM for channel width and a regression for floodplain width?

Thanks, I think you've got it, but I want to record a few details here for future reference. (And Shreve magnitude is just another stream ordering scheme, an alternative to Strahler order: a segment's Shreve magnitude is the number of headwater links upstream of it, so within a catchment the segment with the largest magnitude is the most downstream, mainstem segment.)

McManamay

FACET

lekoenig commented 1 year ago

So instead of creating a crosswalk between the segment flowlines, just do a spatial join with the FACET flowlines and the catchments that comprise the NHM catchments, and summarize floodplain width and channel width as above.

For the FACET dataset, I followed the procedure outlined in Noe et al. 2022 (linked above) to aggregate the geomorphic metrics from the high-resolution FACET network to the NHDPlusv2 network. Briefly, this consisted of doing a spatial join between the FACET network and the NHDPlusv2 catchment polygons and then identifying which FACET segment was most downstream within each catchment. We assume the values from that FACET segment also apply to the paired NHDPlusv2 flowline for the catchment. Here's a comparison of estimated confinement at the NHD scale for both FACET and the McManamay and DeRolph datasets:

compare_confinement_nhd

The overall strength of the correlation is fairly poor, as there's a lot of scatter around the values we would estimate from the McManamay channel/floodplain widths. However, the central tendency indicates some agreement between the two data sources, which is encouraging.

I've also taken a stab at aggregating the FACET-derived values to the NHM scale by calculating a length-weighted average channel width and floodplain width, respectively, based on the COMIDs that comprise each NHM segment. Using a length-weighted mean results in 75 NA values and 381 confinement estimates across segments within the DRB network, so we would still have to impute those 75 segments somehow. I also want to point out that for the segments where we can come up with a FACET-derived confinement estimate, the length of the NHM segment that overlaps COMIDs with FACET data is variable; I currently flag segments where <70% of the length overlaps COMIDs with FACET data.

FACET_hist
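For the record, a minimal sketch of the length-weighted aggregation and the <70% coverage flag described above, assuming a COMID-to-NHM crosswalk (comid_seg_xwalk) and a COMID-level confinement table (confinement_nhd with a lengthkm column); these names are placeholders, not actual targets:

library(dplyr)

confinement_nhm <- confinement_nhd %>%
  left_join(comid_seg_xwalk, by = "COMID") %>%
  group_by(seg_id_nat) %>%
  summarize(
    # length-weighted mean across the COMIDs that make up each NHM segment
    confinement_calc_facet = weighted.mean(confinement, w = lengthkm, na.rm = TRUE),
    # fraction of the segment length backed by COMIDs with FACET data
    prop_length_w_facet = sum(lengthkm[!is.na(confinement)]) / sum(lengthkm),
    # flag segments where <70% of the length overlaps COMIDs with FACET data
    flag_low_facet_coverage = prop_length_w_facet < 0.7,
    .groups = "drop"
  )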

janetrbarclay commented 1 year ago

Thanks, @lekoenig, for doing all this. I look forward to chatting details and next steps on Tuesday.

Janet

lekoenig commented 1 year ago

Since our conversation on Tuesday (11/1/2022) I've made a few edits to how I was processing both the McManamay and FACET datasets. Namely, for each dataset I now compile the input values and calculate channel confinement (~floodplain width/river width) at the NHDPlusv2-scale. Then, to aggregate to the NHM-scale, I calculate a length-weighted mean of the confinement values across the individual COMIDs that make up each NHM segment. Some of the details in previous comments in this thread have changed as a result, but the overall tradeoffs remain the same. I'll try to sum things up here and document some of the options we discussed.

1) FACET data processed to NHM segments

FACET_NHD

> tar_load(p2_confinement_facet)
> tar_load(p1_drb_temp_obs)
> 
> segs_w_na_facet <- p2_confinement_facet %>%
+     filter(is.na(confinement_calc_facet)) 
> 
> segs_w_na_facet_and_obs <- segs_w_na_facet %>%
+     filter(seg_id_nat %in% p1_drb_temp_obs$seg_id_nat)
> dim(segs_w_na_facet_and_obs)
[1] 35  6
> 

map_w_annotation

2) McManamay data processed to NHM segments

compare

        0%        10%        20%        25%        50%        75%       100% 
  0.000000   5.408206  10.419612  15.289094  50.012221 112.519372 915.216494 
> 

width_replace

compare_widths

compare_width_empirical

janetrbarclay commented 1 year ago

@lekoenig Thanks for doing all this investigative work! I don't have super strong opinions on your questions, but here are some thoughts:

Did we decide that we'd keep and try both datasets and see how they each worked?

lekoenig commented 1 year ago

On the McManamay dataset, I think your reasoning that we're already using the empirical width estimation (though that's only for the NHDv2 resolution work, right?) makes sense.

We originally tackled the empirical width estimation so that we would have some value for widths for the NHDv2-resolution work, yes. However, we had discussed modifying the widths we use for the NHM model runs to create more of an apples-to-apples comparison for the downscaling experiments (i.e., aggregate empirical widths instead of using PRMS-SNTemp widths, see #26). If we are using those empirical widths in the model runs for your groundwater paper, I could see the case for replacing the McManamay widths with those. If not, maybe it is more straightforward to set widths < 1 m to 1 m, at least to start.
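That replacement would be something as simple as the sketch below (channel_width_m is a placeholder for the McManamay width column, as in the earlier sketch):

library(dplyr)

# floor very small RF-modeled widths at 1 m before computing confinement
mcmanamay_dat <- mcmanamay_dat %>%
  mutate(channel_width_m = pmax(channel_width_m, 1))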

janetrbarclay commented 1 year ago

Oh yeah, I do remember that conversation. I just skimmed the asRunConfig files for the gw paper and those are all using the mean PRMS-derived width. Given that, maybe we go ahead and use the < 1 m = 1 m replacement as you suggested.

lekoenig commented 1 year ago

Given that, maybe we go ahead and use the <1 m = 1 m as you suggested.

Thanks, Janet, that sounds good to me too. The code changes in #51 currently adopt the following decisions that were discussed in this thread:

To try to get a sense for how accurate the imputed FACET values are, I randomly sampled 30% of the NHM segments in the DRB that have non-NA confinement values and estimated what their "imputed" FACET value would be. As expected, there is a decent amount of scatter between the imputed values and the held-out FACET-derived ones (r2 = 0.77, slope = 0.89, intercept = 0.31).

Rplot
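For reference, the spot-check above can be sketched roughly as follows, where impute_confinement() is a stand-in for whatever imputation rule #51 adopts and the seed is purely illustrative:

library(dplyr)

set.seed(123)  # illustrative; any seed gives a different random 30% sample

obs <- p2_confinement_facet %>%
  filter(!is.na(confinement_calc_facet))

# hold out a random 30% of segments that have FACET-derived confinement
holdout <- obs %>% slice_sample(prop = 0.3)

# blank out the held-out values and re-impute them
imputed <- obs %>%
  mutate(confinement_calc_facet = if_else(seg_id_nat %in% holdout$seg_id_nat,
                                          NA_real_, confinement_calc_facet)) %>%
  impute_confinement() %>%
  filter(seg_id_nat %in% holdout$seg_id_nat)

# compare imputed vs. held-out values
comparison <- holdout %>%
  select(seg_id_nat, observed = confinement_calc_facet) %>%
  left_join(select(imputed, seg_id_nat, imputed = confinement_calc_facet),
            by = "seg_id_nat")

summary(lm(imputed ~ observed, data = comparison))  # slope, intercept, R2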