The geomorphometric dataset from FACET is appealing because it very likely provides more accurate values for channel width, floodplain width, and thus, channel confinement. The dataset was derived from 3-m DEMs, whereas river widths in the McManamay and DeRolph dataset are generated from random forest models.
However, there are a couple of potential issues with FACET that might make it more difficult to use here:
Hmm, that's a tough one. I like the idea of using the DEM data, but river-dl needs to have an input value for each segment, so missing data becomes a problem. If we use the FACET data we would need to figure out how to estimate the missing values (not sure if just the mean is reasonable; probably not). I suspect we wouldn't have much issue with the low end of catchment size at the NHM resolution, but we definitely would at the upper end. Do we have a dataset that has drainage area for each reach? Seems like something we should have, but I'm not sure where.
Here's the distribution of watershed areas for the NHM segments (taking the `totdasqkm` column from the NHD value-added attributes for those COMIDs located at the bottom of each NHM segment):
> tar_load(p1_drb_comids_down)
> tar_load(p1_nhd_reaches)
>
> p1_nhd_reaches %>%
+ filter(comid %in% p1_drb_comids_down$COMID) %>%
+ pull(totdasqkm) %>%
+ quantile()
0% 25% 50% 75% 100%
0.4950 109.2933 228.0258 766.3649 30744.3168
>
It looks like ~15% of the flowlines would be outside of the watershed area range given in the FACET metadata, so yeah we would need to impute channel confinement values for those segments:
> p1_nhd_reaches %>%
+ filter(comid %in% p1_drb_comids_down$COMID) %>%
+ filter(totdasqkm <3 | totdasqkm > 3000) %>%
+ sf::st_drop_geometry() %>%
+ summarize(proportion_out_of_bounds = nrow(.)/nrow(p1_drb_comids_down))
# A tibble: 1 x 1
proportion_out_of_bounds
<dbl>
1 0.155
>
Is there any correlation between channel confinement and watershed size? I might imagine that in the DRB small catchments tend to be more confined and large ones less so.
> Is there any correlation with channel confinement and watershed size?
That expectation does seem reasonable, although there's enough variability in the DRB that it might be difficult to impute confinement based on watershed size alone(?):
Is the lower plot by stream order? From the scatter plot it seems there isn't much variability within the big watersheds, right? Seems like we might be able to use a mean value across big rivers for the big ones? Not sure what to do with the small ones. How many do we have below the 3 km2 threshold?
Oops, yes - the lower plot is separated by stream order. There are 2 NHM segments below the 3 km2 threshold, so perhaps those could also be imputed with some mean value.
> p1_nhd_reaches %>%
+ filter(comid %in% p1_drb_comids_down$COMID) %>%
+ filter(totdasqkm <3) %>%
+ sf::st_drop_geometry() %>%
+ pull(totdasqkm)
[1] 2.6199 0.4950
>
Could use some mean value, or since it's only 2, could pull the nearest downstream reach or something like that (assuming some spatial correlation in the values).
The McManamay and DeRolph dataset includes `Confinement` as a categorical value, but we can also calculate floodplain width/channel width using the other columns in their dataset. Here's a comparison of the calculated confinement values between the two datasets (without any additional spatial aggregation on our part). This isn't a one-to-one comparison, but rather, just showing the range of values within those reaches assigned to a given stream order:
The McManamay dataset includes several high outliers that are driven by very small values for width (which were estimated using an RF algorithm at the national scale). Those high values are not likely to show up in the NHM-aggregated data. Below are the same McManamay estimates once I've aggregated the values from NHDPlusv2 to NHM:
If I can join the FACET values to the NHM flowlines, I could directly compare these values.
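As a rough illustration of the confinement calculation mentioned above (floodplain width divided by channel width), a minimal sketch; the column names `flood_width_m` and `chan_width_m` are placeholders, not the actual field names in the McManamay and DeRolph table:

```r
# Hypothetical sketch: derive a continuous confinement metric from the
# floodplain-width and channel-width columns. Column names are placeholders.
library(dplyr)

calc_confinement <- function(df) {
  df %>%
    mutate(confinement_calc = flood_width_m / chan_width_m)
}
```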
In terms of aggregating the FACET values to the NHM flowlines, one idea would just be to replicate the workflow from a recent paper from this group:
> Because FACET generates a higher resolution stream network than NHDPlusV2, the FACET stream segment with the largest Shreve magnitude within each NHDPlusV2 catchment was selected and then statistically summarized for each geomorphometric metric as the mean of the 5th to 95th percentile values (to exclude anomalous outliers) for the following attributes: streambank height, stream channel width, average streambank angle, and stream slope and sinuosity; total active floodplain width was summarized as the mean in each reach.
So instead of creating a crosswalk between the segment flowlines, just do a spatial join with the FACET flowlines and the catchments that comprise the NHM catchments, and summarize floodplain width and channel width as above.
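A rough sketch of that summarization step, assuming the spatial join has already attached a catchment ID to each FACET segment (`catchment_id`, `shreve_magnitude`, and the metric columns are all placeholder names, not the actual fields):

```r
# Sketch of the Noe et al. (2022) summarization: within each NHDPlusv2
# catchment, keep the FACET segment with the largest Shreve magnitude, then
# summarize most metrics as the mean of the 5th-95th percentile values.
library(dplyr)

pick_max_shreve <- function(facet_joined) {
  facet_joined %>%
    group_by(catchment_id) %>%
    # keep the FACET segment with the largest Shreve magnitude per catchment
    slice_max(shreve_magnitude, n = 1, with_ties = FALSE) %>%
    ungroup()
}

# Trimmed summary used to exclude anomalous outliers
trimmed_mean_5_95 <- function(x) {
  q <- quantile(x, probs = c(0.05, 0.95), na.rm = TRUE)
  mean(x[x >= q[1] & x <= q[2]], na.rm = TRUE)
}
```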
From that first figure it would seem the two datasets provide pretty similar results (at least similar ranges) for 4th order and above, but diverge a good bit below that. Any thoughts on why the two datasets diverge in the small streams?
To make sure I have this straight, McManamay is national in scope and uses a RF for the stream reach attributes whereas FACET is specific to the DRB, uses a DEM for channel width and a regression for floodplain width?
I think it's interesting that McManamay has more variation overall than FACET. I think I might have expected the opposite (that the national dataset and the RF would smooth out some of the outliers).
> the FACET stream segment with the largest Shreve magnitude
what's a Shreve magnitude?
> To make sure I have this straight, McManamay is national in scope and uses a RF for the stream reach attributes whereas FACET is specific to the DRB, uses a DEM for channel width and a regression for floodplain width?
Thanks, I think you've got it, but I want to record a few details here for future reference. (And Shreve magnitude is just another stream numbering scheme, alternative to Strahler order: a segment's magnitude is the number of first-order headwater links upstream of it, so magnitudes add at confluences.)
McManamay
FACET
> So instead of creating a crosswalk between the segment flowlines, just do a spatial join with the FACET flowlines and the catchments that comprise the NHM catchments, and summarize floodplain width and channel width as above.
For the FACET dataset, I followed the procedure outlined in Noe et al. 2022 (linked above) to aggregate the geomorphic metrics from the high-resolution FACET network to the NHDPlusv2 network. Briefly, this consisted of doing a spatial join between the FACET network and NHDPlusv2 catchment polygons, and then identifying which FACET segment was most downstream within that catchment. We assume the values from the ID'd FACET segment also apply to the paired NHDPlusv2 flowline for that catchment. Here's a comparison of estimated confinement at the NHD-scale for both FACET and the McManamay and DeRolph datasets:
The overall strength of the correlation is fairly poor as there's a lot of scatter around the values we would estimate from the McManamay channel/floodplain widths. However, the central tendency indicates some agreement between the two data sources, so that is encouraging.
I've also taken a stab at aggregating the FACET-derived values to the NHM-scale by calculating a length-weighted average channel width and floodplain width, respectively, based on the COMIDs that comprise each NHM segment. Using a length-weighted mean results in 75 NA values and 381 confinement estimates across segments within the DRB network. So we would still have to impute those 75 segments somehow. I also want to point out that for the segments where we can come up with a FACET-derived confinement estimate, the length of the NHM segment that overlaps COMIDs with FACET data is variable. I currently flag segments where <70% of the length overlaps COMIDs with FACET data.
Thanks, @lekoenig, for doing all this. I look forward to chatting details and next steps on Tuesday.
Janet
Since our conversation on Tuesday (11/1/2022) I've made a few edits to how I was processing both the McManamay and FACET datasets. Namely, for each dataset I now compile the input values and calculate channel confinement (~floodplain width/river width) at the NHDPlusv2-scale. Then, to aggregate to the NHM-scale, I calculate a length-weighted mean of the confinement values across the individual COMIDs that make up each NHM segment. Some of the details in previous comments in this thread have changed as a result, but the overall tradeoffs remain the same. I'll try to sum things up here and document some of the options we discussed.
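The aggregation step described above might look roughly like the following sketch; `comid_confinement`, `crosswalk`, and the column names are illustrative, not the actual targets/columns in the pipeline:

```r
# Sketch of the NHDPlusv2 -> NHM aggregation: confinement is computed per
# COMID first, then averaged across the COMIDs of each NHM segment, weighted
# by flowline length. Column names are illustrative.
library(dplyr)

aggregate_to_nhm <- function(comid_confinement, crosswalk) {
  comid_confinement %>%
    # crosswalk maps each COMID to its NHM segment (seg_id_nat)
    inner_join(crosswalk, by = "comid") %>%
    group_by(seg_id_nat) %>%
    summarize(
      confinement_nhm = weighted.mean(confinement, w = lengthkm, na.rm = TRUE),
      # fraction of total segment length with data; segments below ~70%
      # could be flagged as lower-confidence
      prop_length_with_data = sum(lengthkm[!is.na(confinement)]) / sum(lengthkm),
      .groups = "drop"
    )
}
```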
1) FACET data processed to NHM segments
> tar_load(p2_confinement_facet)
> tar_load(p1_drb_temp_obs)
>
> segs_w_na_facet <- p2_confinement_facet %>%
+ filter(is.na(confinement_calc_facet))
>
> segs_w_na_facet_and_obs <- segs_w_na_facet %>%
+ filter(seg_id_nat %in% p1_drb_temp_obs$seg_id_nat)
> dim(segs_w_na_facet_and_obs)
[1] 35 6
>
2) McManamay data processed to NHM segments
0% 10% 20% 25% 50% 75% 100%
0.000000 5.408206 10.419612 15.289094 50.012221 112.519372 915.216494
>
`p2_nhd_mainstem_reaches_w_width` (widths estimated using an empirical regression approach with NWIS values). Again, using the FACET widths as some sort of benchmark, here is how the widths compare (FACET vs McManamay widths: r2 = 0.11, p < 0.001, slope = 1.06; FACET vs empirical widths: r2 = 0.19, p < 0.001, slope = 0.55):

@lekoenig Thanks for doing all this investigative work! I don't have super strong opinions on your questions, but here are some thoughts:
Did we decide that we'd keep and try both datasets and see how they each worked?
On the McManamay dataset, I think your reasoning that we're already using the empirical width estimation (though that's only for the NHDv2 resolution work, right?) makes sense.
We originally tackled the empirical width estimation so that we would have some value for widths for the NHDv2 resolution work, yes. However, we had discussed modifying the widths we use for the NHM model runs to create more of an apples-to-apples comparison for the downscaling experiments (i.e., aggregate empirical widths instead of using PRMS-SNTemp widths, see #26). If we are using those empirical widths in the model runs for your groundwater paper, I could see the case for replacing the McManamay widths with those. If not, maybe it is more straightforward to set widths < 1 m to 1 m, at least to start.
Oh yeah, I do remember that conversation. I just skimmed the asRunConfig files for the gw paper and those are all using the mean PRMS-derived width. Given that, maybe we go ahead and use the < 1 m = 1 m floor as you suggested.
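A minimal sketch of that width floor (`width_m` is a placeholder column name):

```r
# Clamp channel widths below 1 m up to 1 m before computing confinement,
# so tiny RF-predicted widths don't inflate floodplain/channel ratios.
library(dplyr)

floor_widths <- function(df, min_width_m = 1) {
  df %>% mutate(width_m = pmax(width_m, min_width_m))
}
```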
> Given that, maybe we go ahead and use the <1 m = 1 m as you suggested.
Thanks, Janet, that sounds good to me too. The code changes in #51 currently adopt the following decisions that were discussed in this thread:
To try to get a sense for how accurate the imputed FACET values are, I randomly sampled 30% of the NHM segments in the DRB that have non-NA confinement values and estimated what their "imputed" FACET value would be. As expected, there is a decent amount of scatter between the estimated values and the imputed ones (r2 = 0.77, slope = 0.89, intercept = 0.31).
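The holdout check described above can be sketched like this; the imputation itself is a placeholder here (just the mean of the retained segments), and the actual imputation approach may differ:

```r
# Sketch of the imputation check: hold out a random 30% of segments with
# observed confinement, impute them (placeholder: mean of the rest), and
# return observed vs imputed pairs for comparison (e.g., via lm()).
validate_imputation <- function(confinement, holdout_frac = 0.3, seed = 123) {
  set.seed(seed)
  idx <- sample(seq_along(confinement),
                size = floor(holdout_frac * length(confinement)))
  data.frame(observed = confinement[idx],
             imputed  = mean(confinement[-idx]))
}
```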
Options include the FACET dataset (Hopkins et al. 2020) which is DRB-specific, and the McManamay and DeRolph national dataset which is indexed by COMID.