USGS-R / drb-gw-hw-model-prep

Code repo to prepare groundwater and headwater-related datasets for modeling river temperature in the Delaware River Basin

Compile channel confinement data #41

Closed. lekoenig closed this issue 1 year ago.

lekoenig commented 2 years ago

Options include the FACET dataset (Hopkins et al. 2020) which is DRB-specific, and the McManamay and DeRolph national dataset which is indexed by COMID.

lekoenig commented 1 year ago

The geomorphometric dataset from FACET is appealing because it very likely provides more accurate values for channel width, floodplain width, and thus, channel confinement. The dataset was derived from 3-m DEMs, whereas river widths in the McManamay and DeRolph dataset are generated from random forest models.

However, there are a couple of potential issues with FACET that might make it more difficult to use here:

janetrbarclay commented 1 year ago

Hmm, that's a tough one. I like the idea of using the DEM data, but river-dl needs to have an input value for each segment, so missing data becomes a problem. If we use the FACET data we would need to figure out how to estimate the missing values (not sure if just the mean is reasonable; probably not). I suspect we wouldn't have much issue with the low end of catchment size at the NHM resolution, but we definitely would at the upper end. Do we have a dataset that has drainage area for each reach? Seems like something we should have, but I'm not sure where.


lekoenig commented 1 year ago

Here's the distribution of watershed areas for the NHM segments (taking the totdasqkm column from the NHD value-added attributes for those COMIDs located at the bottom of each NHM segment):

> tar_load(p1_drb_comids_down)
> tar_load(p1_nhd_reaches)
> 
> p1_nhd_reaches %>%
+     filter(comid %in% p1_drb_comids_down$COMID) %>%
+     pull(totdasqkm) %>%
+     quantile()
        0%        25%        50%        75%       100% 
    0.4950   109.2933   228.0258   766.3649 30744.3168 
>

It looks like ~15% of the flowlines would be outside of the watershed area range given in the FACET metadata, so yeah we would need to impute channel confinement values for those segments:

> p1_nhd_reaches %>%
+   filter(comid %in% p1_drb_comids_down$COMID) %>%
+   filter(totdasqkm <3 | totdasqkm > 3000) %>%
+   sf::st_drop_geometry() %>%
+   summarize(proportion_out_of_bounds = nrow(.)/nrow(p1_drb_comids_down))
# A tibble: 1 x 1
  proportion_out_of_bounds
                     <dbl>
1                    0.155
> 
janetrbarclay commented 1 year ago

Is there any correlation between channel confinement and watershed size? I might imagine that in the DRB small catchments tend to be more confined and large ones less so.



lekoenig commented 1 year ago

Is there any correlation between channel confinement and watershed size?

That expectation does seem reasonable, although there seems to be enough variability in the DRB that it might be difficult to impute confinement based on watershed size alone(?):

plot1

plot2
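For reference, a rough sketch of how a plot like the ones above could be put together, assuming FACET-derived confinement has been joined to the NHD reaches (facet_confinement and its column names are hypothetical placeholders, not actual targets in this pipeline):

library(dplyr)
library(ggplot2)

# join hypothetical COMID-level confinement values to the DRB outlet reaches
plot_dat <- p1_nhd_reaches %>%
  sf::st_drop_geometry() %>%
  filter(comid %in% p1_drb_comids_down$COMID) %>%
  left_join(facet_confinement, by = c("comid" = "COMID"))

# confinement vs. total drainage area, faceted by stream order
ggplot(plot_dat, aes(x = totdasqkm, y = confinement)) +
  geom_point(alpha = 0.4) +
  scale_x_log10() +
  facet_wrap(~ streamorde) +
  labs(x = "Total drainage area (km2)",
       y = "Confinement (floodplain width / channel width)")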

janetrbarclay commented 1 year ago

Is the lower plot by stream order? From the scatter plot it seems there isn't much variability within the big watersheds, right? Seems like we might be able to use a mean value across the big rivers for those? Not sure what to do with the small ones. How many do we have below the 3 km2 threshold?

lekoenig commented 1 year ago

Oops, yes - the lower plot is separated by stream order. There are 2 NHM segments below the 3 km2 threshold, so perhaps those could also be imputed with some mean value.

> p1_nhd_reaches %>%
+   filter(comid %in% p1_drb_comids_down$COMID) %>%
+   filter(totdasqkm <3) %>%
+   sf::st_drop_geometry() %>% 
+   pull(totdasqkm)
[1] 2.6199 0.4950
>
janetrbarclay commented 1 year ago

We could use some mean value, or since it's only 2 segments, we could pull the value from the nearest downstream reach or something like that (assuming some spatial correlation in the values).
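Not pipeline code, but a minimal sketch of that downstream-fill idea, assuming a hypothetical confinement table with seg_id_nat, confinement, and a to_seg column giving each segment's downstream neighbor:

library(dplyr)

# fill missing confinement values with the value from the downstream segment
impute_from_downstream <- function(confinement_df) {
  confinement_df %>%
    left_join(confinement_df %>%
                select(seg_id_nat, confinement_ds = confinement),
              by = c("to_seg" = "seg_id_nat")) %>%
    mutate(confinement = coalesce(confinement, confinement_ds)) %>%
    select(-confinement_ds)
}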



lekoenig commented 1 year ago

The McManamay and DeRolph dataset includes Confinement as a categorical value, but we can also calculate floodplain width/channel width using the other columns in their dataset. Here's a comparison of the calculated confinement values between the two datasets (without any additional spatial aggregation on our part). This isn't a one-to-one comparison, but rather just shows the range of values within the reaches assigned to a given stream order:

comparison_plot
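For context, the "calculated" McManamay confinement above is just the ratio of floodplain width to channel width; a minimal sketch, with mcmanamay_dat and the width column names as placeholders rather than the actual attribute names in the McManamay and DeRolph tables:

library(dplyr)

# placeholder names: floodplain_width_m and channel_width_m stand in for the
# relevant McManamay & DeRolph attributes, indexed by COMID
confinement_mcmanamay <- mcmanamay_dat %>%
  mutate(confinement_calc = floodplain_width_m / channel_width_m) %>%
  select(COMID, confinement_calc)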

The McManamay dataset includes several high outliers that are driven by very small values for width (which were estimated using an RF algorithm at the national scale). Those high values are not likely to show up in the NHM-aggregated data. Below are the same McManamay estimates once I've aggregated the values from NHDPlusv2 to NHM:

mcmanamay_boxplot

If I can join the FACET values to the NHM flowlines, I can directly compare these values.

lekoenig commented 1 year ago

In terms of aggregating the FACET values to the NHM flowlines, one idea would just be to replicate the workflow from a recent paper from this group:

Because FACET generates a higher resolution stream network than NHDPlusV2, the FACET stream segment with the largest Shreve magnitude within each NHDPlusV2 catchment was selected and then statistically summarized for each geomorphometric metric as the mean of the 5th to 95th percentile values (to exclude anomalous outliers) for the following attributes: streambank height, stream channel width, average streambank angle, and stream slope and sinuosity; total active floodplain width was summarized as the mean in each reach.

So instead of creating a crosswalk between the segment flowlines, just do a spatial join between the FACET flowlines and the NHDPlusv2 catchments that make up the NHM segments, and summarize floodplain width and channel width as above (a rough sketch of that workflow is below).
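Something like the following, where facet_flowlines, nhd_catchments, and the attribute names (shreve_mag, chan_width_m, fp_width_m) are placeholders for the real FACET and catchment layers:

library(dplyr)
library(sf)

# mean of the 5th-95th percentile values, to exclude anomalous outliers
p5_95_mean <- function(x) {
  bounds <- quantile(x, probs = c(0.05, 0.95), na.rm = TRUE)
  mean(x[x >= bounds[1] & x <= bounds[2]], na.rm = TRUE)
}

facet_by_comid <- facet_flowlines %>%
  # tag each FACET segment with the NHDPlusv2 catchment (FEATUREID/COMID) it falls in
  st_join(nhd_catchments["FEATUREID"]) %>%
  st_drop_geometry() %>%
  group_by(FEATUREID) %>%
  # keep the FACET segment(s) with the largest Shreve magnitude in each catchment
  filter(shreve_mag == max(shreve_mag, na.rm = TRUE)) %>%
  summarize(channel_width_m    = p5_95_mean(chan_width_m),
            floodplain_width_m = mean(fp_width_m, na.rm = TRUE),
            confinement_facet  = floodplain_width_m / channel_width_m,
            .groups = "drop")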

janetrbarclay commented 1 year ago

From that first figure it would seem the two datasets provide pretty similar results (at least similar ranges) for 4th order and above, but diverge a good bit below that. Any thoughts on why the two datasets diverge in the small streams?

To make sure I have this straight, McManamay is national in scope and uses an RF for the stream reach attributes, whereas FACET is specific to the DRB and uses a DEM for channel width and a regression for floodplain width?

I think it's interesting that McManamay has more variation overall than FACET. I think I might have expected the opposite (that the national dataset and the RF would smooth out some of the outliers).

janetrbarclay commented 1 year ago

the FACET stream segment with the largest Shreve magnitude

what's a Shreve magnitude?

lekoenig commented 1 year ago

To make sure I have this straight, McManamay is national in scope and uses a RF for the stream reach attributes whereas FACET is specific to the DRB, uses a DEM for channel width and a regression for floodplain width?

Thanks, I think you've got it, but I want to record a few details here for future reference. (And Shreve magnitude is just another stream ordering scheme, an alternative to Strahler order: a segment's Shreve magnitude is the number of headwater links upstream of it, so within a catchment the segment with the largest magnitude is the most downstream, mainstem segment.)

McManamay

FACET

lekoenig commented 1 year ago

So instead of creating a crosswalk between the segment flowlines, just do a spatial join with the FACET flowlines and the catchments that comprise the NHM catchments, and summarize floodplain width and channel width as above.

For the FACET dataset, I followed the procedure outlined in Noe et al. 2022 (linked above) to aggregate the geomorphic metrics from the high-resolution FACET network to the NHDPlusv2 network. Briefly, this consisted of doing a spatial join between the FACET network and the NHDPlusv2 catchment polygons and then identifying which FACET segment was most downstream within each catchment. We assume the values from that FACET segment also apply to the paired NHDPlusv2 flowline for the catchment. Here's a comparison of estimated confinement at the NHD scale for both FACET and the McManamay and DeRolph datasets:

compare_confinement_nhd

The overall strength of the correlation is fairly poor, as there's a lot of scatter around the values we would estimate from the McManamay channel/floodplain widths. However, the central tendency indicates some agreement between the two data sources, which is encouraging.

I've also taken a stab at aggregating the FACET-derived values to the NHM scale by calculating a length-weighted average channel width and floodplain width, respectively, based on the COMIDs that comprise each NHM segment. Using a length-weighted mean results in 75 NA values and 381 confinement estimates across segments within the DRB network, so we would still have to impute those 75 segments somehow. I also want to point out that for the segments where we can come up with a FACET-derived confinement estimate, the length of the NHM segment that overlaps COMIDs with FACET data is variable; I currently flag segments where <70% of the length overlaps COMIDs with FACET data.

FACET_hist
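For the record, a minimal sketch of the length-weighted aggregation and the <70% coverage flag described above, assuming a COMID-to-NHM crosswalk (comid_seg_xwalk) and a COMID-level confinement table (confinement_nhd with a lengthkm column); these names are placeholders, not actual targets:

library(dplyr)

confinement_nhm <- confinement_nhd %>%
  left_join(comid_seg_xwalk, by = "COMID") %>%
  group_by(seg_id_nat) %>%
  summarize(
    # length-weighted mean across the COMIDs that make up each NHM segment
    confinement_calc_facet = weighted.mean(confinement, w = lengthkm, na.rm = TRUE),
    # fraction of the segment length backed by COMIDs with FACET data
    prop_length_w_facet = sum(lengthkm[!is.na(confinement)]) / sum(lengthkm),
    # flag segments where <70% of the length overlaps COMIDs with FACET data
    flag_low_facet_coverage = prop_length_w_facet < 0.7,
    .groups = "drop"
  )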

janetrbarclay commented 1 year ago

Thanks, @lekoenig, for doing all this. I look forward to chatting details and next steps on Tuesday.

Janet

lekoenig commented 1 year ago

Since our conversation on Tuesday (11/1/2022) I've made a few edits to how I was processing both the McManamay and FACET datasets. Namely, for each dataset I now compile the input values and calculate channel confinement (~floodplain width/river width) at the NHDPlusv2-scale. Then, to aggregate to the NHM-scale, I calculate a length-weighted mean of the confinement values across the individual COMIDs that make up each NHM segment. Some of the details in previous comments in this thread have changed as a result, but the overall tradeoffs remain the same. I'll try to sum things up here and document some of the options we discussed.

1) FACET data processed to NHM segments

FACET_NHD

> tar_load(p2_confinement_facet)
> tar_load(p1_drb_temp_obs)
> 
> segs_w_na_facet <- p2_confinement_facet %>%
+     filter(is.na(confinement_calc_facet)) 
> 
> segs_w_na_facet_and_obs <- segs_w_na_facet %>%
+     filter(seg_id_nat %in% p1_drb_temp_obs$seg_id_nat)
> dim(segs_w_na_facet_and_obs)
[1] 35  6
> 

map_w_annotation

2) McManamay data processed to NHM segments

compare

        0%        10%        20%        25%        50%        75%       100% 
  0.000000   5.408206  10.419612  15.289094  50.012221 112.519372 915.216494 
> 

width_replace

compare_widths

compare_width_empirical

janetrbarclay commented 1 year ago

@lekoenig Thanks for doing all this investigative work! I don't have super strong opinions on your questions, but here are some thoughts:

Did we decide that we'd keep and try both datasets and see how they each worked?

lekoenig commented 1 year ago

On the McManamay dataset, I think your reasoning that we're already using the empirical width estimation (though that's only for the NHDv2 resolution work, right?) makes sense.

We originally tackled the empirical width estimation so that we would have some value for widths for the NHDv2-resolution work, yes. However, we had discussed modifying the widths we use for the NHM model runs to create more of an apples-to-apples comparison for the downscaling experiments (i.e., aggregate empirical widths instead of using PRMS-SNTemp widths, see #26). If we are using those empirical widths in the model runs for your groundwater paper, I could see the case for replacing the McManamay widths with those. If not, maybe it is more straightforward to set widths < 1 m to 1 m, at least to start.
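That replacement would be something as simple as the sketch below (channel_width_m is a placeholder for the McManamay width column, as in the earlier sketch):

library(dplyr)

# floor very small RF-modeled widths at 1 m before computing confinement
mcmanamay_dat <- mcmanamay_dat %>%
  mutate(channel_width_m = pmax(channel_width_m, 1))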

janetrbarclay commented 1 year ago

Oh yeah, I do remember that conversation. I just skimmed the asRunConfig files for the gw paper and those are all using the mean PRMS-derived width. Given that, maybe we go ahead and use the < 1 m = 1 m replacement as you suggested.

lekoenig commented 1 year ago

Given that, maybe we go ahead and use the <1 m = 1 m as you suggested.

Thanks, Janet, that sounds good to me too. The code changes in #51 currently adopt the following decisions that were discussed in this thread:

To try to get a sense for how accurate the imputed FACET values are, I randomly sampled 30% of the NHM segments in the DRB that have non-NA confinement values and estimated what their "imputed" FACET value would be. As expected, there is a decent amount of scatter between the imputed values and the held-out FACET-derived ones (r2 = 0.77, slope = 0.89, intercept = 0.31).

Rplot
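For reference, the spot-check above can be sketched roughly as follows, where impute_confinement() is a stand-in for whatever imputation rule #51 adopts and the seed is purely illustrative:

library(dplyr)

set.seed(123)  # illustrative; any seed gives a different random 30% sample

obs <- p2_confinement_facet %>%
  filter(!is.na(confinement_calc_facet))

# hold out a random 30% of segments that have FACET-derived confinement
holdout <- obs %>% slice_sample(prop = 0.3)

# blank out the held-out values and re-impute them
imputed <- obs %>%
  mutate(confinement_calc_facet = if_else(seg_id_nat %in% holdout$seg_id_nat,
                                          NA_real_, confinement_calc_facet)) %>%
  impute_confinement() %>%
  filter(seg_id_nat %in% holdout$seg_id_nat)

# compare imputed vs. held-out values
comparison <- holdout %>%
  select(seg_id_nat, observed = confinement_calc_facet) %>%
  left_join(select(imputed, seg_id_nat, imputed = confinement_calc_facet),
            by = "seg_id_nat")

summary(lm(imputed ~ observed, data = comparison))  # slope, intercept, R2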