Closed lekoenig closed 1 year ago
Looking at the WQP metadata, there are a few columns that may contain information or metadata related to likelihood of tidal influence:
MonitoringLocationTypeName
(e.g. MonitoringLocationTypeName == "Stream: Tidal stream"
)MonitoringLocationDescriptionText
(only available from the "Site" service)ActivityMediaName
and ActivityMediaSubdivisionName
(e.g. ActivityMediaSubdivisionName == "Estuary"
)HydrologicEvent
(e.g. HydrologicEvent == "Tidal action"
)In the original DRB harmonization code, there are no steps that explicitly aim to filter out tidally-influenced sites. However, the code does filter records so that ActivityMediaSubdivisionName
only includes a handful of accepted values ("Surface Water", "Suspended", "Water", "Bottom material") that would omit any "Estuary" samples.
This repository (what we are doing already):
We currently take the following steps to modify the discrete data produced by the DRB harmonization code mentioned above when creating the input dataset used for modeling:
ActivityMediaName
does not equal "Sediment" (ActivityMediaName
equals "Water")HydrologicEvent
is not equal to "Spill" or "Volcanic action"ActivityStartDate
is within the requested date range (10/1/1979 - 12/31/2021)MonitoringLocationTypeName
is not equal to "Stream: Ditch" and does not contain the string "tidal"OrganizationIdentifier
), site id (MonitoringLocationIdentifier
), lat/lon (LongitudeMeasure
and LatitudeMeasure
), and date-time and if there were duplicates, kept one record within the set. We preferentially retained lab records, then records labeled as "Specific conductance"
, then field values, then field values that represent the mean value of a set of samples (param == "Specific conductance, field, mean"
). p2_wqp_SC_filtered
). This repository (what further steps could we take):
Based on the above, I unfortunately think we've used all of the low-hanging fruit/metadata for identifying tidal sites. There's a couple other steps I think we could add:
HydrologicEvent == "Tidal action"
. However, there are only 7 total records that match this criteria and they are not proximal to the estuary mouth where we've seen high model residuals, so I'm not sure how helpful this will be.Thanks for summarizing these steps! I think most of this information should be included in our manuscript text or SI.
Do you know how tidal streams were identified for the WQP? If it's not from NWIS, we could use the NWIS codes / flags for tidal reaches as a way to identify PRMS segments to omit. That would be a more time-consuming but possibly more accurate way to implement your 3rd suggestion. The 3rd suggestion would be my preference and I think straightforward to do using the distance matrix.
Do you know how tidal streams were identified for the WQP?
My understanding is that this information is supplied by the user. I queried all of the DRB HUC8 boundaries for WQP sites with MonitoringLocationTypeName == "Stream: Tidal stream"
(using sampleMedia = c("Water", "water", "Sediment")
as the only other search criteria). This search returned 21 sites, all of which were managed by the USGS New Jersey WSC. So this makes me believe that the site type is entered this way to match the information in NWIS, and that other water quality monitoring organizations don't seem to be using this field to identify tidal streams.
we could use the NWIS codes / flags for tidal reaches as a way to identify PRMS segments to omit. That would be a more time-consuming but possibly more accurate way to implement your 3rd suggestion.
Agreed that using NWIS sites flagged as tidal provides useful information for which segments to omit. The risk is that the 36 NWIS tidal sites in the DRB (that get matched to 16 unique river segments) aren't comprehensive, so we might retain tidally-influenced WQP sites if they fall on a reach that doesn't overlap one of those highlighted in yellow below.
# code to ID which NWIS sites are flagged as tidal
library(dataRetrieval)
library(sf)
source("2_process/src/match_sites_reaches.R")
targets::tar_load(p2_sites_w_segs)
targets::tar_load(p1_reaches_sf)
drb_huc8s <- c("02040101","02040102","02040103","02040104","02040105",
"02040106","02040201","02040202","02040203","02040204",
"02040205","02040206","02040207")
tidal_sites <- list()
for(i in seq_along(drb_huc8s)){
dat <- whatNWISsites(huc = drb_huc8s[i])
dat_sub <- filter(dat, site_tp_cd == "ST-TS")
tidal_sites[[i]] <- dat_sub
}
tidal_sites_df <- bind_rows(tidal_sites) |>
mutate(site_id = paste0("USGS-", site_no)) |>
rename(lon = dec_long_va,
lat = dec_lat_va)
# these NWIS tidal sites do not contain SC data
all(tidal_sites_df$site_id %in% p2_sites_w_segs$site_id)
# match NWIS tidal sites to NHGF/PRMS segments
tidal_sites_w_segs <- get_site_flowlines(reach_sf = p1_reaches_sf,
sites = tidal_sites_df,
sites_crs = 4269,
max_matches = 1)
# plot tidal reaches
tidal_reaches <- filter(p1_reaches_sf, segidnat %in% tidal_sites_w_segs$segidnat)
The 3rd suggestion would be my preference and I think straightforward to do using the distance matrix.
Below is some code I've used to identify reaches that we assume might be tidal due to their direct connectedness to the mainstem Delaware River below the head-of-tide. This schema identifies and flags 61 segments compared to 16 above:
# Assume all reaches that flow *into* one of the mainstem reaches below the
# head-of-tide are also tidally-influenced
mainstem_reaches_tidal <- c("2771_1","2769_1","2768_1","2767_1","2764_1",
"2762_1","2763_1","2759_1","2757_1","2752_1",
"2753_1","2755_1","2772_1","388_1","389_1",
"385_1","386_1","383_1","382_1","377_1","378_1",
"376_1","351_1","344_1","346_1","333_1")
segs_mainstem_tidal <- filter(p1_reaches_sf, subsegid %in% mainstem_reaches_tidal)
fromsegs <- stringr::str_split(segs_mainstem_tidal$fromsegs, ";")
fromsegs <- do.call("c", fromsegs)
fromsegs_vctr <- fromsegs[c(1:length(fromsegs))]
segs_tribs_tidal <- p1_reaches_sf |>
mutate(subsegseg = as.character(subsegseg)) |>
filter(subsegseg %in% fromsegs_vctr | is.na(toseg))
To capture the tributaries that flow directly into the Delaware Bay Estuary I also assume that segments with toseg == NA
are also tidally-influenced. Note that this leads to three "false positives," one in the eastern part of the watershed in NJ and two others that are "split segments" in the northern part of the watershed (not shown). We could omit these false positive reaches from the set of segments that we flag and exclude from the training data, the evaluation metrics, or both.
@jds485 Depending on how this fits into your modeling workflow, is it helpful for me to submit a PR? I think we discussed using this information for model evaluation right now (and not re-training the model), and so I would just add a target that contains all of the subsegid
's that we assume are tidal. If we want to actually filter out data, I think we could do that by modifying an existing target, p2_wqp_SC_filtered
.
Just so I'm understanding, the 1st map shows reaches that have at least 1 NWIS SC site that is marked as tidal. It is not based on the NWIS reach codes in nhdplusTools (which would hopefully look more like the 2nd map).
We could omit these false positive reaches
Yes, they should be omitted.
is it helpful for me to submit a PR
Yes, and it can be "a target that contains all of the subsegid's that we assume are tidal."
Do you know how many observations are within the assumed tidal reaches?
the 1st map shows reaches that have at least 1 NWIS SC site that is marked as tidal. It is not based on the NWIS reach codes in nhdplusTools (which would hopefully look more like the 2nd map).
The first map in my comment above shows reaches that have at least 1 NWIS site labeled with site_tp_cd == "ST-TS"
(not exclusive to just specific conductance sites). I'm not sure I'm following the nhdplusTools question. Does NHDPlus contain an attribute related to tidal extent? If so, I wasn't aware of it.
Do you know how many observations are within the assumed tidal reaches?
Using p2_site_list_nontidal_csv
as an indication of data density, it looks like there are 75 unique sites with 31,686 observation days of data 😮
readr::read_csv(p2_site_list_nontidal_csv, show_col_types = FALSE) |>
filter(subsegid %in% segs_tribs_tidal$subsegid & !subsegid %in% c("357_1", "8_1", "3_1")) |>
pull(count_days_total) |>
sum()
#> [1] 31686
~74% of those observation days come from 4 sites that lie along mainstem_reaches_tidal
and have NWIS continuous data (01464600
, 014670261
, 01467200
, and 01477050
). This suggests to me that mainstem_reaches_tidal
was used to filter discrete samples thought to be tidally-influenced but that a similar filter step was not applied to the continuous NWIS data.
Either way, assuming tidal reaches does leave out a decent chunk of the data. I guess the hope would be that doing so improves the quality of the data used for the target application.
It's also possible this assumption paints too broad a brush. Based on p2_SC_observations
, I'm guessing the model residuals are pretty high for segment 1263_1
, because the mean specific conductance exceeds the 99% quantile (which is 3,701 uS/cm) of mean SC across all reaches in the network. I believe that segment is the Lewes And Rehoboth Canal near the estuary mouth.
p2_SC_observations |>
group_by(subsegid) |>
summarize(mean_SC = mean(mean_value, na.rm = TRUE), n = n()) |>
arrange(desc(mean_SC))
#> A tibble: 311 x 3
#> subsegid mean_SC n
#> <chr> <dbl> <int>
#> 1 1263_1 39367. 10
#> 2 2140_1 7463. 1
#> 3 2758_1 4384. 28
#> 4 2749_1 3874. 15
#> 5 2137_1 2148. 16
#> 6 870_1 1080. 23
#> 7 1257_1 914. 2
#> 8 865_1 831. 1316
#> 9 864_1 797. 137
#> 10 367_1 789. 52
#> ... with 301 more rows
Does NHDPlus contain an attribute related to tidal extent?
Yes, there is a column named Tidal that is binary indicating if the reach is considered Tidal? 0(no) or 1(yes). Based on the documentation, I'm not sure it's that useful. It seems like this indicator was determined from an elevation filter (where min and max elevation are less than 6 m and the mean annual flow is greater than 0). See section "Identifying Tidal Flowlines" here.
it looks like there are 75 unique sites with 31,686 observation days of data
That is a lot of data! Those 4 NWIS sites are almost 15% of the data we currently use. Good discovery that the mainstem filter was not applied to the NWIS data, but it also makes me wonder if our filtering is too broad. The historical upstream extent of the salt front is near Philadelphia, so sites upstream of that are likely not affected much.
I believe that segment is the Lewes And Rehoboth Canal near the estuary mouth.
I think we would be justified dropping data from that segment because canals can have 2-way flow, especially this close to the estuary mouth. I might reach out to Salme and see if COAWST could be used to support reaches to drop based on likely migration of salinity up the tributaries from tidal fluctuation or other reasons.
Does NHDPlus contain an attribute related to tidal extent?
Yes, there is a column named Tidal that is binary indicating if the reach is considered Tidal? 0(no) or 1(yes). Based on the documentation, I'm not sure it's that useful. It seems like this indicator was determined from an elevation filter (where min and max elevation are less than 6 m and the mean annual flow is greater than 0). See section "Identifying Tidal Flowlines" here.
Woah, I was not aware of this attribute, thanks for the tip! If we go with this general approach for flagging tidal reaches, I think using the NHD attribute is an improvement over my assumption that "all tributaries flowing into below-head-of-tide-mainstem-reaches are tidal" and that we should just use that (same caveats as above r.e. lots of data lies along these reaches).
Thanks for doing this quick analysis! We would lose a lot of data for urban areas using this filter. I sent a chat message to Liv and Salme and sent a link to this to get their opinion on it.
We got some feedback from Liv Herdman and Salme Cook (NY WSC) and came up with the following options for estimating which river segments might be receiving some salts from the estuary vs. the watershed (this is what I mean when I say "tidally-influenced" for the sake of this project).
Jared and I discussed these options and are going to start with option 1. @jds485, feel free to edit this comment or add any take-aways you have as well.
Does NHDPlus contain an attribute related to tidal extent?
Yes, there is a column named Tidal that is binary indicating if the reach is considered Tidal? 0(no) or 1(yes). Based on the documentation, I'm not sure it's that useful. It seems like this indicator was determined from an elevation filter (where min and max elevation are less than 6 m and the mean annual flow is greater than 0). See section "Identifying Tidal Flowlines" here.
I think this cutoff is actually 600 m? From the documentation linked above: Ah, nevermind! The units of maxelevsmo
and minelevsmo
are in centimeters, so the cutoff is 6 m.
The tidal screen was defined as follows:
- Select NHDFlowline features where PlusFlowlineVAA->MAXELEVSMO < 600 AND PlusFlowlineVAA->MINELEVSMO < 600 AND EROM_MA0001->Q0001A > 0.
- On each flow path included in step 1, find the most upstream NHDFlowline feature and navigate downstream with divergences.
- Add NHDFlowline features found in the step 2 navigations to the selected set from step
- Set PlusFlowlineVAA.Tidal attribute to 1 for each feature in the selected set.
I'm playing around with option 2 in the comment above. I identified active NWIS sites that met the following criteria:
site_tp_cd == "Estuary"
or site_tp_cd == "ST-TS"
This isn't necessarily comprehensive, but hopefully the population of 57 sites can give us some indication of the range in elevation that is susceptible to tidal influence and that we could use to refine the 6 m criteria used in the NHD "tidal" attribute.
The NWIS site metadata sometimes (but not always) contains information about altitude
gages_tidal
#> [1] "01482100" "01477120" "01474500" "01467200" "01467081" "01465798"
#> [7] "01467024" "01483700" "01484080" "01484084" "01413038" "01412130"
#> [13] "01412150" "01484235" "01484085" "01412350" "01482800" "01483050"
#> [19] "01475001" "01482320" "01480065" "01480120" "01482695" "01482170"
#> [25] "01475553" "01467059" "400659074495102" "400659074495101" "01464040" "400854074432102"
#> [31] "400854074432101" "01476550" "01467194" "394836075242102" "394836075242101" "395055075174602"
#> [37] "395055075174601" "395252075113902" "395252075113901" "400209074593602" "400209074593601" "393721075344702"
#> [43] "393721075344701" "393436075331702" "393436075331701" "393109075330402" "393109075330401" "394044075311902"
#> [49] "394044075311901" "394536075285102" "394536075285101" "01482650" "01411415" "392711075332702"
#> [55] "392711075332701" "01483511" "01484360"
gages_tidal_metadata <- dataRetrieval::readNWISsite(siteNumbers = c(gages_tidal))
# altitude accuracy
unique(gages_tidal_metadata$alt_acy_va)
#> [1] NA 0.01 1.60 0.10 2.00 0.12 1.00 0.04 10.00 5.40
# altitude of gage/land surface
unique(gages_tidal_metadata$alt_va)
#> [1] NA 0.00 8.00 2.68 -4.00 8.12 5.74 55.66 -0.77 5.00
# altitude datum
unique(gages_tidal_metadata$alt_datum_cd)
#> [1] NA "NAVD88" "NGVD29"
# method altitude determined; see https://help.waterdata.usgs.gov/code/alt_meth_cd_query?fmt=html
unique(gages_tidal_metadata$alt_meth_cd)
#> [1] NA "L" "N" "D" "Y" "M"
I'm not sure if I can easily harmonize and compare these given altitudes with the elevations used in the NHDPlus VAA table(?). Given that, I tried instead to find maxelevsmo
and minelevsmo
for the COMIDs that are associated with these gages.
gages_tidal_nldi <- lapply(gages_tidal, function(x){
dataRetrieval::findNLDI(nwis = x)
}) %>%
bind_rows()
gages_tidal_nldi <- gages_tidal_nldi$origin
gages_tidal_flines <- nhdplusTools::get_nhdplus(comid = gages_tidal_nldi$comid[!is.na(gages_tidal_nldi$comid)],
realization = "flowline")
quantile(gages_tidal_flines$maxelevsmo/100,
c(0, 0.25, 0.5, 0.75, 0.9, 0.95, 1))
#> 0% 25% 50% 75% 90% 95% 100%
#> 0.00 0.00 0.00 1.30 3.42 5.48 21.32
Using the COMIDs and taking the 95th percentile of maxelevsmo
we'd end up with a similar cutoff of 5.5 meters. So I don't think this helps us refine the tidal reaches beyond what's included in the NHDPlus tidal
attribute (option 1 above). The last thing I could try is to extract the elevation of each gage using a DEM and see what the range of values is.
Thanks!
the time series for the past week (5/14/23 - 5/19/23) showed fairly obvious sub-daily fluctuations in gage height, and/o
Did you look at the timeseries for each of these sites? Only asking because of the and/or
. I'm wondering if some of them don't have meaningful tidal changes (or any at all)
Did you look at the timeseries for each of these sites?
In the case of the site types, I took their word for it and did not look at the time series (just for this past week, so a limited data point). Otherwise, I did look at the time series and the gage remarks field.
Okay, it sounds like we can safely use the 6 m NHD Tidal cutoff, then. I'll do a model run with those reaches removed for performance metric calculations
The last thing I could try is to extract the elevation of each gage using a DEM and see what the range of values is.
One more note - I wasn't able to post this on Friday because ScienceBase was temporarily unavailable, but I did grab the 3-m DEM's used by Hopkins et al. for their FACET model runs and extracted the elevation of gages_tidal
in my comment above. I calculated the mean of the extracted raster values for a 30-m buffer around each of the tidal gages which returned the following distribution of gage elevations:
quantile(sites_elev$site_elev_m, na.rm = TRUE, c(0,0.25,0.5,0.75,0.85,0.95,1))
#> 0% 25% 50% 75% 85% 95% 100%
#> -1.2991510 -0.9600000 0.5897652 1.1975898 2.1769435 3.3658545 19.3504091
The only elevation > 4 m is this site which has daily fluctuations in gage height, but these are <0.1 ft for the week starting May 16, 2023. So I'd feel comfortable using 4 m as a refined tidal cutoff if we chose to do that. Here's a plot of the "NHD tidal reaches" compared to these "refined tidal reaches:"
It looks like we'd gain a decent number of observation-days by using this refined elevation, but let me know what you think is best for the modeling workflow you have in mind:
targets::tar_load(p2_drb_comids_seg)
targets::tar_load(p1_reaches_sf)
# download flowlines that intersect the PRMS network
flines <- nhdplusTools::get_nhdplus(comid = p2_drb_comids_seg$comid, realization = "flowline")
# filter PRMS segments by two different tidal criteria
comids_tidal_refined <- filter(flines, maxelevsmo < 400, minelevsmo < 400, qe_ma > 0 ) %>%
pull(comid)
comids_tidal_nhd <- flines$comid[flines$tidal == 1]
# identify NHM segments that overlap a "tidal" NHDPlus reach
segs_with_tidal_comids_nhd <- p2_drb_comids_seg |>
filter(comid %in% comids_tidal_nhd) |>
pull(PRMS_segid) |>
unique()
segs_with_tidal_comids_refined <- p2_drb_comids_seg |>
filter(comid %in% comids_tidal_refined) |>
pull(PRMS_segid) |>
unique()
# count number of observation-days
# 1) using NHDPlus definition of tidal reaches
targets::tar_load(p2_site_list_nontidal_csv)
readr::read_csv(p2_site_list_nontidal_csv, show_col_types = FALSE) |>
filter(subsegid %in% segs_with_tidal_comids_nhd) |>
pull(count_days_total) |>
sum()
#> 63383
# 2) using refined tidal reaches
readr::read_csv(p2_site_list_nontidal_csv, show_col_types = FALSE) |>
filter(subsegid %in% segs_with_tidal_comids_refined) |>
pull(count_days_total) |>
sum()
#> 41370
Thanks for doing this! Let's use the refined 4 m elevation
Closed from #242
We currently use a few different approaches to try to restrict our analysis to inland (freshwater) streams in the DRB, including excluding sites along the mainstem Delaware River below the head-of-tide in ~Trenton and only requesting certain NWIS site types. Based on initial analysis of the model residuals it seems like some tidal sites remain in the dataset and are potentially skewing model error.
I want to revisit, how are we accounting for tidal sites in the discrete WQP data? Is there another filtering step we should use when creating the input data?