Discussion: What is the best time scale to use for our models and analysis?

amsnyder commented 2 years ago

@galengorski

These thoughts are based on some conversations with @amsnyder, @ted80810, and @salme146. I'd like to make this discussion a place where we can talk about model time scales and what it might take to move from a daily to a sub-daily time scale (and whether it would be worthwhile).

Up to this point we have been working at the daily time step, for a few reasons:

Daily (or even weekly) time steps are the time scales that stakeholders consider and on which management decisions are made

It makes multi-year analysis and modeling more computationally manageable

Daily time step is the resolution that inland salinity and other DRB modeling campaigns are using, which might make eventual coupling easier

Meteorological data from GridMET is at the daily time step

Daily discharge from Trenton and Schuylkill has gaps, which can easily be filled by PRMS predictions at the daily time scale

However, the daily time step has some downsides too:

Tidal forcing at the mouth of the estuary is really important for driving water in and out of the estuary, which can have a huge influence on the salt front location. Aggregating tidal signals to a daily average doesn't make sense because the dominant tidal period is roughly 12 hours (semidiurnal). Based on conversations with @salme146 and John, there really isn't a great way to represent tidal information at the daily time step.

Information theory calculations are pretty data hungry: they require ~200-300 data points to robustly estimate the pdf of a variable, depending on the distribution, etc. This means that with daily data we could only make these calculations over windows of a year or so. Finer-resolution data would let us calculate information transfer (timescales, redundancy, synergies, etc.) seasonally and for specific storm events, which would be really interesting.
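
To make the sample-size argument concrete, here is a small sketch of the arithmetic. It assumes a gap-free record and treats every sample as independent; hourly samples are strongly autocorrelated, so the effective sample size of an hourly window will be smaller than its raw count. The helper names are hypothetical.

```python
# Hypothetical helpers: calendar time needed to collect n samples at a fixed
# time step, assuming a gap-free record.
def window_days(n_samples: int, step_hours: float) -> float:
    """Days of record needed to collect n_samples at a fixed step."""
    return n_samples * step_hours / 24.0


def windows_per_year(n_samples: int, step_hours: float) -> float:
    """Non-overlapping windows of n_samples that fit into 365 days."""
    return 365.0 / window_days(n_samples, step_hours)


for label, step in [("daily", 24.0), ("hourly", 1.0)]:
    for n in (200, 300):
        print(f"{label:>6}, n={n}: {window_days(n, step):6.1f} days per window, "
              f"{windows_per_year(n, step):5.1f} windows per year")
```

With 300 samples, a daily window spans 300 days (about 1.2 windows per year), while an hourly window spans 12.5 days (about 29 windows per year), which is roughly the difference between annual and event-scale estimates.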

My take is that working with the daily time scale is fine for now and might make the development of methods easier, but it might be a good idea to see what it would take to move to a sub-daily timescale.

amsnyder commented 2 years ago

@galengorski

Exploring what it would take to move to sub-daily time steps for each of the data sources:

Discharge and specific conductivity at Trenton and Schuylkill:

NWIS provides "instantaneous values" of discharge and specific conductivity at both locations through the "iv" service (e.g., readNWISuv() in the dataRetrieval package in R). These are typically reported at sub-hourly (often 15-minute) intervals and can be aggregated to hourly.

We'd need to figure out how complete this data is and also how we might fill gaps
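
A minimal sketch of the hourly aggregation and a completeness check in pandas, assuming the IV data have already been downloaded into a Series indexed by timestamp at 15-minute resolution. The function names and the `min_count` threshold are illustrative assumptions, not part of an existing workflow, and the data below are synthetic.

```python
import numpy as np
import pandas as pd


def to_hourly(iv: pd.Series, min_count: int = 3) -> pd.Series:
    """Average sub-hourly readings to hourly means.

    Hours with fewer than `min_count` readings become NaN. The default of 3
    (of 4 possible 15-minute readings) is an arbitrary choice.
    """
    grouped = iv.resample("h")
    return grouped.mean().where(grouped.count() >= min_count)


def completeness(hourly: pd.Series) -> dict:
    """Fraction of hours with data and the longest run of missing hours."""
    missing = hourly.isna().astype(int)
    run_id = missing.diff().ne(0).cumsum()  # new id at each present/missing switch
    run_lengths = missing.groupby(run_id).sum()  # 0 for runs of present hours
    return {
        "fraction_present": float(1.0 - missing.mean()),
        "longest_gap_hours": int(run_lengths.max()),
    }


# Synthetic 2-day, 15-minute record with a missing stretch.
idx = pd.date_range("2020-01-01", periods=4 * 48, freq="15min")
sc = pd.Series(np.linspace(100.0, 200.0, len(idx)), index=idx)
sc.iloc[40:60] = np.nan  # 20 readings = hours 10:00-14:59 missing
hourly = to_hourly(sc)
stats = completeness(hourly)
print(stats)
```

Reporting the longest gap alongside the overall fraction is useful because a record can be 95% complete and still contain a multi-week outage that no simple interpolation should bridge.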

Meteorological data:

GridMET data are daily, and NOAA observational data have a lot of gaps. CONUS404 is hourly but not readily available right now. Other ideas?

Tidal data:

Working with hourly tidal data would be best

Salt front location:

The salt front location data that we are working with now is at the daily time step and was provided to us by the DRBC (see #17 for more details)

We would need to calculate the salt front using hourly specific conductivity from several sites within the bay, again we'd need to think about gaps and how to fill them
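
One possible way to compute an hourly salt front, sketched under assumptions: specific conductivity at each station has already been converted to chloride (that conversion needs a site-specific relationship that is not shown here), stations are located by river mile increasing upstream, and the front is taken as the 250 mg/L chloride crossing, which is how we understand DRBC's salt front definition. DRBC applies that definition to a seven-day average, so the station series could be smoothed before calling this. The function and station values are hypothetical; the sketch linearly interpolates between the pair of adjacent stations that brackets the threshold.

```python
import numpy as np


def salt_front_river_mile(river_miles, chloride_mg_l, threshold=250.0):
    """Interpolated river mile where chloride drops below `threshold`.

    river_miles: station locations (river mile, increasing upstream).
    chloride_mg_l: chloride at each station for one time step (NaN = missing).
    Returns the most upstream crossing between adjacent valid stations, or
    NaN if no adjacent pair brackets the threshold.
    """
    rm = np.asarray(river_miles, dtype=float)
    cl = np.asarray(chloride_mg_l, dtype=float)
    order = np.argsort(rm)
    rm, cl = rm[order], cl[order]
    keep = ~np.isnan(cl)  # drop stations with missing data
    rm, cl = rm[keep], cl[keep]
    # Scan station pairs from upstream to downstream.
    for i in range(len(rm) - 1, 0, -1):
        down, up = cl[i - 1], cl[i]
        if down >= threshold > up:
            frac = (down - threshold) / (down - up)
            return float(rm[i - 1] + frac * (rm[i] - rm[i - 1]))
    return float("nan")


# Hypothetical stations and chloride values for a single hour.
print(salt_front_river_mile([50.0, 70.0, 90.0], [2000.0, 400.0, 100.0]))
```

Returning NaN when no pair brackets the threshold keeps "front is upstream or downstream of all stations" distinct from a real estimate, which matters when stations drop out and the remaining ones are far apart.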

amsnyder commented 2 years ago

@salme146

Meteorological data ideas: CONUS404 is a model product based on reanalysis. We use ERA5 data, a similar product at a coarser resolution, to drive our COAWST model. I'd say use the hourly forcing time series that I use to force the COAWST model as a "first cut".

Tidal data: working with hourly data is almost required if you are talking about coastal dynamics, as the system is highly nonlinear.

Salt front location: I like the idea of trying to calculate the salt front at the hourly time step for this work.

Filling gaps in general: are there any huge issues with some sort of nearest-neighbor interpolation? I interpolate data gaps to run COAWST, and even though it can sometimes look a little ugly, it's the best we can do at the moment.
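
A minimal sketch of nearest-neighbor (in time) gap filling with a cap on gap length, assuming a regularly spaced series. The cap (`max_gap`) is an illustrative safeguard so that long outages stay visible as NaN instead of being filled with a flat line; the function name and example values are hypothetical.

```python
import numpy as np
import pandas as pd


def fill_nearest(series: pd.Series, max_gap: int) -> pd.Series:
    """Fill NaNs with the nearest valid sample in time, but only within runs
    of at most `max_gap` consecutive NaNs. Assumes a regular time step."""
    values = series.to_numpy(dtype=float)
    missing = np.isnan(values)
    valid_pos = np.flatnonzero(~missing)
    if valid_pos.size == 0:
        return series.astype(float)

    # Length of the run of consecutive NaNs that each sample belongs to.
    m = pd.Series(missing.astype(int))
    run_id = m.diff().ne(0).cumsum()
    gap_len = m.groupby(run_id).transform("sum").to_numpy()
    fillable = missing & (gap_len <= max_gap)

    # Nearest valid position on each side; ties go to the earlier sample.
    pos = np.arange(len(values))
    i = np.searchsorted(valid_pos, pos)
    right = valid_pos[np.minimum(i, valid_pos.size - 1)]
    left = valid_pos[np.maximum(i - 1, 0)]
    nearest = np.where(np.abs(right - pos) < np.abs(pos - left), right, left)

    filled = values.copy()
    filled[fillable] = values[nearest[fillable]]
    return pd.Series(filled, index=series.index, name=series.name)


s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan, np.nan, np.nan, np.nan, 9.0])
filled = fill_nearest(s, max_gap=2)
print(filled.tolist())
```

Here the two-sample gap is filled from whichever neighbor is closer, while the four-sample gap is left as NaN; leading or trailing gaps within the cap are filled from the single neighbor they have.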

amsnyder commented 2 years ago

@jds485

specific conductivity at Trenton and Schuylkill: We'd need to figure out how complete this data is and also how we might fill gaps

I think we have this data processed within inland salinity. Processing involved aggregating to hourly and daily timesteps.

We would need to calculate the salt front using hourly specific conductivity from several sites within the bay, again we'd need to think about gaps and how to fill them

Which locations? There are some PRMS reaches along the coast that drain directly to the bay and we can make predictions for them with the inland salinity model.

amsnyder commented 2 years ago

@galengorski

Based on the 2/15/22 meeting (notes here), trying to extract information from the sub-daily tidal data so that it is usable at the daily time step seems like a more tractable approach.

One suggestion for capturing a sub-daily temporal signal in daily water level data from @aappling-usgs:

Split the hourly water level data into 24 different "daily" datasets, so we would have water level at 1:00 am, 2:00 am, etc. at a daily time step. This seems like it might be worth a shot for modeling, but it will be less interpretable from an "analysis of the drivers" point of view.
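
Here is a short pandas sketch of that reshaping, assuming an hourly water level Series with a timezone-aware DatetimeIndex. Using a fixed-offset zone (UTC or local standard time) avoids duplicate or missing clock hours at daylight-saving transitions. The function name and data are hypothetical.

```python
import numpy as np
import pandas as pd


def hourly_to_daily_columns(water_level: pd.Series) -> pd.DataFrame:
    """Reshape an hourly series into one row per day and one column per
    clock hour (0-23): 24 side-by-side 'daily' datasets."""
    df = water_level.to_frame("wl")
    df["date"] = df.index.normalize()
    df["hour"] = df.index.hour
    # pivot raises if a (date, hour) pair repeats; missing hours become NaN.
    return df.pivot(index="date", columns="hour", values="wl")


idx = pd.date_range("2022-01-01", periods=72, freq="h", tz="UTC")
wl = pd.Series(np.arange(72, dtype=float), index=idx)
wide = hourly_to_daily_columns(wl)
print(wide.shape)
print(wide[1].head())  # the "1:00 am" daily series
```

Each column can then be fed to a daily model as its own feature. Note that because the dominant tidal period (about 12.42 hours) does not divide evenly into a day, the same clock hour samples a different tidal phase each day.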

@ted80810 is looking into some daily summary statistics of tidal fluctuations that we might be able to use

amsnyder commented 2 years ago

@salme146

I really want to caution against using any sort of daily mean of tidal data. The major tidal constituents have periods of 12 and 12.42 hours, and daily averaging will alias that signal into other frequencies.
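
A small numerical illustration of the aliasing point, assuming an idealized, unit-amplitude tide containing only M2 (period 12.4206 hours) sampled hourly for 60 days. Because a day holds about 1.93 M2 cycles rather than a whole number, the daily means do not cancel to zero; instead they oscillate slowly with a period of 1 / (2 - 24/12.4206) ≈ 14.8 days, which happens to coincide with the M2-S2 spring-neap beat. A daily-mean series can therefore show a spurious fortnightly signal even when no spring-neap modulation is present.

```python
import numpy as np

M2_PERIOD_H = 12.4206  # principal lunar semidiurnal constituent, hours
N_DAYS = 60

hours = np.arange(24 * N_DAYS)  # hourly samples
tide = np.cos(2 * np.pi * hours / M2_PERIOD_H)  # unit-amplitude, M2-only tide

# Daily means: one row per day, averaged across the 24 hours.
daily_mean = tide.reshape(N_DAYS, 24).mean(axis=1)
print("max |daily mean|:", np.abs(daily_mean).max().round(3))

# Dominant period of the daily-mean series (cycles/day -> days).
freqs = np.fft.rfftfreq(N_DAYS, d=1.0)
spectrum = np.abs(np.fft.rfft(daily_mean - daily_mean.mean()))
alias_period_days = 1.0 / freqs[1 + spectrum[1:].argmax()]
print("dominant period of daily means (days):", alias_period_days)
```

With 60 days of record the FFT frequency resolution is 1/60 cycles per day, so the peak lands on the nearest bin (a period of about 15 days). The daily means swing by only about 3.5% of the M2 amplitude, which is small but not zero, and in a real record it would be mixed in with genuine subtidal signals.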

Taking a look at the NOAA tides and currents station at Lewes, DE: https://tidesandcurrents.noaa.gov/harcon.html?unit=0&timezone=0&id=8557380&name=Lewes&state=DE

The higher the amplitude, the more important the constituent is in representing the total tidal signal. Let's take the first seven tidal constituents and their periods:

M2 - 12.42 hours

S2 - 12 hours

N2 - 12.66 hours

K1 - 23.93 hours

M4 - 6.21 hours, a harmonic of M2, generated from nonlinearities when the tide approaches shallow water (i.e., the continental shelf)

O1 - 25.82 hours

M6 - 4.14 hours, a second harmonic of M2, generated from nonlinearities when the tide approaches shallow water (i.e., the continental shelf) and interacts with itself

This is why we use hourly signals in tidal time series analysis: we can identify these harmonic frequencies, and a lot of them hover around the 12-hour or 1-day period. I can answer more questions on this at today's (2/17) meeting.
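
To illustrate how hourly data let us separate these constituents, here is a bare-bones least-squares harmonic fit in NumPy on a synthetic record with made-up amplitudes. It is a sketch, not a replacement for established tidal analysis packages (such as UTide), which also handle nodal corrections, phases, and confidence intervals. One design constraint worth noting: two constituents can only be resolved if the record is longer than 1/|f1 - f2| (the Rayleigh criterion), which is about 14.8 days for M2 and S2 and about 27.6 days for M2 and N2.

```python
import numpy as np

# Standard constituent periods in hours (rounded to four decimals).
PERIODS_H = {
    "M2": 12.4206, "S2": 12.0000, "N2": 12.6583, "K1": 23.9345,
    "O1": 25.8193, "M4": 6.2103, "M6": 4.1402,
}


def harmonic_fit(t_hours, eta, periods_h):
    """Least-squares fit of eta(t) = mean + sum_k [a_k cos(w_k t) + b_k sin(w_k t)].

    Returns the mean level and the amplitude sqrt(a_k^2 + b_k^2) of each
    constituent. No nodal corrections or astronomical phase references.
    """
    t = np.asarray(t_hours, dtype=float)
    columns = [np.ones_like(t)]
    for period in periods_h.values():
        omega = 2.0 * np.pi / period
        columns += [np.cos(omega * t), np.sin(omega * t)]
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, np.asarray(eta, dtype=float), rcond=None)
    amplitudes = {
        name: float(np.hypot(coef[1 + 2 * k], coef[2 + 2 * k]))
        for k, name in enumerate(periods_h)
    }
    return float(coef[0]), amplitudes


# Synthetic 60-day hourly record with made-up amplitudes (metres).
t = np.arange(24 * 60, dtype=float)
eta = (0.7
       + 0.6 * np.cos(2 * np.pi * t / PERIODS_H["M2"] + 0.3)
       + 0.1 * np.sin(2 * np.pi * t / PERIODS_H["S2"]))
mean_level, amps = harmonic_fit(t, eta, PERIODS_H)
print(round(mean_level, 3), {k: round(v, 3) for k, v in amps.items()})
```

On this noise-free record the fit recovers the M2 and S2 amplitudes and returns near-zero amplitudes for the other constituents; with daily means instead of hourly values, the semidiurnal and higher-frequency columns could not be estimated at all.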

amsnyder commented 2 years ago

@galengorski

Based on discussion from our meeting on 2/22, it seems like we might be able to pull out some daily statistics from the water level data based on the differences between the predicted and observed water level. For example, the plot below shows water level data for Lewes: the blue line is the predicted water level and the green line is the observed. From 1/14-1/18 the observed water level is above the predicted, likely due to an offshore storm event, resulting in a net influx of water into the estuary. During 1/18-1/20, observed < predicted, indicating a net outflow from the estuary. @salme146 please correct me if I am misrepresenting the interpretations. A couple of ideas for daily statistics to pull out of this record:

Daily sum(observed - predicted) water level: indicative of net flow into or out of the estuary

Daily max water level - daily min water level: high values would indicate spring tides, associated with the break-up of salt front dynamics, while low values would indicate neap tides, associated with more stable behavior
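
A minimal pandas sketch of these two daily statistics, assuming hourly observed and predicted water level Series on the same timestamps (for example, observed and predicted water level at Lewes downloaded separately from NOAA). The data below are synthetic, the function name is hypothetical, and reading the residual sum as net inflow or outflow is the interpretation proposed above rather than a calibrated flux: it is a sum of water-level anomalies (units of length times hours), not a volume.

```python
import numpy as np
import pandas as pd


def daily_tide_stats(obs: pd.Series, pred: pd.Series) -> pd.DataFrame:
    """Daily summaries of hourly observed vs. predicted water level."""
    df = pd.DataFrame({"obs": obs, "pred": pred})
    df["resid"] = df["obs"] - df["pred"]  # non-tidal residual, e.g. storm surge
    daily = df.resample("D")
    return pd.DataFrame({
        # Sum of hourly residuals: positive is read as net inflow (set-up).
        "resid_sum": daily["resid"].sum(min_count=1),
        "resid_mean": daily["resid"].mean(),
        # Daily tidal range: larger near spring tides, smaller near neaps.
        "obs_range": daily["obs"].max() - daily["obs"].min(),
    })


idx = pd.date_range("2022-01-14", periods=48, freq="h")
t = np.arange(48)
pred = pd.Series(0.5 * np.cos(2 * np.pi * t / 12.42), index=idx)
obs = pred + np.where(t < 24, 0.2, -0.1)  # made-up set-up, then set-down
stats = daily_tide_stats(obs, pred)
print(stats)
```

The mean residual is included alongside the sum because the sum depends on how many hours are present, so days with gaps would otherwise look artificially quiet.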

@ted80810 I know you are working on developing some daily summaries of tidal data, maybe this could be a place to start?

amsnyder commented 2 years ago

@ted80810

Thank you @galengorski. That's a great suggestion. I started looking at the Water Level - Predictions with @salme146 yesterday. I am also looking into libraries to identify tidal signals in the NOAA NOS Data.

amsnyder commented 2 years ago

@salme146

I wanted to add the figure we discussed today to this thread. It is published by DRBC and shows the driest to wettest years based on Trenton Discharge from 1960ish-present: https://www.nj.gov/drbc/library/documents/WQAC/072820/chen_Model_SLRsimulations.pdf slide 16

amsnyder commented 2 years ago

@salme146

Just to note: if you look at the median flow from low to high, the last 20ish years (2000-2020) include 8 of the 10 years shown on this slide, so maybe 20 years is representative of the hydrodynamic conditions we are trying to look at... thoughts??

amsnyder commented 2 years ago

@galengorski

I think this would be worth looking at: does the period from 2000-2020 capture the variability in conditions, and how evenly does it cover them? It's hard to tell from this plot, but it looks like there might be more high-flow years than low-flow years.
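
One way to check this quantitatively, sketched under assumptions: rank every year in the Trenton record by its median daily discharge (as on the DRBC slide), split the ranked years into equal-count classes (quintiles here, an arbitrary choice), and count how many 2000-2020 years land in each class. The sketch uses calendar years; USGS water years (October-September) may be more appropriate. The function name is hypothetical, and the example data are synthetic, with flow simply increasing by year.

```python
import numpy as np
import pandas as pd


def flow_class_coverage(daily_q: pd.Series, start_year: int, end_year: int,
                        n_classes: int = 5) -> pd.DataFrame:
    """Rank years by median daily discharge over the full record, split them
    into n_classes equal-count classes (1 = driest), and count how many years
    of [start_year, end_year] fall in each class."""
    annual_median = daily_q.groupby(daily_q.index.year).median()
    classes = pd.qcut(annual_median.rank(method="first"), n_classes,
                      labels=list(range(1, n_classes + 1)))
    years = annual_median.index
    in_window = (years >= start_year) & (years <= end_year)
    return pd.DataFrame({
        "all_years": classes.value_counts().sort_index(),
        "window_years": classes[in_window].value_counts().sort_index(),
    })


# Synthetic record: 1960-2020, each year's flow equal to its order (wetter later).
idx = pd.date_range("1960-01-01", "2020-12-31", freq="D")
q = pd.Series(idx.year - 1959, index=idx, dtype=float)
coverage = flow_class_coverage(q, 2000, 2020)
print(coverage)
```

A window that covers conditions evenly would put roughly the same share of its years in every class; in the synthetic example all 21 window years fall in the two wettest classes, which is the kind of imbalance this check is meant to flag.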

USGS-R / drb-estuary-salinity-ml

Discussion: What is the best time scale to use for our models and analysis? #129
