danhamill / CVHSSmoothing

Cubic Spline Hydrograph Interpolation
MIT License

Generic insert peak function #2

Open narlesky opened 2 years ago

narlesky commented 2 years ago

Hi @danhamill thanks for sharing this repo! It's great to see this algorithm implemented in Python -- so much more accessible!!

As I was reading through the peak insertion logic, I had a question about the calculation for how the daily volume is reapportioned when a new peak is added. Does this make sense as a way to express what you've written? I was curious about the origin of the 0.9 factor -- is that an approximation of 22 / 24? Thanks in advance! - Carly

import pandas as pd

def insert_peak_generic(daily_accumulation, hourly_accumulation, date, value, at_hour=None):
  """
  Accept daily_accumulation, hourly_accumulation, date and value. Insert
  hourly peak flow of magnitude value at <at_hour> of date. Remaining daily
  volume proportionally divided between preceding and following portions of
  remaining day. Return new hourly_accumulation.

  Arguments:
    daily_accumulation (pandas.Series): 
    hourly_accumulation (pandas.Series): 
    date (pandas.Period): date of tabulated peak value
    value (int64?): tabulated peak value at specified date
    at_hour (int): hour of day to insert peak, 0 (12am) to 23 (11pm)
  Returns:
    hourly_accumulation (pandas.Series): 
  """
  beginning_daily_sum = daily_accumulation[date]
  ending_daily_sum = daily_accumulation[date + 1]
  daily_flow = ending_daily_sum - beginning_daily_sum
  start_hour = pd.Period(freq="H", year=date.year, month=date.month, day=date.day, hour=at_hour)
  end_hour = start_hour + 1
  # end_hour = pd.Period(freq="H", year=date.year, month=date.month, day=date.day, hour=at_hour + 1)

  # hourly average flow from daily peak value
  value_hourly = value / 24.0

  # 11 am: Insert hourly peak flow of magnitude value at noon of date. Remaining daily
  # volume proportionally divided between preceding and following portions of remaining day.
  if at_hour == 11:
    reapportion = (daily_flow - value_hourly) * 11 / 23

  # 10 pm: Insert hourly peak flow of magnitude value at 11 pm of date. Remaining daily
  # volume applied to first 22 hours of day.
  elif at_hour == 22:
    reapportion = daily_flow - value_hourly - value_hourly * 0.9

  # 11 pm: Insert hourly peak flow of magnitude value at midnight of date. Remaining daily
  # volume applied to first 23 hours of day.
  elif at_hour == 23:
    reapportion = daily_flow - value_hourly - value_hourly

  # 12 am: Insert hourly peak flow of magnitude value at 0000 of date. Remaining daily
  # volume applied to remaining 23 hours of day.
  elif at_hour == 0:
    reapportion = 0 # + (daily_flow - value_hourly)

  # 1 am: Insert hourly peak flow of magnitude value at 0100 of date. Remaining daily
  # volume applied to remaining 23 hours of day.
  elif at_hour == 1:
    reapportion = value_hourly * 0.9 # + (daily_flow - value_hourly)

  else:
    raise NotImplementedError(f'insert_peak_generic not implemented for at_hour={at_hour}')

  # update hourly_accumulation series values
  hourly_accumulation[start_hour] = beginning_daily_sum + reapportion
  hourly_accumulation[end_hour] = hourly_accumulation[start_hour] + value_hourly
  return hourly_accumulation
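
For reference, here is a minimal way to exercise the function. The series and values below are made up (a three-day record with a peak inserted at 11 am):

import pandas as pd

# three days of cumulative daily volume, indexed by daily Periods
days = pd.period_range("2022-01-01", periods=3, freq="D")
daily_acc = pd.Series([0.0, 2400.0, 4800.0], index=days)

# hourly accumulation skeleton for the spline to fill in later
hours = pd.period_range("2022-01-01", periods=49, freq="H")
hourly_acc = pd.Series(index=hours, dtype=float)

# insert a tabulated peak of 3600 on the first day at 11 am
hourly_acc = insert_peak_generic(daily_acc, hourly_acc, days[0], 3600.0, at_hour=11)
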
danhamill commented 2 years ago

@narlesky That is a good question. I asked Rob and he also could not remember why he used a value of 0.9 in the at_hour == 22 and at_hour == 1 elif statements.

Thinking about what the function is intended to do, I think you found an issue worth exploring. For example, the same logic could be rewritten as:

  elif at_hour == 22:
    reapportion = daily_flow - 1.9 * value_hourly
  elif at_hour == 23:
    reapportion = daily_flow - 2 * value_hourly

which does not make much sense. I will test and get back to you.
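
For what it's worth, if the at_hour == 0 and at_hour == 11 branches are taken as the pattern (the leftover daily volume split in proportion to the number of non-peak hours before the peak), a fully generic version might look like the sketch below. This is just one reading of the intent, not tested against the originals:

import pandas as pd

def insert_peak_proportional(daily_accumulation, hourly_accumulation, date, value, at_hour):
  # hypothetical generic variant: of the 23 non-peak hours, at_hour of them
  # precede the peak, so that share of the leftover volume lands before it
  beginning_daily_sum = daily_accumulation[date]
  daily_flow = daily_accumulation[date + 1] - beginning_daily_sum
  value_hourly = value / 24.0
  start_hour = pd.Period(freq="H", year=date.year, month=date.month,
                         day=date.day, hour=at_hour)
  reapportion = (daily_flow - value_hourly) * at_hour / 23
  hourly_accumulation[start_hour] = beginning_daily_sum + reapportion
  hourly_accumulation[start_hour + 1] = hourly_accumulation[start_hour] + value_hourly
  return hourly_accumulation

This reproduces the at_hour == 0 and at_hour == 11 cases exactly and drops the 0.9 factor entirely.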

narlesky commented 2 years ago

Thanks @danhamill!

danhamill commented 2 years ago

@narlesky Things have been busy but I have made a little progress simplifying the code. See Spline_from_Pandas.py

The first thing I wanted to get rid of was all the faux-date business. I think treating everything as a pd.Period will avoid any date overflow issues with pd.Timestamp.
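
A quick illustration of the overflow point: nanosecond-backed Timestamps only cover roughly 1677 through 2262, while Periods are plain integer offsets at their own frequency:

import pandas as pd

# nanosecond-backed Timestamps are int64 nanoseconds under the hood
print(pd.Timestamp.min, pd.Timestamp.max)  # ~1677-09-21 ... ~2262-04-11

# daily/hourly Periods far outside that window work fine
print(pd.Period("2500-01-01", freq="D") + 1)  # Period('2500-01-02', 'D')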

The next thing I was working on was getting away from the text file inputs and outputs. My goal is to read and write directly from dss. The spline function in Spline_from_Pandas accepts a dataframe with two columns. Right now I have the flow column hard coded, which is undesirable. Passing a dataframe implies I am reading from dss outside of the method, but I think we could make this an object to simplify the usage of the algorithm. Here is how I was reading in the time series:

import pandas as pd
from pydsstools.heclib.dss import HecDss
from core.Spline_from_Pandas import spline

def get_daily_ts(path, dss_file):
    with HecDss.Open(dss_file) as fid:
        ts = fid.read_ts(path, trim_missing=False)
        times = ts.pytimes
        values = ts.values
    tmp = pd.DataFrame(data={'date': times, 'flow': values.copy(), 'site': path.split("/")[2]})
    # shift stamps back one hour, presumably to undo DSS's end-of-period (2400) convention
    tmp.date = tmp.date - pd.DateOffset(hours=1)
    return tmp

df_daily = get_daily_ts(path, dss_file)  # path and dss_file defined elsewhere
df_hourly = spline(df_daily)
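
A rough sketch of the object idea mentioned above (SplineSmoother and its method are hypothetical names); it just bundles the DSS read with the spline call:

class SplineSmoother:
    # hypothetical wrapper around get_daily_ts and spline from above
    def __init__(self, dss_file):
        self.dss_file = dss_file

    def smooth(self, path):
        df_daily = get_daily_ts(path, self.dss_file)
        return spline(df_daily)

smoother = SplineSmoother(dss_file)
df_hourly = smoother.smooth(path)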

Another thing I have yet to tackle is investigating other types of splines. This:

from scipy import interpolate

spline_function = interpolate.splrep(masked_dates.compressed(),
    hourly_accumulation.dropna().values, s=0)

is by no means optimal. This type of spline can dip before it rises, and when you run it on an accumulation time series, any dip turns into negative flows once the interpolated hourly accumulation is converted back to a hydrograph by successive differences.
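
One alternative worth trying is a monotone interpolant such as SciPy's PCHIP: it preserves the monotonicity of the input, so a non-decreasing accumulation stays non-decreasing and the differenced hydrograph cannot go negative. A sketch with made-up numbers:

import numpy as np
from scipy.interpolate import PchipInterpolator

# x: known times in hours, y: non-decreasing accumulation at those times
x = np.array([0.0, 24.0, 48.0, 72.0])
y = np.array([0.0, 2400.0, 2500.0, 6000.0])

f = PchipInterpolator(x, y)   # monotone where the data are monotone
hourly = f(np.arange(73))     # interpolated hourly accumulation
flows = np.diff(hourly)       # successive differences -> hourly flows
print(flows.min())            # >= 0, unlike the splrep fit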

Lately I have been working on local flows, which aren't the best data to test with since there can be large portions of the time series without any local flow.

narlesky commented 2 years ago

@danhamill thanks for the update! What a funny coincidence -- I had just visited this repo and the CriticalDuration repo earlier today, starred them, and shared them with our GitHub org! I definitely understand the balancing act between text and DSS inputs. We are still working with CSVs a decent portion of the time, but I can see why the pydsstools integration would be so valuable!

danhamill commented 2 years ago

Haha. I saw all the activity on git and remembered I hadn't committed lately. I was mid-way through an update and got sidetracked.

narlesky commented 2 years ago

Well it's much appreciated!

danhamill commented 2 years ago

@narlesky Off topic, but I came across Qiusheng Wu's GitHub profile and he has some really great open source GIS repositories. Check out leafmap and geemap.

narlesky commented 2 years ago

@danhamill Oh cool! I used to work on a project with a similar theme based on Mapbox. I'll have to check it out!