azavea / noaa-hydro-data

NOAA Phase 2 Hydrological Data Processing

Calculate base flow for NWM stream reaches #112

Closed jpolchlo closed 1 year ago

jpolchlo commented 1 year ago

When considering stream flow volume, it may be useful, for visualization purposes, to have an estimate of "normal" flow. This requires processing a large volume of the retrospective data to compute the mean and standard deviation for every stream reach. With those in hand, we can compute a z-score to flag flow that is out of range.
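As a minimal sketch of the z-score idea, assuming the historical flows for one reach are already in memory as a plain array (the function name `flow_zscore` and the sample values are made up for illustration):

```python
import numpy as np

def flow_zscore(current_flow, historical_flow):
    """Return how many standard deviations current_flow lies from the historical mean."""
    historical_flow = np.asarray(historical_flow, dtype=float)
    mean = historical_flow.mean()
    std = historical_flow.std(ddof=1)  # sample standard deviation
    if std == 0:
        return float("nan")  # no variability in the baseline; z-score is undefined
    return float((current_flow - mean) / std)

# Hypothetical flows for a single reach (illustrative values only)
historical_flow = [10.0, 12.0, 11.0, 13.0, 14.0]
z = flow_zscore(18.0, historical_flow)
```

A large positive or negative `z` would mark the reach as unusually high or low relative to its baseline. The threshold for "out of range" is a separate choice that this sketch does not make.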

Questions that need to be answered:

  1. How wide a window is enough to get a trustworthy baseline? Too long might not catch recent trends from climate change, too short might yield bad estimates.
  2. What temporal granularity should we use? Obviously an annual estimate is missing out on seasonal variation, but daily averages seem to also be open to noise. The choice here is probably down to weekly or monthly.
  3. How should we store these results? It might be sensible to build a database table in RDS, but that may be trickier to use than a single Zarr file when processing in Dask.

jpolchlo commented 1 year ago

For the time being, I'm settling on a 10-year window, aggregated weekly. I'll write the results to a Zarr file somewhere on S3, and if it seems useful, we can later ingest this into a DB. That said, it may be just as well to leave it as Zarr and do any combinations using xarray.
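A rough sketch of what the weekly baseline over a 10-year window could look like in xarray. Everything here is an assumption for illustration: the data is synthetic and daily rather than the real retrospective, the variable and dimension names (`streamflow`, `time`, `feature_id`) are assumed, the week-of-year bucketing is only one possible choice, and the output path in the comment is a placeholder.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the retrospective data: daily flow for three made-up reaches.
time = pd.date_range("2010-01-01", "2019-12-31", freq="D")
feature_id = np.array([101, 102, 103])  # hypothetical reach IDs
rng = np.random.default_rng(0)
flow = rng.gamma(shape=2.0, scale=5.0, size=(time.size, feature_id.size))
ds = xr.Dataset(
    {"streamflow": (("time", "feature_id"), flow)},
    coords={"time": time, "feature_id": feature_id},
)

# Restrict to the 10-year window.
window = ds.sel(time=slice("2010-01-01", "2019-12-31"))

# Week-of-year label in 1..52; the last one or two days of the year fold into week 52.
week = ((window["time"].dt.dayofyear - 1) // 7 + 1).clip(max=52).rename("week")

# Per-reach, per-week mean and standard deviation across all years in the window.
grouped = window["streamflow"].groupby(week)
baseline = xr.Dataset({"mean": grouped.mean("time"), "std": grouped.std("time")})

# Writing out would be something like the following (placeholder path, needs zarr and s3fs):
# baseline.to_zarr("s3://<bucket>/<prefix>/baseline.zarr", mode="w")

# Z-score of each observation against the baseline for its week.
z = (window["streamflow"] - baseline["mean"].sel(week=week)) / baseline["std"].sel(week=week)
```

With the real data, the same pattern should run lazily if the dataset is opened with Dask-backed chunks, though chunking along `time` versus `feature_id` would likely need tuning.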