Precip Fidelity Project Overview

rburghol commented 10 months ago

Overview

Project Brief Goals and Outline

DEQ needs a process of identifying time and locations of precipitation input errors, and to use that information to rank varying precipitation inputs according to accuracy, and to create an aggregate dataset from the best available spatially and temporally. This analytical process should be able to integrate into existing DEQ workflows, and Virginia Tech should work with DEQ to design updated workflows where necessary to support the integration of these new analytical processes.

Tasks

DEQ:
- [x] Command line raster download, clip and insert routine (avoid PHP/Drupal if possible)
  - [x] #35
  - [x] PRISM #33
  - [x] NLDAS2 #34
  - [x] daymet #37
  - [ ] NOAA drought?
- [ ] #55
  - [ ] SQL to mashup a coverage timeseries from 2 rasters based on null values
  - [ ] SQL to mashup a coverage timeseries based on multiple rasters and a gage flow agreement rating
  - [ ] SQL to disaggregate daily raster (or n-length period) to hourly
  - [ ] Can we disaggregate based on gage + lag factor for travel time?
- [ ] #54
  - [x] Tiling impact on import
  - [x] Tiling impact on SQL performance
  - [x] Limit the number of bands impact on import
  - [x] Number of bands impact on SQL performance
  - [x] Storage in/out of the database?
  - [ ] Clipping/Excerpting rasters by basin for later processing
Both
- [ ] geo work flows:
  - [x] Simple lm (monthly best model)
  - [ ] Weekly min(error) lm
  - [x] Storm flow separation weekly lm
  - [x] Re-sample to 1km x 1km ONLY
- [ ] amalgamate workflows
  - [ ] SQL to find null values and develop a disaggregation scheme from nearest neighbors (involves a raster workflow, maybe nullifying cells and nearest neighboring them till they are all filled with some values)
  - [ ] Data model for efficient raster local storage
  - [ ] Efficient knitting together of rasters for a full, optimized timeseries.

Diagram

See info on mermaid diagrams: https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-diagrams


graph TD
Rdata(Raster Data Acquisition) --> Rstore(Raster Data Storage in Database)
USGS(USGS Flow Timeseries By Basin) --> Prec(Precipitation Event Recognition)
Rts(Raster Data Timeseries By Basin) --> Prec(Precipitation Event Recognition) 
Rstore --> Rts
Prec --> Pmiss(Identify Missing Precip Events)
Prec --> Pover(Identify Extra/Overestimated Precip Events)
Pmiss --> Rmash(Agglomerate Raster Dataset for Better Fit)
Pover --> Rmash
Rstore --> Rmash
Rmash --> Rstore
Rstore --> Mmet(Model Meteorology Dataset Creation)
Mmet --> Mrun(Model Runs by Basin)
Mrun --> Meval(Comparison of Modeled to Observed)
USGS --> Meval
Meval --> Rmash
Mrun --> Rmash



### Introduction
Precipitation timing and magnitude is the single most important factor in water availability, and is therefore a crucial component of hydrologic modeling.  Specifically, precipitation timing, magnitude and intensity determines base-flow recharge and riverine system stability and resiliency during drought periods.  

Our current generation of models are able to provide 6-18 month minimum flow projections in a large number of Virginia watersheds with base flow cycles of the same duration, however, the uncertainty of those predictions, even in the most well-calibrated watersheds, is unknown.  The crucial piece of information needed to inform this is understanding how well the models represent baseflow dynamics over a the multi-year timescale, and this area is heavily dependent on accurate rainfall inputs.  

Precipitation spatial variability is high, and while radar-based observations offer a high spatial resolution, they are dependent upon correlation with ground based observations that come from a very sparse, point based monitoring system, resulting in substantial precipitation interpolation.  While the likelihood of precipitation errors are well understood, no practical method of quantifying them currently exists for geographically large model domains.  Given this inability to quantify these errors directly, methods for detecting the signature of precipitation errors in hydrologic model are needed, however, these methods have not been well established, and therefore, our understanding of the extent to which precipitation errors create hydrologic model errors is poor.  

As a result, while we possess models that can provide a quantitative estimate of baseflow resiliency to future multi-year droughts, our ability to define the error bounds of these estimates is hampered by our inability to understand the extent to which model errors are the result of poor model capability (from conceptual limitations or faulty model calibration), or simply a result of erroneous precipitation estimates.

DEQ needs a process of identifying time and locations of precipitation input errors, and to use that information to rank varying precipitation inputs according to accuracy, and to create an aggregate dataset  from the best available spatially and temporally.  This analytical process should be able to integrate into existing DEQ workflows, and Virginia Tech should work with DEQ to design updated workflows where necessary to support the integration of these new analytical processes.

### Pre-Development Steps
- ~~Construct SOW~~
- ~~Dissect and understand FEWS capabilities:~~
   - ~~Schedule FEW desktop demo from ICPRB, with questions:~~
      - ~~What data sources are available?~~
      - ~~Process of scripting/developing new algorithms if suitable mashups not available?~~
   - ~~Schedule DELFT to give us a demo of web app and explanation of why we might use it~~
   - ~~Where does the FEWS database live?~~

### Project Objectives
- Develop method of merging met datasets (build on mash-up)
- Develop download and import/processing tools for multiple NOAA met data sets
   - https://psl.noaa.gov/data/gridded/
   - https://psl.noaa.gov/data/gridded/tables/monthly.html
   - https://www.hec.usace.army.mil/confluence/hmsdocs/hmsguides/working-with-gridded-boundary-condition-data/gridded-data-sources
   - https://water.weather.gov/precip/
   - PRISM: https://prism.oregonstate.edu/downloads/
   - See also: https://github.com/HARPgroup/vahydro/issues/398
   - Delft FEWS :https://www.deltares.nl/en/software-and-data/products/delft-fews-platform
      - Github online FEW interface/system https://github.com/Deltares/fews-web-oc/blob/main/docs/public/deployments/README.md
- Develop basic workflows to analyze precip based on gage, and select best model for multiple periods in the model record
   - Pose criteria for precip/temp fidelity and mash-up to identify "best met data"
   - Base workflow on postgis raster processing to streamline data storage and speed processing.
   - Develop workflow in Delft-FEWS online platform 
   - Prototype in Delft-FEWS desktop
- Characterize relationships and types of errors:
   - What is monthly minimum effective precipitation at each gage? (i.e., the value of precip that leads to a bump)
   - Gage/Met disagreement types:
      - "Phantom Storm" in model (gage has no bump, model does)
      - "Missing Storm" in model (gage bump, model does not)
   - Period flow errors (model):
      - Do Gage/Met disagreements correlate with model error?
      - Do years with mean flow errors correlate with precip fidelity? (the CBP model is known to be calibrated to very small error ranges in mean annual flow).
   - Ultimately, can we answer "is this error likely due to bad met inputs or model mechanics?"
- Evaluate the "best" overall precipitation data sources for various geographic scales for specific model time periods
  - At a specific model watershed, we want to identify the precipitation data set that best matches the gage flow in the region. This may change between model timesteps, with different days in the same region relying on different datasets
  - The metrics and workflows above should inform this process to determine the appropriate cut-over dates or geographic regions

- References:
   - "Understanding Precipitation Fidelity in Hydrological Modeling" (Mobley) https://ascelibrary.org/doi/10.1061/%28ASCE%29HE.1943-5584.0000588
   - "A new method for establishing hydrologic fidelity of snow depth measurements based on snowmelt–runoff hydrographs" https://www.tandfonline.com/doi/pdf/10.1080/02626667.2018.1438613
   - "Environmental Flow Components for Measuring Hydrologic Model Fit during Low Flow Events" (Mobley) https://www.researchgate.net/publication/267777206_Environmental_Flow_Components_for_Measuring_Hydrologic_Model_Fit_during_Low_Flow_Events

### Data Model
#### Proposed model
- Use USGS full drainage shapes (in vahydro, bundle='watershed', ftype='usgs_full_drainage')
- Create `dh_timeseries_weather` records for snippet rasters, attached to the `usgs_full_drainage` feature, for best fit data sources.
   - use varkey `met_hourly_best_fit`
- Assemble a full timeseries raster of the baseline data source (whichever datasource ends up being least erroneous), then patch the rasters with best-fit rasters for each `usgs_full_draainage` record added to the database.

### Examples
#### Merging Data from 2 different precip rasters to get best data set.
- See "Combine Daily and WYTD Time Series" in HARPgroup/vahydro#666
- See also, NLDAS2 raster to dh: HARPgroup/vahydro#586

#### Upper James River near Bedford
- Under-simulation (_e_ >= -25%) of L90 in 1991, 1998, 2001, 2013, 2014, 2018, 2019, 2020, 2022.
- Over-simulation (_e_ >= +25%) of L90 in 1989, 1996, 2000, 2009.
- Error in 2019 (drought year) was -38%.
- Note: Error in 2005 is reported at +62%, however, the gage did not go into service until October 1st, so comparison is invalid.

**Figure 1:** Modeled versus observed 90-day low flow at James river USGS 02024752 from 2005-2023.
![image](https://github.com/HARPgroup/vahydro/assets/4571170/683ae2d0-84d9-4753-9c89-a871e7af2392)

| Year| USGS| Model| pct_error|
|----:|----:|-----:|---------:|
| 1984| 2477|  2387|        -4|
| 1985| 1579|  1614|         2|
| 1986|  911|   840|        -8|
| 1987| 1198|   950|       -21|
| 1988|  936|   909|        -3|
| 1989| 2569|  3211|        25|
| 1990| 1178|  1384|        17|
| 1991|  757|   495|       -35|
| 1992|  947|   968|         2|
| 1993|  832|   728|       -12|
| 1994|  899|   944|         5|
| 1995| 1105|  1148|         4|
| 1996| 2539|  3200|        26|
| 1997|  857|   732|       -15|
| 1998|  788|   454|       -42|
| 1999|  753|   659|       -12|
| 2000| 1213|  1569|        29|
| 2001|  654|   473|       -28|
| 2002|  667|   686|         3|
| 2003| 3571|  4397|        23|
| 2004| 1807|  1694|        -6|
| 2005| 1207|  1272|         5|
| 2006| 1488|  1806|        21|
| 2007|  729|   595|       -18|
| 2008|  654|   658|         1|
| 2009|  961|  1482|        54|
| 2010|  903|   826|        -9|
| 2011| 1064|   983|        -8|
| 2012|  863|   795|        -8|
| 2013| 1092|   822|       -25|
| 2014| 1017|   755|       -26|
| 2015| 1477|  1330|       -10|
| 2016| 1441|  1329|        -8|
| 2017|  906|   793|       -12|
| 2018| 1947|  1347|       -31|
| 2019| 1042|   650|       -38|
| 2020| 2461|  1665|       -32|
| 2021| 1175|  1016|       -14|
| 2022| 1142|   857|       -25|
| 2023| 1293|  1504|        16|

#### Gage Stability 
- A measure of the magnitude of stage-discharge table adjustments made during regular site visits at varying flow level and month.
- See: HARPgroup/vahydro#1004
- Example: 02025500 James River at Holcomb Rock, VA, gage error at low-flows < 5% (very good gage)

| Flow Percentile | January Error | January Flow | January Error | January Flow | March Error | March Flow | April Error | April Flow | May Error | May Flow | June Error | June Flow | July Error | July Flow | August Error | August Flow | September Error | September Flow | October Error | October Flow | November Error | November Flow | December Error | December Flow |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| Less than or equal to 5th percentile | 0.00 | 739 | 0.00 | 932 | 0.00 | 1,115 | 0.03 | 1,591 | 0.00 | 1,207 | 0.00 | 753 | -0.02 | 695 | -0.04 | 540 | 0.00 | 487 | -0.01 | 563 | 0.00 | 591 | -0.04 | 716 |
| 5th-10th percentile | 0.01 | 974 | 0.00 | 1,215 | 0.02 | 1,570 | 0.01 | 1,788 | 0.00 | 1,457 | 0.00 | 927 | -0.01 | 794 | -0.05 | 654 | -0.01 | 640 | -0.01 | 617 | 0.00 | 701 | -0.02 | 817 |
| 10th-25th percentile | 0.01 | 1,292 | 0.00 | 1,663 | 0.01 | 2,148 | 0.01 | 2,119 | 0.00 | 1,802 | 0.00 | 1,139 | -0.01 | 904 | -0.04 | 763 | -0.03 | 699 | -0.02 | 700 | -0.03 | 811 | -0.01 | 1,035 |
| 25th-50th percentile | 0.00 | 2,082 | 0.01 | 2,624 | 0.01 | 3,751 | 0.00 | 3,037 | 0.00 | 2,657 | 0.01 | 1,433 | -0.01 | 1,081 | -0.01 | 926 | -0.02 | 853 | 0.00 | 834 | -0.01 | 1,025 | 0.00 | 1,648 |
| 50th-75th percentile | 0.00 | 3,400 | 0.01 | 3,985 | 0.00 | 5,822 | 0.00 | 5,275 | 0.00 | 4,110 | 0.01 | 2,144 | -0.01 | 1,415 | -0.02 | 1,174 | -0.03 | 1,076 | 0.00 | 1,085 | -0.02 | 1,786 | 0.00 | 2,990 |
| 75th-90th percentile | -0.01 | 6,783 | 0.00 | 6,673 | 0.00 | 10,184 | 0.00 | 9,352 | 0.00 | 6,631 | 0.00 | 4,008 | 0.01 | 2,084 | -0.02 | 1,595 | -0.01 | 1,825 | 0.00 | 2,027 | 0.00 | 3,759 | 0.00 | 5,650 |
| 90th-95th percentile | 0.00 | 11,068 | 0.00 | 11,420 | 0.00 | 17,268 | 0.00 | 15,343 | 0.00 | 10,132 | 0.00 | 8,296 | 0.01 | 3,204 | 0.04 | 2,249 | 0.00 | 4,157 | 0.00 | 3,278 | 0.00 | 6,661 | 0.00 | 9,145 |
| Greater than 95th percentile | -0.01 | 25,636 | 0.00 | 28,250 | 0.00 | 26,608 | 0.00 | 25,518 | 0.00 | 19,196 | 0.00 | 20,075 | 0.00 | 7,067 | 0.02 | 3,980 | 0.00 | 13,023 | 0.00 | 7,593 | 0.00 | 16,608 | 0.00 | 17,200 |

rburghol commented 6 months ago

COBrogan commented 4 months ago

@rburghol A simple batch script for running multiple download and import scripts. Note that I've hard coded the start/end times of the datasets. Probably not necessary given our set-up of the download script.

# set needed environment vars
MODEL_ROOT=/backup/meteorology/
MODEL_BIN=$MODEL_ROOT
SCRIPT_DIR=/opt/model/model_meteorology/sh
MET_SCRIPT_PATH=$SCRIPT_DIR
export MODEL_ROOT MODEL_BIN SCRIPT_DIR MET_SCRIPT_PATH

#Download and import PRISM and daymet rasters between dates as available:
startYear=1983
endYear=2024
#Set availability dates for ease:
daymetStartAvailable=1980
daymetEndAvailable=2023
PRISMStartAvailable=1895
PRISMEndAvailable=2024

for (( YYYY=$startYear ; YYYY<=$endYear ; YYYY++ )); do
    echo "Running download and import sbatch for $YYYY"
    #Daymet download script: Only download daymet data if available
    if [ $YYYY -ge $daymetStartAvailable ] && [ $YYYY -le $daymetEndAvailable ]; then
    metsrc="daymet"
    doy=`date -d "${YYYY}-12-31" +%j`
    #Create a loop that runs a slurm job for the download and import script for each day of the year
    i=0

    while [ $i -lt $doy ]; do
    thisdate=`date -d "${YYYY}-01-01 +$i days" +%Y-%m-%d`
    sbatch /opt/model/meta_model/run_model raster_met "$thisdate" $metsrc auto met
    i=$((i + 1))
    done

    fi

    #PRISM download script: Only download daymet data if available
    if [ $YYYY -ge $PRISMStartAvailable ] && [ $YYYY -le $PRISMEndAvailable ]; then
    metsrc="PRISM"
    doy=`date -d "${YYYY}-12-31" +%j`
    #Create a loop that runs a slurm job for the download and import script for each day of the year
    i=0

    while [ $i -lt $doy ]; do
    thisdate=`date -d "${YYYY}-01-01 +$i days" +%Y-%m-%d`
    sbatch /opt/model/meta_model/run_model raster_met "$thisdate" $metsrc auto met
    i=$((i + 1))
    done

    fi
done

rburghol commented 1 month ago

Prism (points) versus NLDAS2 (blocks) Rapidan River:

rburghol commented 1 month ago

Current needs:

Debug memory error again.
Begin coding a amalgamate
REST and detailed analytics
pilots storm flow volume
Overlaps version to match CbP

HARPgroup / model_meteorology