COBrogan opened 2 months ago
Current workflow:

1. Get USGS data:

   ```sh
   Rscript functions/usgsdata.R '02037500' "$HOME/usgsGageCMD.csv"
   ```

2. Run stormSep_cmd.R:

   ```sh
   Rscript ../../meta_model/scripts/met/stormSep_cmd.R "${HOME}/usgsGageCMD.csv" 'TRUE' 'Month' "${HOME}/Gage02037500_StormflowData.csv"
   ```

3. Run the analysis on the storms and output stats:

   ```sh
   Rscript ../../meta_model/scripts/met/stormAnalysis_cmd.R "${HOME}/Gage02037500_StormflowData.csv" "${HOME}/Gage02037500_StormStats.csv"
   ```

4. Get linear regressions:

   ```sh
   Rscript ../../meta_model/scripts/met/stormEventsLM_cmd.R "${HOME}/usgs_ws_02037500-PRISM-all.csv" "${HOME}/Gage02037500_StormStats.csv" "${HOME}/Gage02037500_StormflowData.csv" 1 "${HOME}/modelJSON.json" "${HOME}/stormsepLMResults.csv"
   ```

5. Run plots:

   ```sh
   Rscript ../../meta_model/scripts/met/plotStormEvent_cmd.R "${HOME}/usgs_ws_02037500-PRISM-all.csv" "${HOME}/Gage02037500_StormflowData.csv" "${HOME}/Gage02037500_StormStats.csv" "${HOME}/stormPlotsNew" "02037500" "PRISM"
   ```
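Since every file name above embeds the gage ID, the sequence could be wrapped so the ID is typed exactly once. The sketch below is hypothetical (the helper name is made up; script paths are copied from the commands above): it prints the command sequence for a given gage, which you could pipe to `sh` to execute.

```shell
#!/bin/sh
# Hypothetical helper: print the stormSep command sequence for one gage,
# deriving every filename from a single ID so it cannot drift between steps
# (avoiding typos like Gage02037500 vs Gage2037500 when paths are typed by hand).
storm_workflow_cmds() {
  gage="$1"
  flow="$HOME/Gage${gage}_StormflowData.csv"
  stats="$HOME/Gage${gage}_StormStats.csv"
  precip="$HOME/usgs_ws_${gage}-PRISM-all.csv"
  echo "Rscript functions/usgsdata.R '$gage' '$HOME/usgsGageCMD.csv'"
  echo "Rscript ../../meta_model/scripts/met/stormSep_cmd.R '$HOME/usgsGageCMD.csv' 'TRUE' 'Month' '$flow'"
  echo "Rscript ../../meta_model/scripts/met/stormAnalysis_cmd.R '$flow' '$stats'"
  echo "Rscript ../../meta_model/scripts/met/stormEventsLM_cmd.R '$precip' '$stats' '$flow' 1 '$HOME/modelJSON.json' '$HOME/stormsepLMResults.csv'"
  echo "Rscript ../../meta_model/scripts/met/plotStormEvent_cmd.R '$precip' '$flow' '$stats' '$HOME/stormPlotsNew' '$gage' 'PRISM'"
}

# Preview the five commands for the gage used above
storm_workflow_cmds 02037500
```

Running `storm_workflow_cmds 02037500 | sh` would then execute the steps in order.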
FYI @ilonah22 I've merged this branch (`geo-stormSep`) into `main`. I will be working on getting the meta model running tomorrow, hopefully.
@rburghol How do I specify which `GEO_MET_MODEL` I'm running? I changed the following to set `GEO_MET_MODEL` to `storm_volume`:

```sh
nano /opt/model/p6/cbp_wsm/config/control/met/PRISM.con
```

However, when I run the following process command, an `echo` check confirms that `GEO_MET_MODEL` is still set to `simple_lm`. Am I missing a config file somewhere?

```sh
/opt/model/meta_model/run_model raster_met PRISM usgs_ws_02021500 auto geo process 05
```
@COBrogan looks like you set it up correctly to me -- you may want to verify the `cbp get_config...` command that is run in `geo.config` at the command line -- also make sure you run from the `p6/vadeq` directory. Feel free to edit the `geo.config` file to debug, and maybe also grep the files to see if I hard-coded that param by accident somewhere, like in `raster.config`.
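One hedged way to do that grep sweep: a recursive `grep -rn` over the config tree surfaces every assignment site at once. The snippet below demonstrates the idea on a throwaway directory (file names and contents here are illustrative, not the real configs); in practice you would point the same grep at `geo.config`, `raster.config`, and the `.con` files.

```shell
#!/bin/sh
# Demo of grepping for a hard-coded GEO_MET_MODEL, using a scratch tree;
# the file names and contents are illustrative stand-ins for the real configs.
dir=$(mktemp -d)
printf 'GEO_MET_MODEL="simple_lm"   # suspect hard-coded default\n' > "$dir/raster.config"
printf 'GEO_MET_MODEL="storm_volume"\n' > "$dir/stormVol_prism.con"

# -r recurse, -n line numbers: every place the variable is set becomes visible
grep -rn 'GEO_MET_MODEL' "$dir"
hits=$(grep -rl 'GEO_MET_MODEL' "$dir" | wc -l)
rm -rf "$dir"
```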
Oh!! @COBrogan I just saw you set the config in `cbp_wsm/config`. That setting should be applied in `vadeq/config/...` -- it should all work then. If I indicated `cbp_wsm` in some of the instructions, that was an error on my part -- apologies.
@rburghol The `storm_volume` workflow seems to be working! I ran it with gage 02021500 and successfully generated the monthly rating file for PRISM. I haven't tested the plotting yet, but I want to work on that a little more locally first. However, I did encounter a few permissions errors that I wanted to note here:
- I updated permissions on a few scripts (e.g. `analyze/02_model_storm`) to ensure they had execute privilege for all users
- `analyze/02_model_lm_simple` and `analyze/02_model_storm` don't work for me unless I `sudo` as you. For instance:

```
install -D /tmp/usgs_ws_02021500_1722430958_12797/usgs_ws_02021500-PRISM-storm_volume-model.json /media/model/met/PRISM/stats/
install: cannot create regular file '/media/model/met/PRISM/stats/usgs_ws_02021500-PRISM-storm_volume-model.json': Permission denied
```
This command is a product of the last line of the `analyze/02` steps, which are:

```sh
install -D $json_file "$MODEL_JSON"
install -D $rate_file "$RATING_FILE"
```
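For context, `install -D` (GNU coreutils) copies the file and creates any missing parent directories, but it still requires write permission on the destination directory itself, which is why those lines only succeed under `sudo` here. A minimal sketch of the behavior in a scratch directory:

```shell
#!/bin/sh
# `install -D` creates missing parent dirs, then copies; it fails with
# "Permission denied" exactly when the destination directory is not writable.
work=$(mktemp -d)
echo '{"model":"storm_volume"}' > "$work/model.json"

# Succeeds here because $work is writable; /media/model/met/PRISM/stats was not
install -D "$work/model.json" "$work/stats/deep/model.json" && ok=1
ls -l "$work/stats/deep/model.json"
rm -rf "$work"
```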
I've been running into a few issues while trying to run the storm workflow on a different gage, but this is the only one I have gotten really stuck on so far, while running stormEventsLM_cmd.R:

```
Error in (dateIndex - rollingDur - stormDuration + 2):dateIndex :
  argument of length 0
Calls: mapply -> <Anonymous>
```

I have been making a few minor changes, but nothing seems to be working, so I was wondering if you had a similar issue, or if this error means I messed something up in one of the earlier steps.
Hi @ilonah22. If you pull the `main` branch of the meta model repository, I suspect your error will be resolved. Just like we had trouble with the dates yesterday, I had some trouble today. This time I had to remove storms that occur after the precip record has ended. So, I had to add the following:

```r
# Remove any storms that occur after the precip record ends
stormStats <- stormStats[stormStats$endDate <= max(comp_data$obs_date), ]
stormSepDF <- stormSepDF[stormSepDF$timestamp <= max(comp_data$obs_date), ]
```

I also have gotten everything up and running in the meta model. We should now be able to use the model on the server to run these, which will make it much easier to run multiple scenarios (no need to enter file paths, copy and paste files, etc.). Do you have some availability this afternoon to go over that? Maybe around 1:30? We could get you set up in the model so you can run gages easily.
@rburghol After some testing, it looks like @ilonah22 also doesn't have write permissions to `/media/model/met/`. She too was getting "Permission denied" errors when trying to run the following (which worked fine for me now). If you have some availability, I'd appreciate the help in getting her set up. And I'm happy to hop on a call to learn more about the group permissions:

```sh
/opt/model/meta_model/run_model raster_met stormVol_prism usgs_ws_01634500 auto geo download
```

I tried `sudo chmod -R 775 /media/model/met/stormVol_prism` but was told that this "Operation not permitted". I think I'm missing something. Ilonah is already a part of the harp group, so she would have access to this folder as soon as the right (write and execute) permissions are set:

```
ls -l /media/model/met/
total 4
lrwxrwxrwx 1 rob      harpmodelers   25 Jul 26 13:35 PRISM -> /backup/meteorology/PRISM
lrwxrwxrwx 1 rob      harpmodelers   26 Jul 26 13:36 daymet -> /backup/meteorology/daymet
lrwxrwxrwx 1 rob      harpmodelers   26 Jul 26 13:35 nldas2 -> /backup/meteorology/nldas2
drwxr-sr-x 6 cobrogan harpmodelers 4096 Jul 31 16:57 stormVol_prism
```
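One note on the "Operation not permitted": `chmod` only succeeds for a file's owner (or root), so a recursive `-R` fails as soon as it hits anything owned by someone else. Assuming the goal is group write for `harpmodelers` while keeping the setgid bit already shown in the listing (`drwxr-sr-x`), the owner or root could use mode `2775`. A sketch on a scratch directory, not the real path:

```shell
#!/bin/sh
# Sketch: 2775 = rwxrwsr-x -- group write plus the setgid bit, so files
# created inside keep inheriting the directory's group (e.g. harpmodelers).
dir="$(mktemp -d)/stormVol_prism"
mkdir -p "$dir"
chmod 2775 "$dir"
mode=$(stat -c '%a' "$dir")   # GNU stat: octal mode including special bits
echo "$mode"
```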
Here are some of my visual analyses of the ratings from 10 gages produced by the `stormVol_prism` workflow this week:
^ Each of the different colored lines represents one of the 10 gages I used; the red boxes show places where I thought I might see a trend.
After seeing the results of these analyses, my next planned steps are:
@ilonah22 I really like what I'm seeing. These are effective ways to compare multiple time series, and also to highlight the areas that you think we should pay more attention to. I would be interested in a similar thing using different rainfall data sets. With that in mind, what rainfall data sets were you using for these?
FYI, working on a multi-gage model run, but having issues with the `sftp` call in `sbatch`:

```sh
gage_coverage_file="usgsgageList.csv"
gage_coverage_SQLfile=usgsgageList.sql
gageSQL="
\\set fname '/tmp/${gage_coverage_file}' \n
copy ( select hydrocode
  FROM dh_feature
  WHERE bundle = 'watershed' AND ftype = 'usgs_full_drainage'
) to :'fname';"
# turn off the expansion of the asterisk
set -f
echo -e $gageSQL > $gage_coverage_SQLfile
cat $gage_coverage_SQLfile | psql -h dbase2 -d drupal.dh03
sftp dbase2:"/tmp/${gage_coverage_file}" "/home/cobrogan/${gage_coverage_file}"

filename="/home/cobrogan/${gage_coverage_file}"
for i in `cat $filename`; do
  echo $i
done

filename="/home/cobrogan/${gage_coverage_file}"
for i in `cat $filename`; do
  echo "Running: sbatch /opt/model/meta_model/run_model raster_met stormVol_prism \"$i\" auto geo"
  sbatch /opt/model/meta_model/run_model raster_met stormVol_prism "$i" auto geo
done
```
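A side note on the `set -f` in that script: it matters because `$gageSQL` is unquoted in the `echo -e $gageSQL` line, so any glob character in the SQL (e.g. a `select *`) would be expanded against the files in the current directory. A minimal demonstration of the pitfall, in a scratch directory:

```shell
#!/bin/sh
# Why `set -f` (or quoting) is needed before echoing unquoted SQL:
# an unquoted variable is word-split and then glob-expanded by the shell.
cd "$(mktemp -d)"
touch a.csv b.csv
sql='select * from dh_feature'

set +f
unquoted=$(echo $sql)    # the * matches a.csv and b.csv
set -f
literal=$(echo $sql)     # globbing disabled; the * survives

echo "$unquoted"
echo "$literal"
```

Quoting the variable (`echo -e "$gageSQL"`) avoids the problem entirely, without toggling `set -f`.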
@COBrogan One thing to check is that you have done `ssh-copy-id` to dbase2 (like we did yesterday for deq1), otherwise I don't know if it will allow you to do that file retrieval automatically. If that doesn't fix it for you, paste your error message up here and I'll take a look.
@rburghol All of this is using PRISM, how would I use a different source? Just changing stormVol_prism to stormVol_daymet?
@ilonah22 Yep. Theoretically you can call stormVol_daymet and stormVol_nldas2 as the scenario. I haven't tested these yet so it would be interesting to see if they work.
> @COBrogan One thing to check is that you have done `ssh-copy-id` to dbase2 (like we did yesterday for deq1), otherwise I don't know if it will allow you to do that file retrieval automatically. If that doesn't fix it for you, paste your error message up here and I'll take a look.
That makes sense, but I don't have the permissions to create a home directory for myself on dbase2 (nor could I `sudo` to do it). Would you mind setting that up, Rob? Brendan said he could set one up on dbase1, but I don't know if that's ideal since our configs all point to dbase2.
I've kicked off a run of all our gages for this method. Got 156 gages so far. Some months have negative adjusted R^2, which is interesting. I think the adjustment has to do with sample size, so it could make sense for a really poor fit with small n (e.g. perhaps a gage that only has a few years of data during our precip data source time period, 1980 - present). Anyways, here's a sneak peek:
Same image, but now the R^2 axis is limited to the range 0 to 1:
After Tuesday 8/13, it looks like 188 gages ran successfully for the prism stormsep workflow. Here's a plot of the density of the mean r-squared values for each gage and of the density of all of the r-squared values.
Thanks @ilonah22. I'm working on debugging the remaining gages. Seems like minor stuff here and there. At least one gage was failing because there was an entire month of 0 cfs flow and `hreg` was failing on it. Fixed that, but still more bugs to track down. Thanks for the plots!
@COBrogan, that sounds good. I quickly found the gages with the highest and lowest mean r-squared to plot.
Here is the highest, with no issues I can see just from looking at the plot:
Here is the lowest (non-zero), where you can see that there are only 4 r-squared monthly values above 0 and multiple NA values:
If this is happening to one gage without breaking, that probably means it is happening to others.
Morning, great work. I'm wondering what the R^2 metric vs. drainage area relationship looks like.
Thanks @ilonah22. What were the gage IDs for these two examples? And were you able to look at any of the residuals for the gage that had those high R^2 values? It might give us some good insight into how the regression is performing with large + small events across the months. It is interesting that there aren't many seasonal impacts at this particular gage.
As for the missing R^2 values, I believe this is intentional. If there is insufficient data to conduct the regression, the algorithm will just return NA for that month. This could happen if the storm separation routine found 1 - 3 storms for that month (most likely due to a short gage record). We can look into it further, but the negative R^2 values also indicate that sample size is likely low. We can probably confirm this by looking at the residual plots for February (best performing month) and July (worst performing month).
@COBrogan this looks really cool to me. I think it is probably time for us to put together a good data model for storing this data on VAHydro, so that we can easily do some analysis of summary statistics over the whole data set.
I think at this point all gages have been run with a few exceptions:
@COBrogan - great summary. FYI I am currently doing a variation on calc_raster_ts.sh to handle resampling, and then a further variation to handle resampling plus temporal disaggregation. I want to have only a single routine, but I feel like the complexity of the differences between the 3 queries warrants it. Currently tracking this in #72, though the complexity of the op probably warrants its own issue.
@COBrogan , The highest R-squared gage was 01652500 and the lowest non-zero was 01671025.
@ilonah22 I've kicked off a daymet batch run. We'll see tomorrow how many run successfully, but I definitely see the ratings files being generated so it's working for at least most of the gages!
If we look at precipitation and stormflow data on an individual scale, it appears that the total volume of precipitation may be related to the volume in the storm hydrograph. This takes a slightly different approach than correlating weekly precip and flow, because now we are homing in on individual storm events and looking more specifically at how precipitation increases flow:
[X] Develop a methodology to separate out storm hydrographs to isolate individual storm events. See stormSep_USGS
[X] Calculate storm hydrograph volume for each event. This is accomplished in stormSep_USGS.R
[ ] Analyze relationship between hydrograph volume and precipitation volume. See below for Culpeper gage 01665500. Looking at monthly precip and monthly storm hydrographs (lumping storms into months based solely on their start date; storms that extend beyond the month are still included in their start month), we get pretty strong correlations in some of our winter months. October in particular has a very good correlation, close to 0.75. Running a power regression slightly worsens the R^2 values, but improves the residual error analysis. Current thoughts are that we may see good relationships with this, but we may run into some travel time issues as we move downstream. This should only cause issues for systems that have more than 1 day of travel time, which are likely to be relatively large systems (greater than ~200 square miles, based on super rough math of a circular watershed with a radius of 16 miles, assuming a travel time of 1 day at 1 ft/s). Visualizations will help us get a feel for how affected our results are by travel time as we expand this routine to more gages.
[X] Determine means to analyze which dataset performs the best - Following #61, we can select the "best" dataset as the one that performs the best for that month. So, we can group our events up by month across the record period, run the regression for each month, and determine performance.
[ ] Find means to determine missing precipitation events (e.g. low precip volume) for corresponding hydrographs
[x] Plot each storm hydrograph with precipitation and separate into its own steps
[x] Analysis steps cannot depend on a list of storm events. Because these steps have been divided out, they must instead rely on a data.frame that has storm IDs built into it and the stormStats must have the corresponding ID
Important R Scripts: stormSeparate_USGS.R, plotStorm.R, plotStormSep_cmd.R, stormAnalysis_cmd.R, stormEventsLM_cmd.R, stormSep_cmd.R
In the scenario con file, e.g. `p6/cbp_wsm/config/control/met/stormVol_prism.con`, `GEO_MET_MODEL` must be `storm_volume`.
Several important variables are defined in `geo.config`:

Several important variables are defined in the scenario config, e.g. `stormVol_prism.con`:
:Example Model Run No Plots
Example Model Run With Plots
- `05_stormSeparate` = stormSep_cmd.R
  - Inputs: `$COVERAGE_FLOW_FILE`, `$SEARCH_ALL_STORM`, `$BASELINE_FLOW_METHOD`
  - Outputs: `$STORM_EVENT_FLOW_FILE`
- `06_stormStatistics` = stormAnalysis_cmd.R
  - Inputs: `$STORM_EVENT_FLOW_FILE`
  - Outputs: `$STORM_EVENT_STATS_FILE`
- `02_model_storm` (was `02_stormVolumeRegression`) calls stormEventsLM_cmd.R, which produces the stats (`$MODEL_STATS`) and json output (`$MODEL_JSON`) from the mon_lm function
  - Inputs: `$DAILY_PRECIP_FILE`, `$STORM_EVENT_STATS_FILE`, `$STORM_EVENT_FLOW_FILE`, `$STORM_INCLUDE_DURATION`, `$STORMSEP_REGRESSION_METHOD`
- `05_plot_storm`: runs if `$STORMSEP_PLOT` in config is set to TRUE
  - Inputs: `$STORM_EVENT_FLOW_FILE`, `$STORM_EVENT_STATS_FILE`, `$STORM_EVENT_PLOT_DIR`, `$USGS_GAGE`
- `06_plot_stormVolume`: runs if `$STORMSEP_PLOT` in config is set to TRUE
  - Inputs: `$DAILY_PRECIP_FILE`, `$STORM_EVENT_FLOW_FILE`, `$STORM_EVENT_STATS_FILE`, `$STORM_EVENT_PRECIP_PLOT_DIR`, `$USGS_GAGE`