COBrogan opened this issue 1 month ago
Here is the path to the file I was working on today, which finds max and min R-squared ranges, on the `data-analysis` branch:
`HARP-2024-2025/DrainageAreaAnalysis.R`
Thanks @ilonah22! @nathanielf22, here is a useful GIS in R reference I've used in the past: https://r-spatial.github.io/sf/articles/sf1.html
Leafing through it, these tutorials seem helpful. The most relevant R code will be the examples that use the `sf` library, as many other R spatial libraries are slated for deprecation (like `rgdal`, which used to be very popular):
https://tmieno2.github.io/R-as-GIS-for-Economists/vector-basics.html
https://bookdown.org/michael_bcalles/gis-crash-course-in-r/
There are several chapters in there that go over using `sf` to do basic GIS work in R. Otherwise, here's some extra code to help get some spatial plots going. It's pretty long, and it assumes you have the ratings files downloaded (the easiest way is `sftp`, which Ilona has used before, or you can reach out to me and I can get you going). From there, you need the ratings in their own folders, all in a directory called `ratings`. Let me know if you have any questions!
The above code should generate the following image, which shows October adjusted R-squared values for daymet from the Storm Volume method:
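For anyone piecing this together without the original script, here is a minimal sketch of the kind of `sf`-based map described above, assuming a watershed layer and a summary table of adjusted R-squared values by gage. The file name and the `gage_id`/`adj_r_squared` columns are hypothetical stand-ins, not the actual ratings layout:

```r
# Minimal sketch of an sf choropleth (hypothetical file and column names).
library(sf)
library(dplyr)
library(ggplot2)

# Read watershed polygons; st_read() handles any GDAL-readable vector format
watersheds <- st_read("watersheds.shp")

# Hypothetical summary table, e.g. built from the ratings CSVs
rsq <- data.frame(
  gage_id = c("gageA", "gageB"),
  adj_r_squared = c(0.42, 0.61)
)

# Join the statistics onto the geometry and map them
watersheds %>%
  left_join(rsq, by = "gage_id") %>%
  ggplot() +
  geom_sf(aes(fill = adj_r_squared)) +
  scale_fill_viridis_c(name = "Adj. R-squared") +
  theme_minimal()
```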
Thank you @COBrogan! I have been attempting to use the code, but I am having some issues with the creation of `gageCompare`, where it has all the `dplyr` steps. The ratings appear to be repeating, which leads to the creation of list-cols that make the data frame unusable. Now I'm trying to sftp the ratings again to be sure I have the right data, but I'm struggling to find the correct folder on the server to sftp from. Could you point me in the right direction to get the data that works for this R script?
@nathanielf22 -- no need for sftp, as all data should be available via web link with the pattern: http://deq1.bse.vt.edu:81/met/[scenario]/
For example, you can find the precip data and flow analyses for things that were run for scenario `stormVol_nldas2` (storm volume analysis based on nldas2 data) at http://deq1.bse.vt.edu:81/met/stormVol_nldas2/
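Since `read.csv()` accepts a URL directly, a single file under that pattern can be pulled without sftp. A minimal sketch, where the scenario and file name are placeholders rather than actual files:

```r
# Read one ratings CSV straight from the web server (no sftp needed).
# The file name here is a placeholder; browse the scenario directory to find
# the actual file for a given coverage.
base_url <- "http://deq1.bse.vt.edu:81/met/stormVol_nldas2"
ratings <- read.csv(paste0(base_url, "/example-rating-ts.csv"))
head(ratings)
```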
@rburghol, that makes sense. What about the simple lm data? The code includes those on the third line, but there isn't a folder labelled `simplelm`.
The simplelm data is in the folders labeled NLDAS, PRISM, and daymet. Those directories should be renamed eventually, but we haven't done so yet. So `PRISM/stats/` should hold the simplelm results.

@rburghol I think sftp is easier, unless there's a way to read all those csv URLs at once into R? So far, I've been telling everyone to just use sftp if they are analyzing ALL 1,080 csv files. If you agree, @nathanielf22, this data is in `/media/model/met/`, but you may need to use `ls` to poke around the directories within there, which is why the URLs would be easier if there's a way to get all of them. So `/media/model/met/PRISM/out` has the simple lm stats.
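On the "read all those csv urls at once" question, one possible approach (assuming the :81 links serve a plain directory index, which is an assumption about the server layout) is to scrape the index page for `.csv` links and loop over them:

```r
# Sketch: list every .csv linked from a scenario's directory index and read
# them all into R. Assumes a plain index page with relative hrefs.
index_url <- "http://deq1.bse.vt.edu:81/met/stormVol_nldas2/"
index_html <- paste(readLines(index_url, warn = FALSE), collapse = "\n")

# Extract href targets ending in .csv from the index page
csv_links <- regmatches(index_html, gregexpr('href="[^"]+\\.csv"', index_html))[[1]]
csv_files <- unique(gsub('href="|"', "", csv_links))

# Read each file; the result is a named list that can be bound together
# if the columns match (e.g. with dplyr::bind_rows(.id = "file"))
all_ratings <- lapply(paste0(index_url, csv_files), read.csv)
names(all_ratings) <- csv_files
```

If the pages turn out not to be plain listings, sftp (as Ilona has done) remains the simpler route.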
@COBrogan I think that if we are at the point where we need to analyze thousands of files at the same time, then we need to prioritize putting the analysis inside the workflow and setting summary metrics in the database.
By building local desktop workflows to analyze thousands of files downloaded from the server via sftp, we risk wasted time and trouble from redundant analysis and downloads, and data sets coming out of date.
I think we really should be focusing on small case study analysis right now rather than broad-brush global things. When we have analytical lenses that we think are worthy of repeating over 1,000 files, then we move forward.
As for the scenario names, for sure, scenario `nldas2` is actually the one that has the nldas2 + simple lm data -- totally agree that we DO need to remedy this, and this will be a good thing to prioritize. It involves only creating a new `.con` file as a copy of `nldas2.con`, then rerunning the workflows.
Interestingly, we have the `.con` files on the server as well: http://deq1.bse.vt.edu:81/p6/vadeq/config/control/met/
Hmm. I agree that these metrics should be shifted into the db soon. I think we discussed making a data model sketch for that next week, but we also talked about Nate beginning to create workflows to analyze/QC precip data based on annual means, potential timestamp issues, and spatial visualization of annual precip data. This work will naturally become a workflow since it's based on the precip files we're generating, so no issues there. I believe the spatial visualization of mean precip for a year is a valuable tool.

To test a workflow locally, you'd want to grab all existing coverage precip files. So I think this work is still valuable, as long as we keep in mind that the precip file path or directory should be an input to the script. If we keep to one script = one purpose, then Nate will naturally develop scripts that create mean precip, which will be a mandatory step in putting it in the db. Maybe we can chat a bit about this tomorrow so I can catch up on our intended structure, but I think Nate's on the right track here toward creating a dataset QC workflow that may become part of geo/process and geo/analyze.
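As a concrete example of the "one script = one purpose" idea with the file path as an input, a QC step computing annual mean precip for a single coverage file might look roughly like this; the `obs_date` and `precip_in` column names are assumptions about the precip CSV layout:

```r
# Sketch: compute annual mean precip for one coverage precip file.
# The file path is passed as an argument so the same script can later be
# called from a workflow step without modification.
args <- commandArgs(trailingOnly = TRUE)
precip_file <- args[1]  # e.g. Rscript annual_mean_precip.R path/to/coverage_precip.csv

precip <- read.csv(precip_file)
precip$year <- as.integer(format(as.Date(precip$obs_date), "%Y"))

# One row per year: mean daily precip, a simple first QC metric
annual_mean <- aggregate(precip_in ~ year, data = precip, FUN = mean)

# Write next to the input so a later step (or a db load via REST) can pick it up
write.csv(annual_mean, sub("\\.csv$", "_annual_mean.csv", precip_file),
          row.names = FALSE)
```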
@COBrogan I 100% agree that we will eventually want to see global trends in all sorts of variables, but the QA workflow step is at the single coverage level, not over the global data sets. I think I may have been too vague in my comments on this previously. I will elaborate below. @nathanielf22
In our workflows, everything operates at a coverage level, so in developing the QA steps we should:
Otherwise, we spend time doing file management (sftp) and writing a code workflow that iterates through files, rather than a standalone workflow that handles analyzing a single file for a single coverage; then we have to disentangle our batch code to operate standalone.
But we already have a robust set of code (`om_vahydro_metric_grid`) that allows us to retrieve all the metrics from all the scenarios at one single time in a very efficient manner, and we have a workflow and file structure to handle iterating over 1,000s of combos; we just have to get ourselves prepared to use it. And preparing to use it is exactly what is done by the single coverage/data set workflow components outlined above. Once we have them, we will insert them via REST, and then map to our hearts' content.
Working on rerunning workflows `simple_lm` and `stormVol` to incorporate changes to JSON and `calc_raster_ts`. The following workflows need to be rerun:
ALL other methods are out of date!
- `dh_timeseries_weather`
- `calc_raster_ts`
- `dh_timeseries_weather` with DEQ support
- `simple_lm` #https://github.com/HARPgroup/model_meteorology/issues/57
- `simple_lm` that develops monthly regressions and applies them on a weekly basis. This will involve regressions based on four weekly data points applied on a weekly basis. Issue created, but needs reworking #https://github.com/HARPgroup/model_meteorology/issues/59
- `amalgamate` `stormVol` #https://github.com/HARPgroup/model_meteorology/issues/60
- `amalgamate` `stormVol` that divides events into 8-year periods and runs regressions based on months. The end result would be a timeseries that has constant months during each 8-year period
- `stormVol` configs to run the stormVol approach comparing volume above baseflow to precipitation during the storm event, for a more deterministic approach
- `[coverage]-[met_scenario]-lm_simple-rating-ts.csv` file to include tstime and tsendtime, covering periods rather than an entry for each day. `stormVol`?
- `om_vahydro_metric_grid`. Can get started by pulling `L30` from models and beginning to test spatial visualization while we process our results and eventually store them via REST. Can pull scripts from last year's project
- `dh_timeseries_weather` or `dh_property` via REST https://github.com/HARPgroup/HARParchive/issues/1354
- `om_vahydro_metric_grid` to recall data from `dh_property`
- `lm(PRISM ~ NLDAS2)`. We can leverage the `baseline` variables set up in the `*.con` files to easily integrate these into our existing workflows (see the sketch after this list)
- `lm(PRISM$precip[PRISM$time + 1] ~ NLDAS2)`
- `.con` files include variable `BASELINE_MET_SCENARIO` to provide this info
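For the scenario-vs-scenario regressions in the list above, a rough sketch of the `lm(PRISM ~ NLDAS2)` idea on daily precip follows. The file names and the `obs_date`/`precip_in` columns are assumptions about the exported data; in practice the `BASELINE_MET_SCENARIO` variable in the `.con` files would identify which scenario to regress against:

```r
# Sketch: regress PRISM daily precip against NLDAS2 daily precip for one
# coverage. File names and column names are illustrative.
prism  <- read.csv("prism_precip.csv")   # assumed columns: obs_date, precip_in
nldas2 <- read.csv("nldas2_precip.csv")  # assumed columns: obs_date, precip_in

both <- merge(prism, nldas2, by = "obs_date", suffixes = c("_prism", "_nldas2"))

# Same-day comparison, i.e. lm(PRISM ~ NLDAS2)
fit <- lm(precip_in_prism ~ precip_in_nldas2, data = both)
summary(fit)$adj.r.squared

# Lagged variant, akin to lm(PRISM$precip[PRISM$time + 1] ~ NLDAS2):
# PRISM shifted forward one time step relative to NLDAS2
both$precip_prism_lead1 <- c(both$precip_in_prism[-1], NA)
fit_lag <- lm(precip_prism_lead1 ~ precip_in_nldas2, data = both)
summary(fit_lag)$adj.r.squared
```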