HARPgroup / HARParchive

This repo houses HARP code development items, resources, and intermediate work products.

Coding Workflow Components: Weeks of 7/1/2024 and 7/8/2024 #1291

Status: Open. mwdunlap2004 opened this issue 2 weeks ago.

mwdunlap2004 commented 2 weeks ago

I pushed my function that takes in a dataset and a gageid variable and outputs a weekly CSV of that data. It is on the weeklyprecip branch and is called "attemptatweekdata".
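
For reference, a minimal sketch of the shape such a function might take (illustrative only; the actual committed version on the weeklyprecip branch may differ, and the output file naming and lubridate usage here are assumptions):

library(lubridate)

# Sketch only: sum precip_in by year/week and write one CSV per gage
weekly_precip_csv <- function(dataset, gageid) {
  # Tag each observation with its year and week number
  dataset$yr <- year(as.Date(dataset$obs_date))
  dataset$wk <- week(as.Date(dataset$obs_date))
  # Sum precipitation within each year/week
  weekly <- aggregate(precip_in ~ yr + wk, data = dataset, FUN = sum)
  write.csv(weekly, paste0(gageid, "_weekly_precip.csv"), row.names = FALSE)
  return(weekly)
}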

mwdunlap2004 commented 2 weeks ago

Right now the function uses basic variable names like precip_in. Do any of you know of a way to rename a variable based on the dataset, so it would look like PRISM_p_cfs, for example?

COBrogan commented 2 weeks ago

> Right now the function uses basic variable names like precip_in. Do any of you know of a way to rename a variable based on the dataset, so it would look like PRISM_p_cfs, for example?

Sure, there are several ways to create a dynamically named field in a dataframe or as an entry to a list. The simplest is likely the following, which will create a column of NAs named MyCol2 in a data.frame:

inVar <- 2
# Build the column name dynamically, then assign NA to create the column
myDF[, paste0("My", "Col", inVar)] <- NA

You could alternatively simply add a column and rename it based on the index of said column:

# Option 1: rename by matching the exact column name
myDF$dummyColumn <- NA
names(myDF)[names(myDF) == "dummyColumn"] <- paste0("MyCol", inVar)
# Option 2: rename by pattern-matching on the column names
myDF$dummyColumn <- NA
names(myDF)[grepl("dummyColumn", names(myDF))] <- paste0("MyCol", inVar)

However, I'm not sure we'd want more specifically named columns unless these are all being joined together. If the function only handles one dataset at a time, it might be better to keep the structure of the output file generic, so we always get a data frame with the same column names. That makes it easier to handle the data frame in future processing steps, regardless of the data source. In other words, a field labeled precip_in works fine as long as it represents only one data source: we can then use this function and always simply reference precip_in to get the precip data from whichever data source we specified earlier in the workflow.
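
For example, a minimal sketch of the idea (read_precip here is a hypothetical reader function, not something that exists in the repo):

# Whichever source we pick, the returned frame keeps the generic
# column name precip_in, so downstream steps never change
data_source <- "prism"   # could also be "nldas2", etc.
precip_df <- read_precip(gageid, data_source)   # hypothetical reader

# Source-agnostic downstream step (assumes yr/wk columns as in the thread):
weekly_total <- aggregate(precip_in ~ yr + wk, data = precip_df, FUN = sum)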

ilonah22 commented 1 week ago

I created a new version of access-file.R, called lmsingledata, which only requires one dataset, so all analysis can be run on one data source at a time. The major change is how the data is pulled in:

# year(), month(), day(), and week() come from lubridate; sqldf() from sqldf
library(lubridate)
library(sqldf)

hydrocode = paste0('usgs_ws_', gageid)
data_source = "prism"

# Pull the raw met timeseries for this gage/source from the file server
hydro_data <- read.csv(paste0("http://deq1.bse.vt.edu:81/files/met/",
                              hydrocode, "-", data_source, "-all.csv"))

# Add year, month, day, and week columns parsed from the observation date
hydro_data[,c('yr', 'mo', 'da', 'wk')] <- cbind(year(as.Date(hydro_data$obs_date)),
                                                month(as.Date(hydro_data$obs_date)),
                                                day(as.Date(hydro_data$obs_date)),
                                                week(as.Date(hydro_data$obs_date)))

# For nldas2, aggregate the sub-daily records into daily totals
if (data_source == "nldas2") {
  hydro_data <- sqldf(
    "select featureid, min(obs_date) as obs_date, yr, mo, da,
       sum(precip_mm) as precip_mm, sum(precip_in) as precip_in
     from hydro_data
     group by yr, mo, da
     order by yr, mo, da"
  )
}

rburghol commented 6 days ago

@ilonah22 the code that you pasted above looks excellent -- it creates a daily summary dataframe from the raw data file. The next step is to create a second script that does almost the same thing, but takes the daily CSV as input and generates a weekly CSV (which we use for some of our methods).
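
A minimal sketch of what that second script might look like, reusing sqldf as in the snippet above (the file names are placeholders, and the yr/wk columns are assumed to already be present in the daily CSV):

library(sqldf)

# Read the daily CSV produced by the first script (placeholder path)
daily_data <- read.csv("daily_precip.csv")

# Roll the daily totals up to one row per year/week
weekly_data <- sqldf(
  "select featureid, min(obs_date) as obs_date, yr, wk,
     sum(precip_mm) as precip_mm, sum(precip_in) as precip_in
   from daily_data
   group by yr, wk
   order by yr, wk"
)

write.csv(weekly_data, "weekly_precip.csv", row.names = FALSE)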

The only modification I would suggest for the script is that, rather than guessing the hydrocode and input file name (and output file name), these should be inputs to the script. The details of this script are in the issue I tagged you in over here: https://github.com/HARPgroup/model_meteorology/issues/61 -- if you can start to develop and track your progress on this over there, that would be awesome. Keep me posted - thanks!
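
One common R pattern for taking those as inputs is commandArgs(); a minimal sketch (the argument order here is illustrative, not the spec from the linked issue):

# Invoked as, e.g.: Rscript daily_summary.R <hydrocode> <input_csv> <output_csv>
args <- commandArgs(trailingOnly = TRUE)
hydrocode   <- args[1]
input_file  <- args[2]
output_file <- args[3]

hydro_data <- read.csv(input_file)
# ... summarize as in the snippet above ...
write.csv(hydro_data, output_file, row.names = FALSE)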

COBrogan commented 5 days ago

@ilonah22 I think Rob's comments are spot-on. Taking this framework you have and creating a weekly version is a great next step and will help to reinforce our workflow development. I'd be happy to help out with this as needed. I have some availability in the afternoon and can help parse through Rob's suggestion or go over some next steps. I found this workflow process to be a bit tricky at first and am happy to discuss! Just let me know and I can set up a Teams meeting.