HARPgroup / HARParchive

This repo houses HARP code development items, resources, and intermediate work products.

Coding Workflow Components: Weeks of 7/1/2024 and 7/8/2024 #1291

Status: Open. mwdunlap2004 opened this issue 2 weeks ago.

mwdunlap2004 commented 2 weeks ago

I pushed my function that takes in a dataset and a gageid variable and outputs a weekly CSV of that data. It is on the weeklyprecip branch and is called "attemptatweekdata".
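
For reference, a minimal sketch of the shape such a function might take (illustrative only; the actual committed version on the weeklyprecip branch may differ, and the output file naming and lubridate usage here are assumptions):

library(lubridate)

# Sketch only: sum precip_in by year/week and write one CSV per gage
weekly_precip_csv <- function(dataset, gageid) {
  # Tag each observation with its year and week number
  dataset$yr <- year(as.Date(dataset$obs_date))
  dataset$wk <- week(as.Date(dataset$obs_date))
  # Sum precipitation within each year/week
  weekly <- aggregate(precip_in ~ yr + wk, data = dataset, FUN = sum)
  write.csv(weekly, paste0(gageid, "_weekly_precip.csv"), row.names = FALSE)
  return(weekly)
}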

mwdunlap2004 commented 2 weeks ago

Right now the function uses basic variable names like precip_in. Do any of you know of a way to rename a variable based on the dataset, so it would look like PRISM_p_cfs, for example?

COBrogan commented 2 weeks ago

> Right now the function uses basic variable names like precip_in. Do any of you know of a way to rename a variable based on the dataset, so it would look like PRISM_p_cfs, for example?

Sure, there are several ways to create a dynamically named field in a dataframe or as an entry to a list. The simplest is likely the following, which will create a column of NAs named MyCol2 in a data.frame:

inVar <- 2
# Build the column name dynamically, then assign NA to create the column
myDF[, paste0("My", "Col", inVar)] <- NA

You could alternatively simply add a column and rename it based on the index of said column:

# Option 1: rename by matching the exact column name
myDF$dummyColumn <- NA
names(myDF)[names(myDF) == "dummyColumn"] <- paste0("MyCol", inVar)
# Option 2: rename by pattern-matching on the column names
myDF$dummyColumn <- NA
names(myDF)[grepl("dummyColumn", names(myDF))] <- paste0("MyCol", inVar)

However, I'm not sure we'd want more specifically named columns unless these are all being joined together. If the function only handles one dataset at a time, it might be better to keep the structure of the output file generic, so we always get a data frame with the same column names. That makes it easier to handle the data frame in future processing steps, regardless of the data source. In other words, a field labeled precip_in works fine as long as it represents only one data source: we can then use this function and always simply reference precip_in to get the precip data from whichever data source we specified earlier in the workflow.
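
For example, a minimal sketch of the idea (read_precip here is a hypothetical reader function, not something that exists in the repo):

# Whichever source we pick, the returned frame keeps the generic
# column name precip_in, so downstream steps never change
data_source <- "prism"   # could also be "nldas2", etc.
precip_df <- read_precip(gageid, data_source)   # hypothetical reader

# Source-agnostic downstream step (assumes yr/wk columns as in the thread):
weekly_total <- aggregate(precip_in ~ yr + wk, data = precip_df, FUN = sum)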

ilonah22 commented 1 week ago

I created a new version of access-file.R, called lmsingledata, which only requires one dataset, so all analysis can be run on one data source at a time. The major change is how the data is pulled in:

# year(), month(), day(), and week() come from lubridate; sqldf() from sqldf
library(lubridate)
library(sqldf)

hydrocode = paste0('usgs_ws_', gageid)
data_source = "prism"

# Pull the raw met timeseries for this gage/source from the file server
hydro_data <- read.csv(paste0("http://deq1.bse.vt.edu:81/files/met/",
                              hydrocode, "-", data_source, "-all.csv"))

# Add year, month, day, and week columns parsed from the observation date
hydro_data[,c('yr', 'mo', 'da', 'wk')] <- cbind(year(as.Date(hydro_data$obs_date)),
                                                month(as.Date(hydro_data$obs_date)),
                                                day(as.Date(hydro_data$obs_date)),
                                                week(as.Date(hydro_data$obs_date)))

# For nldas2, aggregate the sub-daily records into daily totals
if (data_source == "nldas2") {
  hydro_data <- sqldf(
    "select featureid, min(obs_date) as obs_date, yr, mo, da,
       sum(precip_mm) as precip_mm, sum(precip_in) as precip_in
     from hydro_data
     group by yr, mo, da
     order by yr, mo, da"
  )
}

rburghol commented 6 days ago

@ilonah22 the code that you pasted above looks excellent -- it creates a daily summary dataframe from the raw data file. The next step is to create a second script that does almost the same thing, but takes the daily CSV as input and generates a weekly CSV (which we use for some of our methods).
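
A minimal sketch of what that second script might look like, reusing sqldf as in the snippet above (the file names are placeholders, and the yr/wk columns are assumed to already be present in the daily CSV):

library(sqldf)

# Read the daily CSV produced by the first script (placeholder path)
daily_data <- read.csv("daily_precip.csv")

# Roll the daily totals up to one row per year/week
weekly_data <- sqldf(
  "select featureid, min(obs_date) as obs_date, yr, wk,
     sum(precip_mm) as precip_mm, sum(precip_in) as precip_in
   from daily_data
   group by yr, wk
   order by yr, wk"
)

write.csv(weekly_data, "weekly_precip.csv", row.names = FALSE)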

The only modification I would suggest for the script is that, rather than guessing the hydrocode and input file name (and output file name), these should be inputs to the script. The details of this script are in the issue I tagged you in over here: https://github.com/HARPgroup/model_meteorology/issues/61 -- if you can start to develop and track your progress on this over there, that would be awesome. Keep me posted - thanks!
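
One common R pattern for taking those as inputs is commandArgs(); a minimal sketch (the argument order here is illustrative, not the spec from the linked issue):

# Invoked as, e.g.: Rscript daily_summary.R <hydrocode> <input_csv> <output_csv>
args <- commandArgs(trailingOnly = TRUE)
hydrocode   <- args[1]
input_file  <- args[2]
output_file <- args[3]

hydro_data <- read.csv(input_file)
# ... summarize as in the snippet above ...
write.csv(hydro_data, output_file, row.names = FALSE)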

COBrogan commented 5 days ago

@ilonah22 I think Rob's comments are spot-on. Taking this framework you have and creating a weekly version is a great next step and will help to reinforce our workflow development. I'd be happy to help out with this as needed. I have some availability in the afternoon and can help parse through Rob's suggestion or go over some next steps. I found this workflow process to be a bit tricky at first and am happy to discuss! Just let me know and I can set up a Teams meeting.