HARPgroup / HARParchive

This repo houses HARP code development items, resources, and intermediate work products.

REST / RomDataSource, metrics and analysis #1354

Open rburghol opened 2 months ago

rburghol commented 2 months ago

Analysis Steps

Overview

Data Model & REST

Data Model Outline

Detail View of REST Feature Query
feature <- RomFeature$new(
  ds,
  list(
    hydrocode=coverage_hydrocode,
    ftype=coverage_ftype,
    bundle=coverage_bundle
  ),
  TRUE
)

Using om_vahydro_metric_grid to retrieve data


# GET VAHydro stormVol_prism precip_annual_max_in DATA for usgs_full_drainage watersheds
mdf <- data.frame(
  'model_version' = c('cbp-6.1'),
  'runid' = c('stormVol_prism'),
  'metric' = c('precip_annual_max_in'),
  'runlabel' = c('precip_annual_max_in')
)
met_data <- om_vahydro_metric_grid(
  metric = metric, runids = mdf, bundle="watershed", ftype="usgs_full_drainage",
  base_url = paste(site,'entity-model-prop-level-export',sep="/"),
  ds = ds
)
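
The call above assumes ds, site, and metric are already defined earlier in the workflow. A rough setup sketch, where the endpoint URL, credential variables, and the library/constructor calls are assumptions based on typical HARP scripts rather than something specified in this issue:

# assumed setup -- names and endpoint are illustrative, not from this issue
library(hydrotools)                          # assumed package providing RomDataSource and om_vahydro_metric_grid
site <- "http://deq1.bse.vt.edu:81/d.dh"     # assumed VAHydro REST endpoint
ds <- RomDataSource$new(site, rest_uname)    # rest_uname / rest_pw assumed to be set in the environment
ds$get_token(rest_pw)
metric <- 'precip_annual_max_in'             # matches the metric column in mdf above
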
nathanielf22 commented 1 month ago

Here is the current function, which I updated a bit tonight. It works for any PRISM, daymet, or NLDAS2 data, such as: read.csv("http://deq1.bse.vt.edu:81/met/stormVol_prism/precip/usgs_ws_01613900-PRISM-all.csv")

When calculating a metric, you can run something like this to return the lowest 90-day average precipitation: summary_analytics(prism)$l90_precip_in

library(tidyr)
library(sqldf)
library(zoo)

# For ANY data
summary_analytics <- function(df){
  df <- separate(data = df, col = obs_date, into = c("obs_date","obs_time"), sep = " ")
  df <- separate(data = df, col = obs_date, into = c("obs_year","obs_month","obs_day"), sep = "-")

  # creating a yearly summary with each year and its total precip
  yearly.summary <- sqldf(
    "SELECT obs_year, SUM(precip_in) AS total_precip
     FROM df
     GROUP BY obs_year"
  )

  # summary analytics
  precip_annual_max_in <- max(yearly.summary$total_precip)
  precip_annual_max_year <- yearly.summary$obs_year[which.max(yearly.summary$total_precip)]
  precip_annual_mean_in <- mean(yearly.summary$total_precip)

  # For min values and years, we exclude the first and last rows since the
  # current year and the first year are incomplete data; obs_year is subset
  # the same way so the index from which.min() lines up with the right year
  precip_annual_min_in <- min(yearly.summary$total_precip[c(-nrow(yearly.summary), -1)])
  precip_annual_min_year <- yearly.summary$obs_year[c(-nrow(yearly.summary), -1)][
    which.min(yearly.summary$total_precip[c(-nrow(yearly.summary), -1)])]

  # Create daily summary to use for all data. This makes hourly data daily sums.
  daily.summary <- sqldf(
    "SELECT obs_year, obs_month, obs_day, SUM(precip_in) AS total_precip
     FROM df
     GROUP BY obs_year, obs_month, obs_day"
  )
  precip_daily_max_in <- max(daily.summary$total_precip)

  # if/else evaluates the number of unique hours; if 24, the hourly max is
  # taken, otherwise hourly max is NA. Alternatively to a missing value for
  # hourly precip in daily data, we could look at the rainfall distribution
  # table for a type II storm and multiply by the max P(t)/P(24) value;
  # this function uses NA for daily data.
  if (length(unique(df$hr)) == 24) {
    precip_hourly_max_in <- max(df$precip_in)
  } else {
    precip_hourly_max_in <- NA
  }

  # l90 using the zoo package (lowest 90-day rolling mean of daily precip)
  l90_precip_in <- min(rollapply(daily.summary$total_precip, width = 90, FUN = mean, fill = NA, align = "right"), na.rm = TRUE)

  # l90 done using the zoo package; the IHA approach did not work:
  # Qout_zoo <- zoo(as.numeric(df$precip_in), order.by = df$tstime)
  # Qout_g2 <- data.frame(group2(Qout_zoo))

  # makes a data frame with all 8 metrics
  metrics <- data.frame(
    precip_annual_max_in = precip_annual_max_in,
    precip_annual_max_year = precip_annual_max_year,
    precip_annual_mean_in = precip_annual_mean_in,
    precip_annual_min_in = precip_annual_min_in,
    precip_annual_min_year = precip_annual_min_year,
    precip_daily_max_in = precip_daily_max_in,
    precip_hourly_max_in = precip_hourly_max_in,
    l90_precip_in = l90_precip_in
  )
  return(metrics)
}
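
A minimal usage sketch, following the read.csv example above (the prism object name is just illustrative):

# load a PRISM precip CSV for a gage watershed, then pull one metric
prism <- read.csv("http://deq1.bse.vt.edu:81/met/stormVol_prism/precip/usgs_ws_01613900-PRISM-all.csv")
summary_analytics(prism)$l90_precip_in   # lowest 90-day average precipitation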

I can run through this in detail next week since I know this is a lot to look at. I'll be out of town tomorrow afternoon through Sunday and likely out of service, but I will do my best to respond before I leave or after I get back this weekend. In the meantime, I wanted to make sure this is stored here.

rburghol commented 1 month ago

@nathanielf22 hey Nate, thanks a bunch for pushing this up before you headed out. Very much look forward to working with this, and congratulations on the progress.

nathanielf22 commented 1 month ago

10/25 meeting:

rburghol commented 1 month ago

@nathanielf22 if you can post up the input variables that created that strange output, I will check on my machine. But, from my recollection, I think that we used the wrong coverage_ftype, and in my testing with another coverage (details below), I got correct results. The ftype should be usgs_full_drainage for those USGS gage watershed features.


scenario_name <- 'daymet2'
coverage_hydrocode <- 'usgs_ws_01673550'
coverage_bundle <- 'watershed'
coverage_ftype <- 'usgs_full_drainage'
model_version <- 'cbp-6.1'
met_file <- 'http://deq1.bse.vt.edu:81/met/daymet/precip/usgs_ws_01673550-daymet-all.csv'
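
These inputs plug into the REST feature query shown in the overview above; a quick sketch, assuming ds is the RomDataSource already configured in the workflow:

# pull the usgs_full_drainage watershed feature for this gage coverage
feature <- RomFeature$new(
  ds,
  list(
    hydrocode=coverage_hydrocode,
    ftype=coverage_ftype,
    bundle=coverage_bundle
  ),
  TRUE
)
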
rburghol commented 1 month ago

@nathanielf22 -- I have one observation about the script as is. I tested it in full, and it works! The only thing you should update is the variable name summary, which should be changed to something else. The reason is that there is an existing R function called summary(), and when you assign the output of summary_analytics() to summary, that function gets masked; it is a fairly important function that is used a good deal. precip_summary might be a good option :).
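
A two-line sketch of that rename, reusing the prism object from the earlier example:

precip_summary <- summary_analytics(prism)   # avoids masking base R's summary()
precip_summary$l90_precip_in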

nathanielf22 commented 1 week ago

@rburghol @COBrogan -- The code we put together last Friday to integrate into the workflow has an issue with the daymet and NLDAS data and should be updated. When including the n >= 365 days limitation, daymet excludes certain years where it only has 364 days, as shown in the attached screenshots (omitted here).

In addition, NLDAS has an issue because the yearly summary is created before the daily summary, so I switched that order around. This data is for gage usgs_ws_01613900.

COBrogan commented 1 week ago

@nathanielf22 The daymet dataset will always have 365 days in each year. They always exclude December 31st on leap years (but include February 29th, which is an unusual set-up). But remember that daymet is based on the GMT time zone. That means that each day in EST runs from 7AM to 7AM the next day. So, when grouping data by year, it matters whether you are grouping by tstime or tsendtime.

In this case, we can trace through our routines and see that we are grouping data by tsendtime (a confusing by-product of how PRISM operates and an assumption we made about daymet based on a daymet-PRISM regression). That means that leap years will always have 366 data points and the year after will have 364. To better demonstrate this, we are essentially beginning our year at 7 AM January 2nd and ending it on 7AM Dec 31. So, since a daymet leap year does not have Dec 31st, it must have 364 days.

My suggestion for now is to include years with >= 364 days, but let's discuss at our next meeting!
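
A minimal sketch of that screen, assuming the daily.summary data frame produced inside summary_analytics() above; the HAVING clause and the dot-free copy for sqldf are my illustration, not an agreed-upon fix:

# keep only years with at least 364 daily records before computing yearly totals
daily_df <- daily.summary  # copy to a dot-free name so sqldf can reference it cleanly
yearly.summary <- sqldf(
  "SELECT obs_year, COUNT(*) AS n_days, SUM(total_precip) AS total_precip
   FROM daily_df
   GROUP BY obs_year
   HAVING COUNT(*) >= 364"
)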

For reference, note that daymet has 365 days each year when grouping by tstime:

select count(ts.rast),extract(year from to_timestamp(ts.tstime))
from dh_timeseries_weather as ts
left join dh_variabledefinition var on ts.varid = var.hydroid
where var.varkey = 'daymet_mod_daily'
group by extract(year from to_timestamp(ts.tstime));

 count | extract
-------+---------
   365 |    1980
   365 |    1981
   365 |    1982
   365 |    1983
   365 |    1984
   365 |    1985
   365 |    1986
   365 |    1987
   365 |    1988
   365 |    1989
   365 |    1990
   365 |    1991