USGS-R / drb-inland-salinity-ml

Code repo for Delaware River Basin machine learning models that predict inland salinity.
Creative Commons Zero v1.0 Universal

Calculate timing of maxs/mins #34

Open jsadler2 opened 2 years ago

jsadler2 commented 2 years ago

In #26, @jds485 started a conversation about the timing of daily maxs and mins, and there was agreement that it would be good to calculate these.

jsadler2 commented 2 years ago

I took a crack at this with the code below:

library(dplyr)  # provides %>%, mutate, group_by, summarise, slice_max/min, left_join

inst_data <- targets::tar_read(p1_inst_data)

grouped_inst <- inst_data %>%
  mutate(dateTime_local = lubridate::with_tz(dateTime,tzone="America/New_York"),
         Date = lubridate::date(dateTime_local)) %>%
  group_by(Date, site_no, agency_cd, Parameter)

means_and_val_counts <- grouped_inst %>%
  summarise(Value = mean(Value_Inst, na.rm=TRUE), 
            na_count=sum(is.na(Value_Inst)), 
            value_count=sum(!is.na(Value_Inst)),
            .groups="keep") %>%
  mutate(percent_coverage=value_count/(value_count + na_count))

maxs_and_max_times <- grouped_inst %>%
  slice_max(order_by = Value_Inst) %>%
  slice_head(n=1) %>%
  rename(Value_Max = Value_Inst, max_date_time = dateTime_local) %>%
  select(max_date_time, Value_Max)

mins_and_min_times <- grouped_inst %>%
  slice_min(order_by = Value_Inst) %>%
  slice_head(n=1) %>%
  rename(Value_Min = Value_Inst, min_date_time = dateTime_local) %>%
  select(min_date_time, Value_Min)

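# join on the grouping columns explicitly rather than relying on the natural join
combined <- means_and_val_counts %>%
  left_join(maxs_and_max_times, by = c("Date", "site_no", "agency_cd", "Parameter")) %>%
  left_join(mins_and_min_times, by = c("Date", "site_no", "agency_cd", "Parameter")) %>%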
  filter(percent_coverage > 0.5) %>%
  select(-c(na_count, value_count, percent_coverage))

# this takes a LONG time!!
combined <- combined %>%
  mutate(max_time = format(lubridate::ymd_hms(max_date_time), "%H:%M:%S"),
         min_time = format(lubridate::ymd_hms(min_date_time), "%H:%M:%S"))

jsadler2 commented 2 years ago

This works, at least up to the point where I try to extract the "times" from the "max/min_date_times" (the last operation). That step took a really long time: it ran for about 5 minutes before I killed it, so that will need to be addressed.
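One possible cause of the slow last step, not verified against the real data: `combined` is still grouped by Date/site_no/agency_cd/Parameter (because of `.groups="keep"`), so `mutate()` runs once per group, and `ymd_hms()` re-parses a column that is already a date-time. A minimal sketch of an alternative, assuming `max_date_time` and `min_date_time` are POSIXct; `combined_demo` is a made-up stand-in for `combined`:

```r
library(dplyr)

# Hypothetical stand-in for `combined`: two site-days, still grouped
combined_demo <- tibble(
  Date = as.Date(c("2021-06-01", "2021-06-02")),
  site_no = c("01", "01"),
  max_date_time = as.POSIXct(c("2021-06-01 14:15:00", "2021-06-02 03:30:00"),
                             tz = "America/New_York"),
  min_date_time = as.POSIXct(c("2021-06-01 05:00:00", "2021-06-02 22:45:00"),
                             tz = "America/New_York")
) %>%
  group_by(Date, site_no)

with_times <- combined_demo %>%
  ungroup() %>%  # drop the grouping so mutate() is a single vectorized call
  mutate(max_time = format(max_date_time, "%H:%M:%S"),  # format POSIXct directly, no re-parsing
         min_time = format(min_date_time, "%H:%M:%S"))
```

`format()` on a POSIXct column uses the column's own time zone attribute, so the extracted times stay in local (America/New_York) time.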

Also, there may be a much cleaner way of calculating all of these. I separated out the max/min calculations because I wasn't sure whether you can pass two variables (Value_Inst and dateTime_local) into a summarize function, and I also didn't know whether you could have two return values (Max and Max_time).
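On the question above: `summarise()` can reference several columns in one expression and can return several new columns in one call. A minimal sketch on made-up data (`inst_demo` and its values are hypothetical; the column names follow the code earlier in this thread):

```r
library(dplyr)

# Hypothetical instantaneous data: two days at one site, one missing value
inst_demo <- tibble(
  site_no = "01",
  Date = as.Date(c("2021-06-01", "2021-06-01", "2021-06-01",
                   "2021-06-02", "2021-06-02")),
  dateTime_local = as.POSIXct(c("2021-06-01 00:00:00", "2021-06-01 12:00:00",
                                "2021-06-01 18:00:00", "2021-06-02 06:00:00",
                                "2021-06-02 12:00:00"),
                              tz = "America/New_York"),
  Value_Inst = c(10, 30, NA, 5, 2)
)

daily <- inst_demo %>%
  group_by(Date, site_no) %>%
  summarise(
    Value = mean(Value_Inst, na.rm = TRUE),
    Value_Max = max(Value_Inst, na.rm = TRUE),
    # which.max()/which.min() skip NAs and return the first matching position
    max_date_time = dateTime_local[which.max(Value_Inst)],
    Value_Min = min(Value_Inst, na.rm = TRUE),
    min_date_time = dateTime_local[which.min(Value_Inst)],
    .groups = "drop"
  )
```

One caveat: for a group where every Value_Inst is NA, `which.max()` returns a zero-length result, so such groups would need to be filtered out first (for example with the coverage filter).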

jds485 commented 2 years ago

Adding Jeff's comment from #26

Another thing that comes up here is the Value_Max_cd. I think the "A" job is to get the Value_Inst_cd at the max daily value for this.
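One way this could look, as a sketch only: index Value_Inst_cd with the position of the daily max, the same way the timestamp could be picked. The data and code values below are made up for illustration, and `cd_demo` is a hypothetical name:

```r
library(dplyr)

# Hypothetical instantaneous values with their qualification codes
cd_demo <- tibble(
  site_no = "01",
  Date = as.Date(rep("2021-06-01", 3)),
  Value_Inst = c(10, 30, 20),
  Value_Inst_cd = c("A", "P", "A")
)

daily_cd <- cd_demo %>%
  group_by(Date, site_no) %>%
  summarise(
    Value_Max = max(Value_Inst, na.rm = TRUE),
    # code of the observation that produced the daily max (first one if tied)
    Value_Max_cd = Value_Inst_cd[which.max(Value_Inst)],
    .groups = "drop"
  )
```

If several observations tie for the max and carry different codes, this keeps only the code of the first one, which may or may not be the desired behavior.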