Open jsadler2 opened 2 years ago
I took a crack at this with the code below:
library(dplyr)

inst_data <- targets::tar_read(p1_inst_data)

# Convert to local time, then group by local calendar day, site, and parameter
grouped_inst <- inst_data %>%
  mutate(dateTime_local = lubridate::with_tz(dateTime, tzone = "America/New_York"),
         Date = lubridate::date(dateTime_local)) %>%
  group_by(Date, site_no, agency_cd, Parameter)

# Daily mean plus the fraction of non-NA readings in each group
means_and_val_counts <- grouped_inst %>%
  summarise(Value = mean(Value_Inst, na.rm = TRUE),
            na_count = sum(is.na(Value_Inst)),
            value_count = sum(!is.na(Value_Inst)),
            .groups = "keep") %>%
  mutate(percent_coverage = value_count / (value_count + na_count))

# Daily max and the time it occurred (slice_head keeps the first row if there are ties)
maxs_and_max_times <- grouped_inst %>%
  slice_max(order_by = Value_Inst) %>%
  slice_head(n = 1) %>%
  rename(Value_Max = Value_Inst, max_date_time = dateTime_local) %>%
  select(max_date_time, Value_Max)

# Daily min and the time it occurred
mins_and_min_times <- grouped_inst %>%
  slice_min(order_by = Value_Inst) %>%
  slice_head(n = 1) %>%
  rename(Value_Min = Value_Inst, min_date_time = dateTime_local) %>%
  select(min_date_time, Value_Min)

# Join on the shared grouping columns and keep days with more than 50% coverage
combined <- means_and_val_counts %>%
  left_join(maxs_and_max_times) %>%
  left_join(mins_and_min_times) %>%
  filter(percent_coverage > 0.5) %>%
  select(-c(na_count, value_count, percent_coverage))

# this takes a LONG time!!
combined <- combined %>%
  mutate(max_time = format(lubridate::ymd_hms(max_date_time), "%H:%M:%S"),
         min_time = format(lubridate::ymd_hms(min_date_time), "%H:%M:%S"))
This works, at least up to the point where I try to extract the times from `max_date_time` and `min_date_time` (the last operation). That step took a really long time: it ran for about 5 minutes before I killed it, so it will need to be addressed.
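One possible explanation for the slow last step, which has not been profiled here: `combined` is still grouped (because of `.groups="keep"`), so `mutate()` is evaluated once per group, and `lubridate::ymd_hms()` re-parses values that are already date-times. A minimal sketch of an alternative is below. It assumes `max_date_time` and `min_date_time` are POSIXct columns (they come from `lubridate::with_tz()`), and `combined_toy` is a hypothetical stand-in with made-up values, not the real data.

```r
library(dplyr)

# Hypothetical stand-in for `combined`; the real columns come from the pipeline above.
combined_toy <- tibble(
  Date = as.Date(c("2021-06-01", "2021-06-02")),
  site_no = c("01234567", "01234567"),
  max_date_time = as.POSIXct(c("2021-06-01 15:30:00", "2021-06-02 00:00:00"),
                             tz = "America/New_York"),
  min_date_time = as.POSIXct(c("2021-06-01 05:15:00", "2021-06-02 06:45:00"),
                             tz = "America/New_York")
) %>%
  group_by(Date, site_no)

combined_times <- combined_toy %>%
  ungroup() %>%  # drop the grouping so format() runs once over each whole column
  mutate(max_time = format(max_date_time, "%H:%M:%S"),  # POSIXct formats directly
         min_time = format(min_date_time, "%H:%M:%S"))  # in its own time zone
```

Calling `format()` directly on the POSIXct column uses the column's own time zone attribute, so the clock times stay in local time without a round trip through character strings.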
Also, there may be a much cleaner way of calculating all of these. I separated out the max/min calculations because I wasn't sure whether you can pass two variables (`Value_Inst` and `dateTime_local`) into a `summarize` function, and I also didn't know whether you could have two return values (`Max` and `Max_time`).
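On the question above: a single `summarise()` call can reference several columns and return several summary columns, so the value and its time can be computed together. The sketch below uses `which.max()` and `which.min()` to index `dateTime_local`. `inst_toy` is a hypothetical data set with made-up values, and the grouping is reduced to two columns for brevity. Note one assumption: every group has at least one non-NA value. For an all-NA group, `which.max()` returns a zero-length index, so such groups would need to be filtered out first.

```r
library(dplyr)

# Hypothetical instantaneous readings for two sites on one local day.
inst_toy <- tibble(
  site_no = c("A", "A", "A", "B", "B"),
  Date = as.Date("2021-06-01"),
  dateTime_local = as.POSIXct(
    c("2021-06-01 00:15:00", "2021-06-01 14:30:00", "2021-06-01 23:45:00",
      "2021-06-01 03:00:00", "2021-06-01 16:00:00"),
    tz = "America/New_York"),
  Value_Inst = c(1.0, 3.5, NA, 2.0, 0.5)
)

daily <- inst_toy %>%
  group_by(Date, site_no) %>%
  summarise(
    Value = mean(Value_Inst, na.rm = TRUE),
    Value_Max = max(Value_Inst, na.rm = TRUE),
    # which.max()/which.min() skip NAs and return the first matching position
    max_date_time = dateTime_local[which.max(Value_Inst)],
    Value_Min = min(Value_Inst, na.rm = TRUE),
    min_date_time = dateTime_local[which.min(Value_Inst)],
    percent_coverage = mean(!is.na(Value_Inst)),  # share of non-NA readings
    .groups = "drop"
  )
```

This replaces the three separate tables and two joins with one pass over the grouped data, and `.groups = "drop"` returns an ungrouped result so later `mutate()` calls are vectorised over the whole table.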
In #26, @jds485 started a conversation about the timing of daily maxs and mins, and there was agreement that it would be good to calculate these.