Closed by aappling-usgs 6 years ago
In an upcoming PR I've reworked the aggregation so that every dataset reports daily means over a window ending at 11:59pm on each date: 1am to 11:59pm for the aggregated retro data, 3am to 11:59pm for the aggregated medium-range forecasts, and 6am to 11:59pm for the aggregated long-range forecasts. This gives us a similar approach across datasets (albeit with different temporal resolutions and therefore different start times) and produces 10 days of medium-range forecasts (lead times of 0 through 9 days) and 30 days of long-range forecasts (lead times of 0 through 29 days).
("Lead time 0 days" means, taking the long-range data as an example, the forecasts generated at 12am on a reference date that describe flow at 6am, 12pm, 6pm, and the following 12am, which is counted as 11:59pm on that reference date.)
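As a rough illustration of that convention, here is a minimal base-R sketch. It is not the code from the PR: the function name `aggregate_daily` and the column names `ref_time`, `valid_time`, and `flow` are assumptions made for this example. It shifts each timestamp back by one second so that a 12am value counts toward the preceding date, then averages by reference date and valid date.

```r
# Sketch only: aggregate_daily() and the columns ref_time, valid_time, flow
# are hypothetical names, not the ones used in the project's munge scripts.
aggregate_daily <- function(fc) {
  # Subtract one second so a 12am value is counted as the last value
  # (11:59pm) of the preceding date.
  valid_date <- as.Date(fc$valid_time - 1, tz = 'UTC')
  ref_date <- as.Date(fc$ref_time, tz = 'UTC')
  daily <- aggregate(fc['flow'],
                     by = list(ref_date = ref_date, valid_date = valid_date),
                     FUN = mean)
  # Lead time in whole days between the reference date and the valid date
  daily$lead_days <- as.integer(daily$valid_date - daily$ref_date)
  daily[order(daily$ref_date, daily$valid_date), ]
}

# Long-range-style example: one forecast issued at 12am with 6-hourly values
# covering two days (made-up flow values).
ref <- as.POSIXct('2017-02-01 00:00:00', tz = 'UTC')
fc <- data.frame(
  ref_time = ref,
  valid_time = ref + 3600 * seq(6, 48, by = 6),
  flow = c(1, 2, 3, 4, 5, 6, 7, 8)
)
daily <- aggregate_daily(fc)
print(daily)
```

With this toy input, the 6am, 12pm, 6pm, and 12am values all fall under lead time 0 days, and the next four values fall under lead time 1 day.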
Here are some plots to confirm that the results still make sense:
# medium range, lead time 0 days
library(ggplot2)
nl <- readRDS('2_munge/out/agg_nwis.rds')     # NWIS observations
al <- readRDS('2_munge/out/agg_nwm_med.rds')  # aggregated medium-range forecasts
# keep only forecasts whose ref_date is 0 days before valid_date
al_lead <- dplyr::filter(al, ref_date == as.Date(valid_date - as.difftime(0, units='days')))
ggplot(dplyr::filter(nl$flow, date > as.Date('2017-02-01')), aes(x=date, y=daily_mean)) +
  geom_line(color='blue') + facet_grid(site_no ~ ., scales='free_y') +
  geom_point(data=al_lead, aes(x=valid_date, y=flow, group=ref_date), size=0.5) +
  ylab(expression(Discharge~(m^3~s^-1))) + xlab('Date') + theme_classic()
# medium range, lead time 9 days
nl <- readRDS('2_munge/out/agg_nwis.rds')     # NWIS observations
al <- readRDS('2_munge/out/agg_nwm_med.rds')  # aggregated medium-range forecasts
# keep only forecasts whose ref_date is 9 days before valid_date
al_lead <- dplyr::filter(al, ref_date == as.Date(valid_date - as.difftime(9, units='days')))
ggplot(dplyr::filter(nl$flow, date > as.Date('2017-02-01')), aes(x=date, y=daily_mean)) +
  geom_line(color='blue') + facet_grid(site_no ~ ., scales='free_y') +
  geom_point(data=al_lead, aes(x=valid_date, y=flow, group=ref_date), size=0.5) +
  ylab(expression(Discharge~(m^3~s^-1))) + xlab('Date') + theme_classic()
# long range (member 1), lead time 0 days
nl <- readRDS('2_munge/out/agg_nwis.rds')       # NWIS observations
al <- readRDS('2_munge/out/agg_nwm_long1.rds')  # aggregated long-range forecasts
# keep only forecasts whose ref_date is 0 days before valid_date
al_lead <- dplyr::filter(al, ref_date == as.Date(valid_date - as.difftime(0, units='days')))
ggplot(dplyr::filter(nl$flow, date > as.Date('2017-02-01')), aes(x=date, y=daily_mean)) +
  geom_line(color='blue') + facet_grid(site_no ~ ., scales='free_y') +
  geom_point(data=al_lead, aes(x=valid_date, y=flow, group=ref_date), size=0.5) +
  ylab(expression(Discharge~(m^3~s^-1))) + xlab('Date') + theme_classic()
# long range (member 1), lead time 29 days
nl <- readRDS('2_munge/out/agg_nwis.rds')       # NWIS observations
al <- readRDS('2_munge/out/agg_nwm_long1.rds')  # aggregated long-range forecasts
# keep only forecasts whose ref_date is 29 days before valid_date
al_lead <- dplyr::filter(al, ref_date == as.Date(valid_date - as.difftime(29, units='days')))
ggplot(dplyr::filter(nl$flow, date > as.Date('2017-02-01')), aes(x=date, y=daily_mean)) +
  geom_line(color='blue') + facet_grid(site_no ~ ., scales='free_y') +
  geom_point(data=al_lead, aes(x=valid_date, y=flow, group=ref_date), size=0.5) +
  ylab(expression(Discharge~(m^3~s^-1))) + xlab('Date') + theme_classic()
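The four snippets above differ only in the forecast file and the lead time, so they could be folded into one helper. The sketch below is hypothetical: `plot_lead_time` is not a function in the repository, and it assumes the same columns the snippets use (`date`, `daily_mean`, and `site_no` in the observations; `ref_date`, `valid_date`, `flow`, and `site_no` in the forecasts).

```r
library(ggplot2)

# Hypothetical helper (not part of the repository): overlay the forecasts for
# one lead time (black dots) on the observed daily means (blue line).
plot_lead_time <- function(obs, fcst, lead_days, start = as.Date('2017-02-01')) {
  # Observations after the start date
  obs <- obs[obs$date > start, ]
  # Forecasts whose reference date is lead_days before the valid date
  keep <- as.Date(fcst$ref_date) == as.Date(fcst$valid_date) - lead_days
  fcst <- fcst[keep, ]
  ggplot(obs, aes(x = date, y = daily_mean)) +
    geom_line(color = 'blue') +
    geom_point(data = fcst, aes(x = valid_date, y = flow, group = ref_date), size = 0.5) +
    facet_grid(site_no ~ ., scales = 'free_y') +
    ylab(expression(Discharge~(m^3~s^-1))) +
    xlab('Date') +
    theme_classic()
}

# Usage, assuming the same files as the snippets above:
# nl <- readRDS('2_munge/out/agg_nwis.rds')
# plot_lead_time(nl$flow, readRDS('2_munge/out/agg_nwm_med.rds'), lead_days = 9)
# plot_lead_time(nl$flow, readRDS('2_munge/out/agg_nwm_long1.rds'), lead_days = 29)
```

Taking the data frames as arguments rather than file paths keeps the helper easy to try on a small synthetic table.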
For now, I'm evaluating the results of the temporary fix (#46, #42).
With the PR I'm about to submit to convert cfs to cms for the NWIS data, the two data sources for discharge look very similar now. Here are the lead-time-0 predictions from the long-range member 1 (black dots) compared to NWIS data (blue line):
I've confirmed that members 2-4 are very similar.
And for fun, here's a comparison of NWIS obs (blue line again) to medium-range predictions at a lead time of 10 days (black dots):
These are also believable, and they give a sense of how much the lead time affects predictions (quite noticeably, but not so much that 10-day lead times are worthless).
The one sorta funny thing is that you can have a lead time of 0 through 10 - that's 11 days per forecast - isn't that one too many?
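One way an 11th lead day can arise is sketched below. This is an illustration under assumptions, not a diagnosis confirmed in this thread: it assumes a forecast issued at 12am with one value every 3 hours, ending 240 hours after the reference time. Grouping by plain calendar date puts the final 12am value on an 11th date by itself, whereas counting each 12am value as 11:59pm of the preceding date leaves 10 full days.

```r
# Assumed forecast shape (not confirmed in this thread): issued at 12am,
# one value every 3 hours, ending 240 hours after the reference time.
ref <- as.POSIXct('2017-02-01 00:00:00', tz = 'UTC')
valid_time <- ref + 3600 * seq(3, 240, by = 3)
ref_date <- as.Date(ref, tz = 'UTC')

# Plain calendar-date grouping: the last 12am value lands on an 11th date
naive_lead <- as.integer(as.Date(valid_time, tz = 'UTC') - ref_date)
print(table(naive_lead))

# Counting each 12am value as 11:59pm of the preceding date
shifted_lead <- as.integer(as.Date(valid_time - 1, tz = 'UTC') - ref_date)
print(table(shifted_lead))
```

Under these assumptions the first table has lead times 0 through 10, with only a single value at lead time 10, while the second has lead times 0 through 9 with eight values each.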