cmu-delphi / exploration-tooling

tools for evaluating and exploring forecasters

Dashboard needs to account for forecaster prediction missingness #101

Open brookslogan opened 6 months ago

brookslogan commented 6 months ago

Currently, there are some surprises in comparisons like the one below:

[dashboard comparison screenshot]

This is probably explained by amoebalike being generated for fewer, mostly lower, aheads right now and at slightly fewer times (though somehow it has ~twice as many predictions as dev10v4s when you set x var = forecaster?)

There are a couple of approaches:

- Intersecting to common prediction set for the set of forecasters selected.
- …

Usually I think we'd favor the first unless some weird missingness patterns or high levels of missingness force us to do something like the second.

There's a bit of old code for this: evalcast::intersect_averagers() did it one way, and some other old code did it another way. The core of the latter is:

matched_scorecards <- scorecards %>%
  # drop rows that were never evaluated
  filter(!is.na(ae)) %>%
  # one group per prediction task
  group_by(data_source, signal, geo_value, forecast_date, target_end_date, ahead, incidence_period) %>%
  # `.` is the whole piped-in tibble, so this counts every forecaster in the
  # data and keeps a task only when each of them has a row for it
  filter(n() == length(unique(.[["forecaster"]]))) %>%
  ungroup()
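
For comparison, here's a sketch of a roughly equivalent but perhaps more readable formulation using dplyr::n_distinct(); it assumes scorecards has the columns above and at most one row per forecaster per prediction task:

library(dplyr)

# count forecasters once, over the same NA-filtered data `.` referred to above
scored <- scorecards %>% filter(!is.na(ae))
n_forecasters <- n_distinct(scored$forecaster)

matched_scorecards <- scored %>%
  group_by(data_source, signal, geo_value, forecast_date, target_end_date, ahead, incidence_period) %>%
  # keep only prediction tasks that every forecaster scored
  filter(n_distinct(forecaster) == n_forecasters) %>%
  ungroup()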

There are also variations on this that tried to simultaneously filter to forecast dates or target end dates that had evaluations for all the aheads, like the snippet below, though it's pretty confusing and there's probably a better way to write it. The idea was that for the most recent target dates we may only have evaluations ready for the shorter aheads, and this would suggest misleading forecasting "trends" when breaking down by target end date but not simultaneously by ahead.

   group_by(data_source, signal, geo_value, target_end_date, incidence_period) %>%
   {
     # count forecasters over the whole data (`.`), not per group
     n.forecasters <- length(unique(.[["forecaster"]]))
     filter(
       .,
       # expect one row per forecaster for each ahead that can land on this
       # target_end_date, i.e., each matching_aheads entry in this group's
       # residue class mod 7
       n() == n.forecasters * length(matching_aheads[matching_aheads %% 7L == extract_single_unique_value(ahead %% 7L)])
     )
   } %>%
   ungroup() %>%

This code looks a little weird because forecast_dates were expected to be exactly weekly while aheads ranged from 0 or 1 up to 28, so for target dates with the same weekday as a forecast date you'd want 5 or 4 predictions per forecaster (aheads 0, 7, 14, 21, 28, or just the last four if aheads start at 1), and for other target dates you'd want 4 predictions per forecaster. (For complete forecast_dates you'd want 29 or 28 per forecaster.)
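
One possibly clearer way to write the same completeness check, sketched under the same assumptions (exactly weekly forecast_dates and a configured matching_aheads vector; untested against the real scorecards):

library(dplyr)

n_forecasters <- n_distinct(scorecards$forecaster)

matched_scorecards <- scorecards %>%
  group_by(data_source, signal, geo_value, target_end_date, incidence_period) %>%
  filter({
    # with exactly weekly forecast_dates, ahead %% 7 is constant within a group
    residue <- unique(ahead %% 7L)
    # how many of the configured aheads can land on this target_end_date
    n_possible_aheads <- sum(matching_aheads %% 7L == residue)
    # keep the group only if every forecaster scored every possible ahead
    n() == n_forecasters * n_possible_aheads
  }) %>%
  ungroup()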

dsweber2 commented 6 months ago

though somehow it has ~twice as many predictions as dev10v4s when you set x var = forecaster?

This is because covid_hosp_explore generates forecasts for every day of the week rather than just one, so for any given ahead there are ~7x the number of points. I'm considering dropping this down to only Mondays. @dshemetov, thoughts? It does make examining weekday effects more difficult, but those are surprisingly uncommon here. It would mean the 21-hour runs become a mere 3 hours.
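
For what it's worth, restricting a daily sequence of candidate forecast dates to Mondays is a one-liner; a minimal sketch (the date range is a placeholder, and it uses lubridate::wday() with week_start = 1 so Monday maps to 1):

library(lubridate)

# placeholder daily candidate dates; keep only the Mondays
forecast_dates <- seq(as.Date("2023-10-01"), as.Date("2024-03-31"), by = "day")
monday_forecast_dates <- forecast_dates[wday(forecast_dates, week_start = 1) == 1]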

Intersecting to common prediction set for the set of forecasters selected.

In the meantime, you can filter by ahead to only the matching ones. Having that auto-populate to the minimal shared set wouldn't be a bad idea.
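
A sketch of that auto-populate idea, assuming a scorecards-like data frame and a hypothetical selected_forecasters vector coming from the dashboard UI: keep only the aheads that every selected forecaster has predictions for.

library(dplyr)

# hypothetical selection from the dashboard UI
selected_forecasters <- c("amoebalike", "dev10v4s")

shared_aheads <- scorecards %>%
  filter(forecaster %in% selected_forecasters) %>%
  # one row per (forecaster, ahead) pair that actually occurs
  distinct(forecaster, ahead) %>%
  group_by(ahead) %>%
  # an ahead is shared only if all selected forecasters produced it
  filter(n() == length(selected_forecasters)) %>%
  ungroup() %>%
  pull(ahead)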

dshemetov commented 6 months ago

Just producing weekly forecasts sounds good to me. I'm not particularly concerned about weekday effects atm, and a shorter run time is nice to have.