cmu-delphi / exploration-tooling

tools for evaluating and exploring forecasters

Have a "best of the pack" forecaster auto-selected in the shiny app #49

Open nmdefries opened 10 months ago

nmdefries commented 10 months ago

Have both a baseline (#48) and a well-performing forecaster auto-selected on app startup to augment model evaluation.

David Weber: How feasible do you think it would be to plot whichever [forecaster] has the best overall WIS, as a kind of "other end" comparison?

Nat DeFries: To find the forecaster with the best overall WIS, we'd need to calculate an average (or other aggregate) WIS for each forecaster, which means reading in all the scores. That would be too slow to do in the dashboard, so it would probably need to be another target. And then I guess we'd cache just the best forecaster's name, like you did with [the] forecasters [object].

Calculating aggregate WIS in a comparable way may be an issue, though. The first thing that comes to mind is that forecasters may not have scores available for all of the same dates/geos. And which aggregation method do we want to use (e.g. straight mean, population-weighted mean, etc.)? Or maybe we only want a starting point for comparison, and a forecaster that performs relatively well [but isn't the absolute best] is sufficient.

If we're calculating average WIS for every forecaster, maybe it would be useful to report their relative rankings [as an additional way to analyze performance].
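The comparability concern above (forecasters scored on different dates/geos, plus the choice of aggregation) can be sketched concretely. The actual pipeline is R/targets; this is a hedged, language-agnostic Python illustration with hypothetical score tuples and made-up population figures, showing one option: restrict to the (date, geo) pairs that every forecaster has scored, then take a population-weighted mean.

```python
# Hedged sketch, NOT the repo's code: comparable aggregate WIS per forecaster,
# computed only over (date, geo) keys shared by all forecasters.
# Forecaster names, columns, and populations below are hypothetical.
from collections import defaultdict

scores = [
    # (forecaster, date, geo, wis)
    ("flat", "2023-10-01", "ca", 12.0),
    ("flat", "2023-10-01", "tx", 15.0),
    ("flat", "2023-10-08", "ca", 11.0),
    ("smooth", "2023-10-01", "ca", 9.0),
    ("smooth", "2023-10-01", "tx", 10.0),
    # "smooth" has no score for 2023-10-08, so that date is dropped entirely
]
population = {"ca": 39_000_000, "tx": 30_000_000}

def aggregate_wis(scores, population):
    """Population-weighted mean WIS over (date, geo) keys shared by all forecasters."""
    keys_by_forecaster = defaultdict(set)
    for f, date, geo, _ in scores:
        keys_by_forecaster[f].add((date, geo))
    shared = set.intersection(*keys_by_forecaster.values())
    agg = {}
    for f in keys_by_forecaster:
        num = sum(population[geo] * wis
                  for f2, _, geo, wis in [(s[0], s[1], s[2], s[3]) for s in scores]
                  if f2 == f and (s_key := None) is None)  # placeholder removed below
        # simpler explicit loop for clarity:
        num = den = 0.0
        for f2, date, geo, wis in scores:
            if f2 == f and (date, geo) in shared:
                num += population[geo] * wis
                den += population[geo]
        agg[f] = num / den
    return agg

print(aggregate_wis(scores, population))
```

A straight (unweighted) mean over the shared keys is the other obvious choice; the sketch just makes the trade-off from the discussion explicit.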

David Weber: Oh, the number-of-days thing is a good point; I hadn't thought about that. Yeah, population-weighted + average per day would probably be closer to what we want, but it's definitely ambiguous. It might not be worth it if we'd have to read in every score to do the comparison any time a new forecaster score is created. [On the other hand] it probably would be useful to have a report of the relative rankings.

Nat DeFries: We should be able to avoid [reading in every score any time a new forecaster is added] by adding a target for each forecaster that calculates and saves its aggregate WIS. A separate target would then do the ranking; it would only need to read the aggregate scores, not the full scores.

[RE population weighted + average per day] So then the "best" would be something like the forecaster with the best WIS on the largest number of days (only considering shared days)?

Or we could just have a one-off script do the ranking [although that doesn't fit the targets framework we're using].
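The two-step structure proposed above can be sketched as follows. The real pipeline would use R's targets framework; this is a hedged Python stand-in (file names, cache location, and the simple mean in place of the real WIS aggregation are all hypothetical) showing the key property: the ranking step reads only the small cached aggregates, never the full score files.

```python
# Hedged sketch, NOT the repo's R/targets code.
# Step 1 runs once per forecaster and caches a single aggregate number;
# step 2 ranks forecasters by reading only those cached aggregates.
import json
from pathlib import Path

CACHE = Path("aggregate_cache")  # hypothetical cache location

def cache_aggregate(forecaster: str, full_scores: list) -> None:
    """Per-forecaster step: reduce full scores to one aggregate and save it."""
    CACHE.mkdir(exist_ok=True)
    agg = sum(full_scores) / len(full_scores)  # stand-in for the real WIS aggregation
    (CACHE / f"{forecaster}.json").write_text(json.dumps({"mean_wis": agg}))

def rank_forecasters() -> list:
    """Ranking step: reads only the small cached aggregates, not full scores."""
    aggs = {p.stem: json.loads(p.read_text())["mean_wis"] for p in CACHE.glob("*.json")}
    return sorted(aggs.items(), key=lambda kv: kv[1])  # lower WIS is better

cache_aggregate("flat", [12.0, 15.0, 11.0])
cache_aggregate("smooth", [9.0, 10.0, 8.0])
ranking = rank_forecasters()
best_name = ranking[0][0]  # the name the app could auto-select on startup
print(ranking)
```

In the actual pipeline the per-forecaster step would naturally be a dynamic branch over forecasters, which is the dynamic-vs-static question raised in the next comment.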

David Weber: Oh, yeah, adding an averaging step for each forecaster/score would work. I'm unsure whether that would be best as a dynamic or static job.