aaronger / utility-eval-papers

Pipelines and drafts related to arXiv:2312.16201

add discussion paragraph about the appropriate reference for forecast value #44

Closed elray1 closed 9 months ago

elray1 commented 10 months ago

Our current analyses use an oracle adjustment. Another possibility we could have calculated is value relative to a baseline forecast. Ultimately, what we really care about is whether a forecast adds value to a decision-making process, where a public health decision maker has some other sources of information, e.g., analytics and expert judgment based on understanding from past experiences. In other words, we really care about whether the forecast adds useful information to an existing decision-making process. Our current scoring procedures do not get at this.

See also item (c) in this quote from Murphy (1993) [image]

aaronger commented 10 months ago

I tend to think of the oracle adjustment as separate from the skill/value vs a baseline $bl$ issue. That is, it is just (maybe) giving us an easier way to calculate Murphy's value score:

$\frac{E[S(bl,Y)] - E[S(F,Y)]}{E[S(bl,Y)] - E[S(Y,Y)]} = \frac{E^S_{bl} - E^S_{F}}{E^S_{bl} - E^S_{oracle}} = 1-\frac{E^r_{F}}{E^r_{bl}}$

where the $r$ refers to a "regret" random variable $S_r(F,Y) := S(F,Y) - S(Y,Y)$.
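As a quick numerical check of the identity above, here is a minimal sketch that computes the value score both ways: directly from expected scores, and as one minus the ratio of expected regrets. The function names and the score values are made up for illustration (they are not from the paper's pipelines), and scores are assumed to be negatively oriented (lower is better).

```python
from statistics import fmean

def value_score(s_forecast, s_baseline, s_oracle):
    """(E[S(bl,Y)] - E[S(F,Y)]) / (E[S(bl,Y)] - E[S(Y,Y)]), with sample means as expectations."""
    e_f, e_bl, e_or = fmean(s_forecast), fmean(s_baseline), fmean(s_oracle)
    return (e_bl - e_f) / (e_bl - e_or)

def value_score_from_regret(s_forecast, s_baseline, s_oracle):
    """1 - E[r_F] / E[r_bl], where regret r = score minus oracle score, per outcome."""
    r_f = [s - o for s, o in zip(s_forecast, s_oracle)]
    r_bl = [s - o for s, o in zip(s_baseline, s_oracle)]
    return 1.0 - fmean(r_f) / fmean(r_bl)

# Hypothetical per-outcome scores (illustrative numbers only).
s_oracle = [1.0, 0.5, 2.0, 0.0]    # S(Y, Y): need not be zero
s_forecast = [2.0, 1.5, 2.5, 1.0]  # S(F, Y)
s_baseline = [3.0, 2.5, 4.0, 2.0]  # S(bl, Y)

direct = value_score(s_forecast, s_baseline, s_oracle)
via_regret = value_score_from_regret(s_forecast, s_baseline, s_oracle)
print(direct, via_regret)
```

The two agree because subtracting the oracle score shifts numerator and denominator terms by the same amount, so the oracle adjustment changes how the score is computed but not the choice of baseline.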

elray1 commented 10 months ago

yeah, i agree that it plays a different role...

aaronger commented 10 months ago

Actually I should have said $S_r(F,Y) = S(F,Y) - l(Y,Y)$... Not sure how to express an oracle forecast, $\delta_Y$? This is actually where a lot of decision theory becomes an inscrutable morass, with measure-valued measures and what not...

elray1 commented 10 months ago

re-stating for emphasis, the main content of the paragraph I think we need to add to the discussion is that we really care about whether the forecast adds useful information to an existing decision-making process, and our current scoring procedures do not get at this. This would tell us the "value added" of the forecast for a public health decision maker.

aaronger commented 10 months ago

To that point, I think both/either a pop-weighted or a persistence allocation would be good (baseline) baselines to suggest in this discussion.
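For concreteness, one way these two baselines might look, as a rough sketch only: the thread does not define them, so the proportional rules, the function names, the budget `K`, and all numbers below are assumptions for illustration.

```python
def pop_weighted_allocation(K, populations):
    """Assumed rule: split a fixed budget K across locations in proportion to population."""
    total = sum(populations)
    return [K * p / total for p in populations]

def persistence_allocation(K, last_observed):
    """Assumed rule: split K in proportion to the most recently observed need per location."""
    total = sum(last_observed)
    if total == 0:
        # Assumption: fall back to an equal split when nothing was observed last period.
        return [K / len(last_observed)] * len(last_observed)
    return [K * y / total for y in last_observed]

# Hypothetical two-location example.
K = 100
populations = [1000, 3000]
last_observed = [30, 10]

pop_alloc = pop_weighted_allocation(K, populations)
pers_alloc = persistence_allocation(K, last_observed)
print(pop_alloc, pers_alloc)
```

Either allocation could then stand in for $bl$ in the value score, so that the forecast is credited only for what it adds beyond information a decision maker would plausibly already have.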