elray1 commented 7 months ago

I recognize that it is work in progress, but currently section 4 of the supplement feels unfocused and it seems like it might be starting to wander from the core mission of our paper. Can we use this issue to settle on the main messages that we want to communicate in this section of the supplement?

aaronger commented 7 months ago

Interesting, I couldn't remember which section came 4th and was surprised when I saw that this was the Bayes act solution part, which to me has felt 1. the most focussed, because its purpose is just to capture some version of each step of the optimization math involved in the project in one self-contained package, and 2. essentially (and mercifully) done...

Maybe you could say more about what seems problematic?

elray1 commented 7 months ago

I was reading on the wrong branch -- the section that I was hoping we could talk through was the section titled "Properties and Properness". Sorry for the confusion.

elray1 commented 7 months ago

Goals

The goal would be to make a precise claim about the properness of AS, and AS via distfromq.

The alloscore is proper if distribution functions F are handed to us. Is it still proper given our algorithm situation?

Do quantiles elicited by the \texttt{distfromq} $\hookrightarrow$ \texttt{alloscore} process align with real quantiles?

What do we need to say to achieve this?

Review of proper scores in more depth

It's valuable to include (start with?) a discussion of what a proper score is.

Set up for the forecaster's problem

- Decision maker: takes action $x$, scored rel to $y$
- Forecaster: takes action $\theta$ (= 23 "quantiles"),

v1: (ideally) with the knowledge that decision maker will use distfromq to map theta -> \tilde{F}, which is what will be used to set the action x, scored rel to y.
v2: distfromq + alloscore takes theta to an action x, which is scored rel to y.

Forecaster's loss: $l_f(\theta, y) = l_d( x(\theta), y)$, where $l_d$ is $s_A$ in the notation of main text section 2.2.2

Note 1: the whole procedure still yields a proper score (if the score to be used is specified prospectively)

From here, we could ask:

what is the forecaster's Bayes act, $\theta^{F,K}$?
The forecaster's scoring rule is then $S_f(F, y) = l_f(\theta^{F,K}, y) = l_d( x(\theta^{F,K}), y)$

If the forecaster is told what the function $x: \Theta \rightarrow \mathcal{X}$ is, then the machinery of proper scores applies, and the forecaster's scoring rule is proper.

Note 2: but the elicited quantity may not be a real quantile

However, the elicited $\theta$ may not be quantiles. The Bayes act associated with forecast $F$ is $x^F$. The forecaster will submit whatever $\theta$ is required to get $x(\theta) = x^F$.

ELR TODO We should be able to demonstrate in a simple example that $\theta$ are not the quantiles of $F$ at the 23 specified probability levels. E.g., take $F$ to be something that's not a normal distribution and use normal tails in distfromq::make_q_fn. We could actually do this in a setting inspired by example 1:

Forecasts are $F_1 = Exp(1/1)$, $F_2 = Exp(1/4)$, $K = 5$. Suppose a hub collects forecasts in a quantile format at probability levels $0.25, 0.5, 0.75$, and states that distfromq will be used with normal tails. What three-number summaries of $F_1$ and $F_2$ should the forecaster submit to get the computed allocations to be 1 and 4? Check that these are not exactly the quantiles of $F_1$ and $F_2$ at the specified probability levels.
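A minimal sketch of the first half of that check, under two assumptions: $Exp(\cdot)$ is read in the rate parameterization (so the means are 1 and 4), and the allocation places every location at a common quantile level $\tau$ with allocations summing to $K$. The helper names `exp_quantile` and `allocation` are made up for illustration and are not part of distfromq or alloscore. It recovers the target allocations and lists the true quantiles that any submitted $\theta$ would be compared against; the distfromq step itself is not reproduced here.

```python
import math

def exp_quantile(p, rate):
    """Quantile function of an exponential distribution with the given rate."""
    return -math.log(1.0 - p) / rate

def allocation(rates, K, tol=1e-12):
    """Bisect for the common level tau with sum_i F_i^{-1}(tau) = K.

    Returns (tau, allocations). Assumes the equal-quantile-level form of
    the allocation, which is an assumption of this sketch.
    """
    lo, hi = 0.0, 1.0 - 1e-15
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        total = sum(exp_quantile(mid, r) for r in rates)
        if total < K:
            lo = mid
        else:
            hi = mid
    tau = 0.5 * (lo + hi)
    return tau, [exp_quantile(tau, r) for r in rates]

rates = [1.0, 0.25]           # F_1 = Exp(1), F_2 = Exp(1/4), rate parameterization
tau, x = allocation(rates, K=5.0)

levels = [0.25, 0.5, 0.75]    # probability levels collected by the hub
true_quantiles = [[exp_quantile(p, r) for p in levels] for r in rates]

print("tau:", tau)
print("allocations:", x)
print("true quantiles:", true_quantiles)
```

Because both forecasts are exponential, the common level works out to $\tau = 1 - e^{-1}$ in closed form, which gives allocations of 1 and 4.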

Note 3: in a post hoc evaluation setting, it may not be proper.

We decided not to do anything with the below:

Can we get any sense of how much of a bad idea our post hoc analysis is?

formal bounds?
some kind of informal investigation?
- Go back to all historical weeks that were used in our application.
  - Generate 1000000000000000000 samples from the baseline
  - Get scores based on allocations that come from 23 quantiles of these distributions:
    - for each location, extract the 23 quantiles that would have been a forecast hub submission
    - use distfromq + alloscore to get scores
  - Get scores based on allocations from true forecast distribution:
    - get an empirical cdf of those (for each location).
    - Feed those empirical cdfs into alloscore to find the allocation that comes from the baseline's full forecast distribution (at K = 15,000)
    - This step is not actually necessary: Use the $\theta$-optimization procedure above to find the value of $\theta$ to submit for the baseline to get its allocation vector to match the one found in the previous step
    - calculate the resulting allocation scores.
  - Check how different the allocations and/or allocation scores based on submitted quantiles are from the allocation scores based on the full distribution.

aaronger commented 7 months ago

A few thoughts, maybe repetitive, about where we are in sec:distfromq_alloscore_proper_prospective.

Start with a DP $(\mathcal{X},\mathcal{Y}, l)$. Any map $M:\mathcal{W} \to \mathcal{X}$ gives a new DP $(\mathcal{W},\mathcal{Y}, l_M)$ where $l_M(\theta, y) := l(M(\theta),y)$. We now have scoring rules $S_{\mathcal{X},l}(F,y) := l(x^F,y)$ and $S_{\mathcal{W},l_M}(F,y) := l_M(\theta^F,y) = l(M(\theta^F),y)$. Both scoring rules are Bayes and therefore proper by construction, but they will generally differ, since we have placed no restrictions yet on how $M(\theta^F)$ might relate to $x^F$.

For us, $M(\theta^F) := \mathrm{argmin}_{x} E_{\theta(F)}[l(x,Y)]$, which will generally equal $x^F = \mathrm{argmin}_{x} E_{F}[l(x,Y)]$ only when $F=F^{\theta(F)}$, that is, when $F$ is in the distfromq family.
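A toy illustration of that last point, with several stand-ins that are assumptions of this sketch rather than anything in our code: a single location, pinball loss at a fixed level `TAU` in place of $l$, and a normal distribution fitted through the 0.25 and 0.75 quantiles in place of the distfromq family. With $F = Exp(1)$ outside the family, the action obtained by passing the true quantiles through $M$ differs from $x^F$ and has a larger expected loss under $F$.

```python
import math
from statistics import NormalDist

TAU = 0.9  # hypothetical pinball level, chosen only for illustration

def exp_quantile(p):
    """Quantile function of F = Exp(1)."""
    return -math.log(1.0 - p)

def M(theta, tau=TAU):
    """Map reported (q25, q75) to an action: the tau-quantile of the normal
    distribution whose quartiles match theta (a stand-in for distfromq)."""
    q25, q75 = theta
    z75 = NormalDist().inv_cdf(0.75)
    mu = 0.5 * (q25 + q75)
    sigma = (q75 - q25) / (2.0 * z75)
    return NormalDist(mu, sigma).inv_cdf(tau)

def expected_pinball_loss(x, tau=TAU):
    """E_F[l(x, Y)] for Y ~ Exp(1) and x >= 0, where
    l(x, y) = tau * max(y - x, 0) + (1 - tau) * max(x - y, 0)."""
    return math.exp(-x) + (1.0 - tau) * (x - 1.0)

x_F = exp_quantile(TAU)                           # Bayes act under F
theta_F = (exp_quantile(0.25), exp_quantile(0.75))  # true quartiles of F
x_M = M(theta_F)                                  # action from the true quartiles

print("x^F:", x_F, "expected loss:", expected_pinball_loss(x_F))
print("M(theta(F)):", x_M, "expected loss:", expected_pinball_loss(x_M))
```

The closed form in `expected_pinball_loss` uses $E[(Y-x)_+] = e^{-x}$ for $Y \sim Exp(1)$, so it is specific to this choice of $F$.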

elray1 commented 7 months ago

Copying in a comment that Aaron made on slack:

My primary concern right now is that before we can deal with the notational problems we need to resolve a conceptual problem that begins with the sentence "This raises the question of whether the allocation score is still proper if the forecast distribution $F$ is not itself directly recorded." The problem, which is what the github comment is contending with, is that what we have after adding distfromq optimizations to the forecasting task is no longer, strictly speaking, the allocation score..."

What are the important things to say about this? Here are some thoughts:

- In general the score $S_R$ is not formally equivalent to the allocation score.
- For a given (set of) resource constraint(s) $K$, if it is possible to find a member of the parametric family with the same allocation as $F$ for all possible forecast distributions $F$, dfq+alloscore is equivalent to alloscore in the sense that all forecasts will achieve exactly the same score.
- For a given forecast $F$, the scores will only be equivalent for all $K$ if $F$ is in the specified parametric family.
- Again, it may be simpler to just record the allocations.

aaronger commented 7 months ago

Clarification attempt: As a numeric vector $\theta(F)$ is just the hub-plevel quantiles of $F$. But as a parameter $\theta(F)$ refers to the element of $\mathcal{W}$ (the distfromq family) with the same quantiles as $F$. $F^{\theta}$ on the other hand (maybe unwisely) refers to the element of $\mathcal{W}$ with quantile parameter vector $\theta$. Then $F=F^{\theta(F)} \implies F \in \mathcal{W}$.

Realizing now I should have used $F_{\theta}$

aaronger commented 7 months ago

So here's where I am on the latest bullet points:

- $S_R$ will not only be formally but also numerically inequivalent to $S_A$ for the forecaster with $F \notin \mathcal{W}$.
- I do not think this is true: $F \notin \mathcal{W}$ and $G \in \mathcal{W}$ could have $x^F=x^G$ without $\theta(F)$ being at all close to $G$, in which case $S_R(F,y) = s_A(x^{\theta(F)}, y) \neq s_A(x^G,y) = S_A(F,y)$.
- This I do believe.
- It would be vastly simpler, but at the same time forecasters might not really have any better way to plug their $F$ into any optimization algorithm than to use something like distfromq...

elray1 commented 7 months ago

r.e. second point:

Suppose I fix $K = 15,000$ and identify the allocation $x^F$ for my forecast distribution $F$. Then I find a $G_{\theta^*} \in \mathcal{P}_{dfq}$ such that for some probability level $\tau$, for each location $i$, $x^F_i = G_{\theta^*}^{-1}(\tau)$, so that $x^{G_{\theta^*}} = x^F$. Then for any $y$, $s_A(x^F, y) = s_A(x^{G_{\theta^*}}, y)$ since the two allocations are the same.

Turning $Y$ into a random variable and taking expectations wrt $F$, $$E_F[ s_A(x^F, Y) ] = E_F[ s_A(x^{G_{\theta^*}}, Y) ]. \qquad \text{(Eq. 1)}$$

But $x^F = \mathrm{argmin}_x E_F [ s_A(x, Y) ]$ by definition. So combining with (1), for any other $\theta$, $$E_F[s_R(\theta^*, Y)] = E_F[ s_A(x^{G_{\theta^*}}, Y) ] = E_F[s_A(x^F, Y)] \leq E_F[ s_A(x^{G_{\theta}}, Y) ] = E_F[ s_R(\theta, Y)].$$ So $\theta^*$ is a Bayes act for the reporting problem under the distribution $F$.

Therefore, $S_R(F, y) = s_R(\theta^*, y) = s_A(x^{G_{\theta^*}}, y) = s_A(x^F, y) = S_A(F, y)$.

elray1 commented 7 months ago

More directly responding to your second point, I don't think it's the case that $S_R(F, y) = s_A(x^{G_{\theta(F)}}, y)$, because it will generally be optimal to report some parameters $\theta$ that are not just the quantiles of $F$.
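A small sketch of what such a report could look like, in a deliberately simplified setting that is an assumption of this example and not our actual pipeline: one location, a fixed level `TAU` standing in for the level implied by $K$, and a normal distribution fitted through reported 0.25 and 0.75 quantiles standing in for distfromq. Shifting both reported values by the same amount moves the induced action by exactly that amount, so a shifted report reproduces $x^F$ even though it is no longer the quartiles of $F$.

```python
import math
from statistics import NormalDist

TAU = 0.9                         # hypothetical level, for illustration only
Z75 = NormalDist().inv_cdf(0.75)  # standard normal upper quartile

def x_of_theta(theta, tau=TAU):
    """Action induced by a report theta = (q25, q75): the tau-quantile of
    the normal distribution whose quartiles equal theta."""
    q25, q75 = theta
    mu = 0.5 * (q25 + q75)
    sigma = (q75 - q25) / (2.0 * Z75)
    return NormalDist(mu, sigma).inv_cdf(tau)

# F = Exp(1): target action x^F and the true quartiles theta(F).
x_F = -math.log(1.0 - TAU)
theta_F = (-math.log(0.75), -math.log(0.25))

# Shift both reported values so that the induced action equals x^F.
shift = x_F - x_of_theta(theta_F)
theta_star = (theta_F[0] + shift, theta_F[1] + shift)

print("theta(F):", theta_F, "-> action", x_of_theta(theta_F))
print("theta*:  ", theta_star, "-> action", x_of_theta(theta_star))
print("x^F:     ", x_F)
```

The shifted report is only one of many values of $\theta$ that map to $x^F$ in this toy family, so the Bayes act for the reporting problem need not be unique.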

elray1 commented 7 months ago

Noting that I'm leaving this issue open pending a review and additional comments on this material from @aaronger

aaronger / utility-eval-papers

supplement: discuss planned content for section 6 #67
