Open elray1 opened 7 months ago
Interesting, I couldn't remember which section came 4th, and was surprised to see that it was the Bayes act solution part. To me that part has felt (1) the most focused, because its purpose is just to capture some version of each step of the optimization math involved in the project in one self-contained package, and (2) essentially (and mercifully) done...
Maybe you could say more about what seems problematic?
I was reading on the wrong branch -- the section that I was hoping we could talk through was the section titled "Properties and Properness". Sorry for the confusion.
The goal would be to make a precise claim about the properness of AS, and AS via distfromq.
The alloscore is proper if distribution functions F are handed to us. Is it still proper given our algorithm situation?
Do quantiles elicited by the \texttt{distfromq} $\hookrightarrow$ \texttt{alloscore} process align with real quantiles?
It would be valuable to include (start with?) a discussion of what a proper score is.
Decision maker: takes action $x$, scored relative to $y$.
Forecaster: takes action $\theta$ (= 23 "quantiles").
Forecaster's loss: $l_f(\theta, y) = l_d( x(\theta), y)$, where $l_d$ is $s_A$ in the notation of main text section 2.2.2
From here, we could ask:
If the forecaster is told what the function $x: \Theta \rightarrow \mathcal{X}$ is, then the machinery of proper scores applies, and the forecaster's scoring rule is proper.
However, the elicited $\theta$ may not be quantiles. The Bayes act associated with forecast $F$ is $x^F$, and the forecaster will submit whatever $\theta$ is required to get $x(\theta) = x^F$.
ELR TODO
We should be able to demonstrate in a simple example that $\theta$ are not the quantiles of $F$ at the 23 specified probability levels. E.g., take $F$ to be something that's not a normal distribution and use normal tails in `distfromq::make_q_fn`. We could actually do this in a setting inspired by example 1:
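A minimal sketch of such a demonstration, in a hypothetical two-location setup (not the example 1 setting itself). Everything here is an assumption made for illustration: five reporting levels instead of 23, an exponential and a normal marginal for $F$, a budget chosen so that the constraint binds at $\tau = 0.98$, and `reconstruct_q`, which is a simplified stand-in for a distfromq-style quantile function (linear interpolation plus normal tails), not the package's actual implementation. The allocation rule assumes the Bayes act sets every location to a common quantile level $\tau$ with the allocations summing to $K$.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal, used for z-scores and normal tails

# Hypothetical reporting levels (the hub uses 23; five keep the sketch short).
LEVELS = [0.1, 0.25, 0.5, 0.75, 0.9]


def true_q_exp(p, mean=10.0):
    """Quantile function of an exponential distribution (a non-normal F_1)."""
    return -mean * math.log(1.0 - p)


def true_q_norm(p, mu=10.0, sigma=2.0):
    """Quantile function of a normal distribution (F_2)."""
    return mu + sigma * STD.inv_cdf(p)


def reconstruct_q(levels, quantiles):
    """Simplified stand-in for a distfromq-style quantile function.

    Linear interpolation between reported quantiles, with normal tails fit
    through the two outermost quantiles on each side. Illustrative only.
    """
    def tail(p_a, q_a, p_b, q_b):
        z_a, z_b = STD.inv_cdf(p_a), STD.inv_cdf(p_b)
        sigma = (q_b - q_a) / (z_b - z_a)
        mu = q_a - sigma * z_a
        return lambda p: mu + sigma * STD.inv_cdf(p)

    lower = tail(levels[0], quantiles[0], levels[1], quantiles[1])
    upper = tail(levels[-2], quantiles[-2], levels[-1], quantiles[-1])

    def q(p):
        if p < levels[0]:
            return lower(p)
        if p > levels[-1]:
            return upper(p)
        for i in range(len(levels) - 1):
            if levels[i] <= p <= levels[i + 1]:
                w = (p - levels[i]) / (levels[i + 1] - levels[i])
                return quantiles[i] + w * (quantiles[i + 1] - quantiles[i])

    return q


def allocate(q_fns, budget, lo=1e-9, hi=1 - 1e-9, iters=200):
    """Bisect for the common level tau with sum_i Q_i(tau) = budget."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(q(mid) for q in q_fns) < budget:
            lo = mid
        else:
            hi = mid
    tau = 0.5 * (lo + hi)
    return tau, [q(tau) for q in q_fns]


true_qs = [true_q_exp, true_q_norm]
budget = sum(q(0.98) for q in true_qs)  # so that tau = 0.98 under F
tau_F, x_F = allocate(true_qs, budget)

# Honest report: theta is the vector of true quantiles of F at LEVELS.
theta = [[q(p) for p in LEVELS] for q in true_qs]
recon_qs = [reconstruct_q(LEVELS, th) for th in theta]
tau_G, x_G = allocate(recon_qs, budget)

print("allocation from F itself       :", [round(v, 2) for v in x_F])
print("allocation from honest quantiles:", [round(v, 2) for v in x_G])
```

In this setup the two allocations use the same budget but split it differently, because the normal upper tail fit to the exponential marginal is lighter than the true tail. A forecaster who wanted $x(\theta) = x^F$ would therefore have to report something other than the true quantiles.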
We decided not to do anything with the below:
Can we get any sense of how much of a bad idea our post hoc analysis is?
A few thoughts, maybe repetitive, about where we are in sec:distfromq_alloscore_proper_prospective
Start with a DP $(\mathcal{X},\mathcal{Y}, l)$. Any map $M:\mathcal{W} \to \mathcal{X}$ gives a new DP $(\mathcal{W},\mathcal{Y}, l_M)$ where $l_M(\theta, y):=l(M(\theta),y)$.
We now have scoring rules $S_{\mathcal{X},l}(F,y) := l(x^F,y)$ and $S_{\mathcal{W},l_M}(F,y) := l_M(\theta^F,y) = l(M(\theta^F),y)$.
Both SRs are Bayes and therefore proper by construction. But they will generally differ, since we have placed no restrictions yet on how $M(\theta^F)$ might relate to $x^F$. For us, $M(\theta^F):= \mathrm{argmin}_{x} E_{\theta(F)}[l(x,Y)]$, which will generally only be $x^F = \mathrm{argmin}_{x} E_{F}[l(x,Y)]$ when $F=F^{\theta(F)}$, that is, when $F$ is in the `distfromq` family.
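For reference, one way to spell out the "Bayes and therefore proper" step, written for a generic Bayes scoring rule $S(F,y) := l(x^F,y)$ with $x^F \in \mathrm{argmin}_x E_F[l(x,Y)]$. This assumes losses are negatively oriented (smaller is better) and that the argmin exists. For any other forecast $G$,

$$E_F[S(G,Y)] = E_F[l(x^G,Y)] \geq \min_x E_F[l(x,Y)] = E_F[l(x^F,Y)] = E_F[S(F,Y)].$$

The same chain applies to $S_{\mathcal{W},l_M}$ with $l_M$ and $\theta$ in place of $l$ and $x$, which is why properness of each rule says nothing by itself about whether the two rules agree.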
Copying in a comment that Aaron made on slack:
My primary concern right now is that before we can deal with the notational problems we need to resolve a conceptual problem that begins with the sentence "This raises the question of whether the allocation score is still proper if the forecast distribution $F$ is not itself directly recorded." The problem, which is what the github comment is contending with, is that what we have after adding distfromq optimizations to the forecasting task is no longer, strictly speaking, the allocation score..."
What are the important things to say about this? Here are some thoughts:
Clarification attempt: As a numeric vector, $\theta(F)$ is just the hub-plevel quantiles of $F$. But as a parameter, $\theta(F)$ refers to the element of $\mathcal{W}$ (the `distfromq` family) with the same quantiles as $F$.
$F^{\theta}$, on the other hand, (maybe unwisely) refers to the element of $\mathcal{W}$ with quantile parameter vector $\theta$. Then $F=F^{\theta(F)} \implies F \in \mathcal{W}$.
Realizing now I should have used $F_{\theta}$
So here's where I am on the latest bullet points:
distfromq
...re: second point:
Suppose I fix $K = 15{,}000$ and identify the allocation $x^F$ for my forecast distribution $F$. Then I find a $G_{\theta^*} \in \mathcal{P}_{dfq}$ such that for some probability level $\tau$, for each location $i$, $x^F_i = G_{\theta^*}^{-1}(\tau)$, so that $x^{G_{\theta^*}} = x^F$. Then for any $y$, $s_A(x^F, y) = s_A(x^{G_{\theta^*}}, y)$ since the two allocations are the same.
Turning $Y$ into a random variable and taking expectations wrt $F$, $$E_F[ s_A(x^F, Y) ] = E_F[ s_A(x^{G_{\theta^*}}, Y) ]. \tag{1}$$
But $x^F = \mathrm{argmin}_x E_F [ s_A(x, Y) ]$ by definition. So combining with (1), for any other $\theta$, $$E_F[s_R(\theta^*, Y)] = E_F[ s_A(x^{G_{\theta^*}}, Y) ] = E_F[s_A(x^F, Y)] \leq E_F[ s_A(x^{G_{\theta}}, Y) ] = E_F[ s_R(\theta, Y)].$$ So $\theta^*$ is a Bayes act for the reporting problem under the distribution $F$.
Therefore, $S_R(F, y) = s_R(\theta^*, y) = s_A(x^{G_{\theta^*}}, y) = s_A(x^F, y) = S_A(F, y)$.
More directly responding to your second point, I don't think it's the case that $S_R(F, y) = s_A(x^{G_{\theta(F)}}, y)$, because it will generally be optimal to report some parameters $\theta$ that are not just the quantiles of $F$.
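A small numerical check of that last claim, as a sketch under stated assumptions rather than a result from the paper. It assumes $s_A$ is total unmet need, $\sum_i \max(0, y_i - x_i)$, a hypothetical two-location forecast $F$ (exponential and normal marginals), a budget that binds at $\tau = 0.98$, and a distfromq-like upper tail that is normal and passes through the two highest reported quantiles (levels 0.75 and 0.9 here, both hypothetical). It compares the expected loss under $F$ of $x^F$ with that of the allocation obtained from the honestly reported quantiles.

```python
import math
from statistics import NormalDist

STD = NormalDist()
TAU = 0.98            # hypothetical level at which the budget binds under F
P_A, P_B = 0.75, 0.9  # hypothetical two highest reporting levels

# Hypothetical forecast F: Exponential(mean 10) and Normal(10, 2) marginals.
MEAN, MU, SIGMA = 10.0, 10.0, 2.0


def q_exp(p):
    """Quantile function of the exponential marginal."""
    return -MEAN * math.log(1.0 - p)


def q_norm(p):
    """Quantile function of the normal marginal."""
    return MU + SIGMA * STD.inv_cdf(p)


def unmet_exp(x):
    """E[max(0, Y - x)] for Y ~ Exponential(mean MEAN), x >= 0."""
    return MEAN * math.exp(-x / MEAN)


def unmet_norm(x):
    """E[max(0, Y - x)] for Y ~ Normal(MU, SIGMA)."""
    z = (x - MU) / SIGMA
    return SIGMA * (STD.pdf(z) - z * (1.0 - STD.cdf(z)))


def expected_loss(x):
    """Expected total unmet need under F for allocation x = [x_1, x_2]."""
    return unmet_exp(x[0]) + unmet_norm(x[1])


def normal_tail(q_fn):
    """(mu, sigma) of the normal tail through the quantiles at P_A and P_B."""
    z_a, z_b = STD.inv_cdf(P_A), STD.inv_cdf(P_B)
    sigma = (q_fn(P_B) - q_fn(P_A)) / (z_b - z_a)
    return q_fn(P_A) - sigma * z_a, sigma


# Bayes act under F: a common quantile level across locations.
x_F = [q_exp(TAU), q_norm(TAU)]
K = sum(x_F)

# Allocation implied by honest quantiles: solve sum_i (mu_i + sigma_i z) = K.
tails = [normal_tail(q_exp), normal_tail(q_norm)]
z = (K - sum(m for m, _ in tails)) / sum(s for _, s in tails)
assert STD.cdf(z) > P_B  # sanity check: the solution lies in the upper tail
x_G = [m + s * z for m, s in tails]

print("E_F loss at x^F             :", expected_loss(x_F))
print("E_F loss at honest-quantile x:", expected_loss(x_G))
```

In this setup the honest-quantile allocation has a strictly larger expected loss under $F$ than $x^F$, so a forecaster scored through the reconstruction has an incentive to report a $\theta$ other than the quantiles of $F$.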
Noting that I'm leaving this issue open pending a review and additional comments on this material from @aaronger
I recognize that it is work in progress, but currently section 4 of the supplement feels unfocused and it seems like it might be starting to wander from the core mission of our paper. Can we use this issue to settle on the main messages that we want to communicate in this section of the supplement?