Corrections Not Applied to Ensemble?

cvanbommel commented 2 months ago

I can see in the code where the correction is applied in order to calculate the weights. However, when the ensemble forecast is calculated based on these weights, I believe the data is read back in from the teams' csv files, and no correction is applied.

benjaminbenteke commented 2 months ago

Do you mean the get_teams_data and data_kw_ahead_opt functions? If so:

The correction is applied in get_teams_data, since the training happens there. However, data_kw_ahead_opt is used to calculate the ensemble forecast. When blending the data, we can't apply the correction because the truth data is not available.

benjaminbenteke commented 2 months ago

Or do you mean we have to correct both each team's forecasts and the ensemble?

cvanbommel commented 2 months ago

When blending data, we should be using the historical correction factor based on the truth data that is available prior to blending, so that both the training data and the forecasts to be blended have been corrected.

So, for a forecast being made in reference week $w$ at horizon $h$, we have truth values for forecasts $\hat{y}^h_{m, e}$, where $h + 1 \le e \le w - 1$. In addition to correcting these past values, we should correct the teams' forecasts made in reference week $w$ before blending as follows:

$$ f^h_{\text{addit}, m}(w + h) = f^h_m(w + h) + \sum_{i = h + 1}^{w - 1} \pi^{w - i} L_i(m) $$

$$ f^h_{\text{multip}, m}(w + h) = \left[ \prod_{i = h + 1}^{w - 1} R_i(m)^{\pi^{w - i}} \right] f^h_m(w + h) $$

Otherwise, we are using weights and forecasts based on different models, which is why the corrected versions (as currently coded) are coming out worse than the original model.
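A minimal sketch of the two corrections above for a single team and horizon, assuming the per-week terms $L_i(m)$ and $R_i(m)$ have already been computed from the available truth data and stored by target week. The function names, the dict layout, and the example numbers are illustrative only and are not taken from the repository.

```python
import math


def additive_correction(forecast, losses, w, h, pi):
    """Return f + sum_{i=h+1}^{w-1} pi**(w - i) * L_i.

    `losses` is a hypothetical dict mapping target week i to L_i(m)
    for one team m at horizon h.
    """
    return forecast + sum(pi ** (w - i) * losses[i] for i in range(h + 1, w))


def multiplicative_correction(forecast, ratios, w, h, pi):
    """Return [prod_{i=h+1}^{w-1} R_i ** (pi**(w - i))] * f.

    `ratios` is a hypothetical dict mapping target week i to R_i(m).
    """
    factor = math.prod(ratios[i] ** (pi ** (w - i)) for i in range(h + 1, w))
    return factor * forecast


# Made-up numbers: reference week 5, horizon 1, discount factor 0.5.
example_losses = {2: 8.0, 3: 4.0, 4: 2.0}
example_ratios = {2: 1.0, 3: 1.0, 4: 4.0}
print(additive_correction(100.0, example_losses, w=5, h=1, pi=0.5))
print(multiplicative_correction(100.0, example_ratios, w=5, h=1, pi=0.5))
```

The same functions would be called on the week-$w$ forecasts right before blending, so that the weights and the blended forecasts come from the same corrected model.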

benjaminbenteke commented 2 months ago

Got it. So we have to save each team's correction factor from the training so it can be used to create the ensemble. This means we correct both the past forecasts and the h-week-ahead densities.

benjaminbenteke commented 2 months ago

If so, do you think that the ensemble will remain a probability density?

cvanbommel commented 2 months ago

For the 2023-2024 season, we're working with quantiles, right? So we should be able to correct the value corresponding to each quantile without issue. (We interpret the underlying probability distribution from those values if desired.)
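As a rough illustration of why quantiles are unproblematic, the sketch below assumes each team's correction is a single additive shift or a single positive multiplicative factor applied to every quantile value, and that the ensemble is a weighted average of the teams' quantile values with non-negative weights. Under those assumptions the ordering of the quantile values is preserved. The function names and numbers are hypothetical.

```python
def correct_quantiles(values, shift=0.0, scale=1.0):
    """Apply one shift and/or one positive factor to every quantile value."""
    if scale <= 0:
        raise ValueError("scale must be positive to preserve quantile ordering")
    return [scale * v + shift for v in values]


def blend_quantiles(team_values, weights):
    """Weighted average of the teams' values, quantile level by quantile level."""
    n_levels = len(team_values[0])
    return [
        sum(wt * vals[k] for wt, vals in zip(weights, team_values))
        for k in range(n_levels)
    ]


def is_non_decreasing(values):
    """True if the values are ordered like a valid quantile function."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Made-up quantile values for two teams at three quantile levels.
team_a = correct_quantiles([10.0, 20.0, 40.0], shift=3.0)  # additive case
team_b = correct_quantiles([12.0, 18.0, 30.0], scale=1.5)  # multiplicative case
ensemble = blend_quantiles([team_a, team_b], weights=[0.5, 0.5])
print(ensemble, is_non_decreasing(ensemble))
```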

benjaminbenteke commented 2 months ago

I think the expanding window $e$ is supposed to be $1 \leq e \leq w-1$, i.e., it doesn't depend on $h$.

If $e = 10$, then picking $w = 5$ and $h = 1$ gives $i \in \{2, 3, 4\}$, so we miss the week 1 data.

cvanbommel commented 2 months ago

The 2023-2024 season started with reference date 2023-10-14; let's assume for convenience that this is week 1.

Then the first week for which there is a forecast at horizon h = 1 is 2023-10-21, that is, week 2.

For a forecast submitted in week 5 at horizon 1 (reference date 2023-11-11, target date 2023-11-18), the following forecasts and their truth data are available:

- Ref date: 2023-10-14, target date: 2023-10-21 (a forecast of week 2)
- Ref date: 2023-10-21, target date: 2023-10-28 (a forecast of week 3)
- Ref date: 2023-10-28, target date: 2023-11-04 (a forecast of week 4)

There is no earlier forecast for the target date 2023-10-14 (week 1), so we do not use the truth data for week 1 when h = 1. On the other hand, we do not use the forecast made in reference week 2023-11-04 (week 4) because its target date of 2023-11-11 (week 5) has not yet passed.

We can only use 3 of the previous 4 weeks for a forecast at horizon 1, so the size of the expanding window must depend on h. My convention is to index by the target date so that it is easy to compare with the date of the truth data.
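The window arithmetic can be checked with a short sketch. It assumes, as in this comment, that reference date 2023-10-14 is week 1 and that the window is indexed by target week $e$ with $h + 1 \le e \le w - 1$. The helper names are hypothetical.

```python
from datetime import date, timedelta

SEASON_START = date(2023, 10, 14)  # reference date treated as week 1


def week_to_date(week):
    """Map a 1-based week number to its Saturday date."""
    return SEASON_START + timedelta(weeks=week - 1)


def available_pairs(w, h):
    """(reference date, target date) pairs whose truth is known in week w.

    Target weeks e run over h + 1 <= e <= w - 1; each forecast was made
    in reference week e - h.
    """
    return [(week_to_date(e - h), week_to_date(e)) for e in range(h + 1, w)]


# Reference week 5 (2023-11-11) at horizon 1: three usable forecasts.
for ref, target in available_pairs(w=5, h=1):
    print(f"ref {ref}  target {target}")
```

With `w=5` the same helper yields only two pairs at `h=2`, which shows how the window shrinks as the horizon grows.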

benjaminbenteke commented 2 months ago

Our EWO started from the previous season.

cvanbommel commented 2 months ago

The previous season ran from reference date 2022-01-10 to 2023-05-15. There is no forecast with reference date 2023-10-07 predicting the number of cases for the week ending 2023-10-14 (h = 1).

benjaminbenteke commented 2 months ago

I see. Do you think this will affect the optimization of EWO without correction?

cvanbommel commented 2 months ago

The code appears to be treating the last two years as one large season. If that is the intention, I believe the code is working as intended, but I think the way it is described in the paper is causing confusion, and we need to reach a consensus on that description.

benjaminbenteke / EWO_Methods

Corrections Not Applied to Ensemble? #7