european-central-bank / BEAR-toolbox

The Bayesian Estimation, Analysis and Regression toolbox (BEAR) is a comprehensive (Bayesian Panel) VAR toolbox for forecasting and policy analysis.
98 stars 38 forks source link

Calcualtion of continuous rank probability score (CRPS) #26

Open borisblagov opened 2 years ago

borisblagov commented 2 years ago

Hi, I don't see an option for discussions here so therefore I am opening an issue.

I am puzzled at the calculation of the continuous rank probability score (CRPS) across the complete BEAR toolbox. It doesn't seem to use actual data. Here is an example from the panelfeval.m

% now compute the continuous ranked probability score

% create the cell storing the results
CRPS=cell(n,1);
% loop over endogenous variables
for ii=1:n
   % loop over forecast periods on which actual data is known
   for jj=1:Fcperiods    
   % compute the continuous ranked probability score
   score=crps(forecast_record{ii,1}(:,jj),forecast_estimates{ii,1}(2,jj));
   CRPS{ii,1}(1,jj)=score;
   end
end

The crps function itself is

function [score]=crps(ygibbs,yo)

% compute first the dimension of the gibbs sampler draws (normally, 'It' iterations minus 'Bu' discarded burn iterations)
It_Bu=size(ygibbs,1);

% compute the first summation term
sum1=sum(abs(ygibbs-repmat(yo,It_Bu,1)));

% compute the second summation term
 temp=abs(repmat(ygibbs,1,It_Bu)-repmat(ygibbs',It_Bu,1));
 sum2=sum(temp(:));

% eventually compute the score
score=(1/It_Bu)*sum1-(1/(2*It_Bu^2))*sum2;

My understanding is that to evaluate the CRPS we use the real data (actual realization) and calculate the probability that that actual realization came from our forecast distribution (ygibbs). However, in the panelfeval.m, as well as all the other *feval functions that I looked in, the crps function is called with

crps(forecast_record{ii,1}(:,jj),forecast_estimates{ii,1}(2,jj));

where forecast_estimates{ii,1}(2,jj) is simply the median of the forecast (hence the (2,:) call).

Shouldn't we be calling crps function with the real data as argument? This is also what the manual suggests if I understand it correctly (this is from 2016 wp)

image

I believe the $y^{\circ}_{T+h}$ is the actual data.

That would make the function

for ii=1:n
   % loop over forecast periods on which actual data is known
   for jj=1:Fcperiods    
   % compute the continuous ranked probability score
   % score=crps(forecast_record{ii,1}(:,jj),forecast_estimates{ii,1}(2,jj));
   score=crps(forecast_record{ii,1}(:,jj),data_endo_c(jj,ii));
   CRPS{ii,1}(1,jj)=score;
   end
end

This goes for all feval functions. Thanks!