ankargren / mfbvar

R package for Mixed-Frequency Bayesian VARs
https://ankargren.github.io/mfbvar
38 stars 20 forks source link

disconnected forecast #6

Closed sjones29 closed 5 years ago

sjones29 commented 5 years ago

Hi, thanks for a great package!

I'm getting some puzzling behavior when looking at the output from plot.mfbvar_ss. The forecast doesn't pick up where the in-sample data left off (what I am calling a disconnected forecast), but for just one of the variables. For that variable, the output from the predict.mfbvar method confirms that the point at which the forecast begins is not equal to the observed data at that point.

I've attached a portion of the plot output below -- the disconnected forecast is on the far right. The other two charts seem fine; for those variables, the point where the forecast begins is the observed in-sample data.

ss_plot

The variable with the disconnected forecast uses quarterly data, while the other variables use monthly data. I initially thought it might be a ragged-edges issue. However, the in-sample data ends with no missing data, where the quarter and the month align to both have observed data, so I'm still surprised to see the predict.mfbvar method returning a starting forecast point that is not equal to the observed quarterly data point.

I have tried this with different ending rows for the in-sample data, and I get the same result -- only the quarterly variable's forecast seems to disconnect from the observed value. I haven't been able to find anything wrong with the parameter inputs for estimate_mfbvar, any of my inputs to specify the steady-state prior, or any of my indexing.

Thank you for the help!

ankargren commented 5 years ago

Thanks for the report, that does indeed look odd. I haven’t seen this myself, so would you mind sharing your code and some data (just something that replicates it, can be simulated)? I’ll look into it ASAP.

sjones29 commented 5 years ago

Here's an example that approximates my code and reproduces the error on my end. Thanks again for your help--much appreciated!

# install.packages("mfbvar")
library("mfbvar")
#> Warning: package 'mfbvar' was built under R version 3.5.3

## generating data

set.seed(143)    

monthly1 <- sample(500)
monthly2 <- sample(500)
quarterly <- sample(500)

quarterly[setdiff(1:500,seq(1, 500, 3))] <- NA

total_sample <- data.frame(monthly1, monthly2, quarterly)

in_sample <- total_sample[1:400, ]

## setting initial prior object

in_sample_prior_obj <- set_prior(Y = in_sample, freq = c(rep("m", 2), "q"),
                                 n_lags = 10, n_burnin = 1000, n_reps = 1000)
#> Warning: prior_Pi_AR1: 0 used as prior mean for AR(1) coefficients.
#> Warning: lambda1: 0.2 used as the value for the overall tightness hyperparameter.
#> Warning: lambda2: 1 used as the value for the lag decay hyperparameter.
#> Warning: lambda3: 10000 used for the constant's prior variance.
#> Warning: n_fcst: 0 used for the number of forecasts to compute.

## for each variable, defining steady-state bounds belief as middle 95% of last 40 non-NA observations

variables_without_NAs <- lapply(seq_len(ncol(in_sample)), function(x){na.omit(as.matrix(in_sample[, x]))})
last_forty_nonNA_obs <- lapply(variables_without_NAs, function(x){x[((nrow(x)-40):nrow(x)), ]})
ss_bounds <- lapply(last_forty_nonNA_obs, function(x){as.list(quantile(x, probs = c(0.025, 0.975)))})
ss_bounds_reformat <- lapply(ss_bounds, function(x){as.matrix(do.call(cbind, x))})
prior_intervals <- matrix(do.call(rbind, ss_bounds_reformat), ncol = 2, byrow = FALSE)

## updating prior and generating mfbvar forecast

psi_moments <- interval_to_moments(prior_intervals)
in_sample_prior_obj_mean <- psi_moments$prior_psi_mean
in_sample_prior_obj_omega <- psi_moments$prior_psi_Omega

in_sample_prior_obj <- update_prior(prior_obj = in_sample_prior_obj, d = "intercept", 
                                    prior_psi_mean = in_sample_prior_obj_mean, 
                                    prior_psi_Omega = in_sample_prior_obj_omega)

model_ss <- estimate_mfbvar(mfbvar_prior = in_sample_prior_obj, prior_type = "ss", n_fcst = 2)

# plot() and predict() output 393.2299 for the observed quarterly observation,
# but it should be 354, as shown when calling tail():
tail(in_sample, n = 10)
#>     monthly1 monthly2 quarterly
#> 391      131      396       489
#> 392      476      280        NA
#> 393      491      468        NA
#> 394       87      334       135
#> 395      284       97        NA
#> 396       86      458        NA
#> 397      466      143       328
#> 398      275       18        NA
#> 399       83       60        NA
#> 400      149      442       354

predict(model_ss, pred_quantiles = 0.5)
#> $quantile_50
#>        monthly1 monthly2 quarterly
#> 400    149.0000  442.000  393.2299
#> fcst_1 259.4457  276.852  213.7933
#> fcst_2 248.9418  250.177  199.3944

plot(model_ss)


# > sessionInfo()
#
# R version 3.5.0 (2018-04-23)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 7 x64 (build 7601) Service Pack 1
# 
# Matrix products: default
# 
# locale:
#   [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
# [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
# [5] LC_TIME=English_United States.1252    
# 
# attached base packages:
#   [1] stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
#   [1] mfbvar_0.4.0
# 
# loaded via a namespace (and not attached):
#   [1] Rcpp_0.12.17     bindr_0.1.1      magrittr_1.5     tidyselect_0.2.4 munsell_0.5.0   
# [6] colorspace_1.3-2 R6_2.2.2         rlang_0.2.1      pbapply_1.3-4    plyr_1.8.4      
# [11] dplyr_0.7.5      tools_3.5.0      parallel_3.5.0   grid_3.5.0       gtable_0.2.0    
# [16] yaml_2.1.19      lazyeval_0.2.1   assertthat_0.2.0 digest_0.6.15    tibble_1.4.2    
# [21] bindrcpp_0.2.2   purrr_0.2.5      ggplot2_3.1.0    glue_1.2.0       labeling_0.3    
# [26] compiler_3.5.0   pillar_1.2.3     scales_1.0.0     pkgconfig_2.0.1 

Created on 2019-04-16 by the reprex package (v0.2.1)

ankargren commented 5 years ago

Great reprex, that was very helpful. The problem is that when you use predict(model_ss) you get the forecasts of the underlying monthly variable. The way the model works is that we assume that there is a latent monthly variable that is governing the quarterly observations we get. The forecasts you obtain from the model are for this underlying, monthly variable. For a fixed quarterly value, the underlying monthly values vary from iteration to iteration.

If you do median(model_ss$Z_fcst[10, 3, ]) you get 393 (just as from predict), but if you do colMeans(model_ss$Z_fcst[8:10, 3, ]) you see that the average of the monthly values within the final quarter is fixed to 354. So the forecast you get in predict is correct, it's just that the plotting method should anchor the forecasts at the final quarterly observation. The plot that you get therefore is a little bit special, because it shows quarterly observations, but monthly forecasts. I am working on a new version of the package where this is handled a little bit better (so that also the forecasts are presented on the quarterly frequency).

For now, I have updated the package, so if you do

devtools::install_github("ankargren/mfbvar")

you should see that the plot method returns a plot with no discontinuity. Let me know if you try it and have no further issues with it, and I'll update the version on CRAN too.

sjones29 commented 5 years ago

I updated on my end and tried again with both the simulated example and my actual unshared code, and it plotted correctly for both -- thanks!

Many thanks as well for the detailed answer. I was not aware of the assumption that there is a latent monthly process underlying the quarterly observations. To clarify, should I be taking 3-month averages of monthly forecasts to be the forecasted quarterly data at the end of the 3-month period? Or should I be taking an individual monthly forecast to be the forecasted quarterly data for the quarter ending on that month?

ankargren commented 5 years ago

Good to hear!

Yes, you should take averages over the months within each quarter. The reason why the current version of the package does not do that is that I didn't want to assume too much about the structure of the input (e.g. dates etc), but I've come to a change in opinion with regards to that. The new version will be a little stricter, but on the flipside that means it will compute the quarterly forecasts.

To get things right, if you want the quarterly forecast in the simulated example you provided you would need to do: quantile(colMeans(model_ss$Z_fcst[11:13, 3, ]), probs = 0.5) to get the median forecast of the quarterly variable. (That would also require you to use n_fcst = 3 or higher.) Your input here is very valuable here actually, it makes it clear for me what I need to clarify in the package. I'll make sure this is better documented in the new version. (If you have further questions, feel free to send me an e-mail.)