[Closed] sjones29 closed this issue 5 years ago
Thanks for the report, that does indeed look odd. I haven’t seen this myself, so would you mind sharing your code and some data (just something that replicates it; simulated data is fine)? I’ll look into it ASAP.
Here's an example that approximates my code and reproduces the error on my end. Thanks again for your help--much appreciated!
# install.packages("mfbvar")
library("mfbvar")
#> Warning: package 'mfbvar' was built under R version 3.5.3
## generating data
set.seed(143)
monthly1 <- sample(500)
monthly2 <- sample(500)
quarterly <- sample(500)
quarterly[setdiff(1:500,seq(1, 500, 3))] <- NA
total_sample <- data.frame(monthly1, monthly2, quarterly)
in_sample <- total_sample[1:400, ]
## setting initial prior object
in_sample_prior_obj <- set_prior(Y = in_sample, freq = c(rep("m", 2), "q"),
                                 n_lags = 10, n_burnin = 1000, n_reps = 1000)
#> Warning: prior_Pi_AR1: 0 used as prior mean for AR(1) coefficients.
#> Warning: lambda1: 0.2 used as the value for the overall tightness hyperparameter.
#> Warning: lambda2: 1 used as the value for the lag decay hyperparameter.
#> Warning: lambda3: 10000 used for the constant's prior variance.
#> Warning: n_fcst: 0 used for the number of forecasts to compute.
## for each variable, defining steady-state bounds belief as middle 95% of last 40 non-NA observations
variables_without_NAs <- lapply(seq_len(ncol(in_sample)), function(x){na.omit(as.matrix(in_sample[, x]))})
last_forty_nonNA_obs <- lapply(variables_without_NAs, function(x){x[(nrow(x) - 39):nrow(x), ]})
ss_bounds <- lapply(last_forty_nonNA_obs, function(x){as.list(quantile(x, probs = c(0.025, 0.975)))})
ss_bounds_reformat <- lapply(ss_bounds, function(x){as.matrix(do.call(cbind, x))})
prior_intervals <- matrix(do.call(rbind, ss_bounds_reformat), ncol = 2, byrow = FALSE)
## updating prior and generating mfbvar forecast
psi_moments <- interval_to_moments(prior_intervals)
in_sample_prior_obj_mean <- psi_moments$prior_psi_mean
in_sample_prior_obj_omega <- psi_moments$prior_psi_Omega
in_sample_prior_obj <- update_prior(prior_obj = in_sample_prior_obj, d = "intercept",
                                    prior_psi_mean = in_sample_prior_obj_mean,
                                    prior_psi_Omega = in_sample_prior_obj_omega)
model_ss <- estimate_mfbvar(mfbvar_prior = in_sample_prior_obj, prior_type = "ss", n_fcst = 2)
# plot() and predict() output 393.2299 for the observed quarterly observation,
# but it should be 354, as shown when calling tail():
tail(in_sample, n = 10)
#> monthly1 monthly2 quarterly
#> 391 131 396 489
#> 392 476 280 NA
#> 393 491 468 NA
#> 394 87 334 135
#> 395 284 97 NA
#> 396 86 458 NA
#> 397 466 143 328
#> 398 275 18 NA
#> 399 83 60 NA
#> 400 149 442 354
predict(model_ss, pred_quantiles = 0.5)
#> $quantile_50
#> monthly1 monthly2 quarterly
#> 400 149.0000 442.000 393.2299
#> fcst_1 259.4457 276.852 213.7933
#> fcst_2 248.9418 250.177 199.3944
plot(model_ss)
# > sessionInfo()
#
# R version 3.5.0 (2018-04-23)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 7 x64 (build 7601) Service Pack 1
#
# Matrix products: default
#
# locale:
# [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
# [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
# [5] LC_TIME=English_United States.1252
#
# attached base packages:
# [1] stats graphics grDevices utils datasets methods base
#
# other attached packages:
# [1] mfbvar_0.4.0
#
# loaded via a namespace (and not attached):
# [1] Rcpp_0.12.17 bindr_0.1.1 magrittr_1.5 tidyselect_0.2.4 munsell_0.5.0
# [6] colorspace_1.3-2 R6_2.2.2 rlang_0.2.1 pbapply_1.3-4 plyr_1.8.4
# [11] dplyr_0.7.5 tools_3.5.0 parallel_3.5.0 grid_3.5.0 gtable_0.2.0
# [16] yaml_2.1.19 lazyeval_0.2.1 assertthat_0.2.0 digest_0.6.15 tibble_1.4.2
# [21] bindrcpp_0.2.2 purrr_0.2.5 ggplot2_3.1.0 glue_1.2.0 labeling_0.3
# [26] compiler_3.5.0 pillar_1.2.3 scales_1.0.0 pkgconfig_2.0.1
Created on 2019-04-16 by the reprex package (v0.2.1)
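As an aside, the `lapply` chain that builds `prior_intervals` in the reprex above can be written more compactly. A base-R sketch, with `in_sample` re-simulated here so the snippet is self-contained (the numbers are illustrative only):

```r
## Compact alternative for building the 2-column bounds matrix from the
## last 40 non-NA observations of each variable (base R only).
set.seed(143)
monthly1 <- sample(500)
monthly2 <- sample(500)
quarterly <- sample(500)
quarterly[setdiff(1:500, seq(1, 500, 3))] <- NA
in_sample <- data.frame(monthly1, monthly2, quarterly)[1:400, ]

## For each column: drop NAs, keep the last 40 values, take the middle 95%.
prior_intervals <- t(sapply(in_sample, function(x) {
  quantile(tail(na.omit(x), 40), probs = c(0.025, 0.975))
}))
dim(prior_intervals)  # 3 2: one row of lower/upper bounds per variable
```

`sapply` returns a 2 x 3 matrix (one length-2 quantile vector per column), so the transpose gives the variables-by-bounds layout that `interval_to_moments` expects.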
Great reprex, that was very helpful. The problem is that `predict(model_ss)` gives you forecasts of the underlying monthly variable. The way the model works is that we assume there is a latent monthly variable governing the quarterly observations we get, and the forecasts you obtain from the model are for this underlying monthly variable. For a fixed quarterly value, the underlying monthly values vary from iteration to iteration.
If you run `median(model_ss$Z_fcst[10, 3, ])` you get 393 (just as from `predict`), but if you run `colMeans(model_ss$Z_fcst[8:10, 3, ])` you see that the average of the monthly values within the final quarter is fixed at 354. So the forecast you get from `predict` is correct; it's just that the plotting method should anchor the forecasts at the final quarterly observation. The plot you get is therefore a little special, because it shows quarterly observations but monthly forecasts. I am working on a new version of the package where this is handled a little better, so that the forecasts are also presented at the quarterly frequency.
For now, I have updated the package; if you run `devtools::install_github("ankargren/mfbvar")`, you should see that the `plot` method returns a plot with no discontinuity. Let me know if you try it and have no further issues with it, and I'll update the version on CRAN too.
I updated on my end and tried again with both the simulated example and my actual (unshared) code, and it plotted correctly for both -- thanks!
Many thanks as well for the detailed answer. I was not aware of the assumption that there is a latent monthly process underlying the quarterly observations. To clarify, should I be taking 3-month averages of monthly forecasts to be the forecasted quarterly data at the end of the 3-month period? Or should I be taking an individual monthly forecast to be the forecasted quarterly data for the quarter ending on that month?
Good to hear!
Yes, you should take averages over the months within each quarter. The reason the current version of the package does not do that is that I didn't want to assume too much about the structure of the input (e.g. dates), but I've since changed my mind on that. The new version will be a little stricter, but on the flip side that means it will compute the quarterly forecasts for you.
To be precise: if you want the quarterly forecast in the simulated example you provided, you would run `quantile(colMeans(model_ss$Z_fcst[11:13, 3, ]), probs = 0.5)` to get the median forecast of the quarterly variable. (That would also require you to use `n_fcst = 3` or higher.) Your input here is actually very valuable; it makes clear what I need to clarify in the package, and I'll make sure this is better documented in the new version. (If you have further questions, feel free to send me an e-mail.)
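The indexing in that expression can be demonstrated on a stand-in array with the same horizon x variable x draw layout as `model_ss$Z_fcst`; everything here is simulated, so only the mechanics carry over, not the values:

```r
## Stand-in for model_ss$Z_fcst: 13 forecast rows, 3 variables, 500 draws.
## The dimensions and the 11:13 "forecast quarter" rows are assumptions
## for illustration, matching the indexing used in the reply above.
set.seed(2)
n_horizon <- 13; n_vars <- 3; n_draws <- 500
Z_fcst <- array(rnorm(n_horizon * n_vars * n_draws, mean = 300, sd = 40),
                dim = c(n_horizon, n_vars, n_draws))

## Average the three months of the forecast quarter (rows 11:13) for
## variable 3 within each draw, then summarize across draws:
quarterly_draws <- colMeans(Z_fcst[11:13, 3, ])  # one value per draw
quantile(quarterly_draws, probs = 0.5)           # median quarterly forecast
```

`Z_fcst[11:13, 3, ]` is a 3 x 500 matrix (months by draws), so `colMeans` gives the quarterly average for each posterior draw, and the quantile summarizes that draw distribution.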
Hi, thanks for a great package!
I'm getting some puzzling behavior in the output from `plot.mfbvar_ss`. The forecast doesn't pick up where the in-sample data left off (what I am calling a disconnected forecast), but only for one of the variables. For that variable, the output from the `predict.mfbvar` method confirms that the point at which the forecast begins is not equal to the observed data at that point.
I've attached a portion of the plot output below -- the disconnected forecast is on the far right. The other two charts seem fine; for those variables, the forecast begins at the observed in-sample data.
The variable with the disconnected forecast uses quarterly data, while the other variables use monthly data. I initially thought it might be a ragged-edge issue. However, the in-sample data ends with no missing values (the quarter and the month align so that both have observed data), so I'm still surprised to see `predict.mfbvar` returning a starting forecast point that differs from the observed quarterly data point.
I have tried this with different ending rows for the in-sample data and get the same result: only the quarterly variable's forecast disconnects from the observed value. I haven't been able to find anything wrong with the parameter inputs to `estimate_mfbvar`, my inputs specifying the steady-state prior, or my indexing.
Thank you for the help!