Interpretation of predictive images

1369959395 commented 2 years ago

I have two little questions: 1.There are two color lines in the prediction image, which one represents the true color and which one represents the prediction. 2.When you make a prediction, you get results, but you get false alarms: In value[3L] : The sigma matrix is computationally singular. Using the pseudo-inverse instead. Does that matter? Thank you!

dkesada commented 2 years ago

Hi!

In the plots, the black line represents the real values and the red line represents the predictions. Now that you mention it, I think I don't specify it in the documentation of the inference and forecasting functions. I should add that clarification.

The warning about the pseudo-inverse is thrown because in the inference process of the multivariate Gaussian distributions underneath there is a matrix inversion operation. This operation cannot be performed on all matrices, and so when it is not possible to perform it we use the pseudo-inverse instead. The results obtained are correct, and in fact it is very common to not be able to obtain the exact inverse matrix because the lagged variables in the dataset are usually very closely correlated. You do not have to worry about this warning, and can ignore it with the suppressWarnings function if it's a bother and it clutters your console output. The only real situations when it can become a problem is when your matrix is perfectly singular and not even the pseudo-inverse can be calculated. This happens usually for one of the following reasons:

There are duplicated columns in your dataset
There are two columns in your data whose linear correlation is very close to 1 or -1
There are constant columns in the dataset
There is a column in your data that is a simple integer series like [0, 1, 2, 3, 4, ..., n] In this cases, removing the duplicated or constant columns or adding some white noise can fix the issues. For the most part though, you do not need to worry about the pseudo-inverse matrix warning.

I hope I could be of help. Cheers!

1369959395 commented 2 years ago

Thank you for your answer. On this basis, I still have some questions. 1.When we predict, do we need to provide evidence variables to predict the future value of the target variable, or do we say that both evidence variables and target variables can be predicted. 2.For target variable u_q is it necessary to assume that "u_q_t_0", "u_q_t_1" and "u_q_t_2" are all variables to be predicted when making prediction? Which of the following codes is the correct prediction: ①res_fore <- suppressWarnings( dbnR :: forecast_ts( f_dt_val , fit , obj_vars = c("u_q_t_0"), ini = 100 , len = 70 )) ②res_fore <- suppressWarnings( dbnR :: forecast_ts( f_dt_val , fit , obj_vars = c("u_q_t_0","u_q_t_1","u_q_t_2"), ini = 100 , len = 70 )) 3.Can't we have integer form in our data? 4.Is there any difference between the reasoning and prediction given in your example? Does the reasoning need to give a posteriori probability and then analyze it? However, it seems that the reasoning in your example is also making prediction, so I have some doubts. Thank you again!

dkesada commented 2 years ago

Hi, I'll try to answer your questions in order:

Normally, when we predict with a DBN the most common scenario is that we know beforehand the values of the variables in the past (t_1, t_2, ..., t_n) and we use them as evidence to predict the present (t_0). You can choose to provide some of the variables in t_0 as evidence too with the ev_vars argument of the forecast_ts function, but this should be used when you want to evaluate the system in some specific scenario or when you want to use the network as a simulator. You can also choose to not provide evidence for all the variables in the past, that case is explained in the partial evidence forecasting example in the markdown folder. Theoretically, you could also choose to predict all evidence and target variables, but if you provide no evidence whatsoever to the network it will use the prior distributions of all the variables in the system, which is not very useful.
You only need to use the variable in t_0 in the obj_vars argument. The t_1 and t_2 variables will be used to move the predictions with the moving window and forecast based on the predicted values. The correct code is: res_fore <- suppressWarnings(dbnR::forecast_ts(f_dt_val, fit, obj_vars = c("u_q_t_0"), ini = 100, len = 70))
You can have integer values in your data, but I think you need to convert them to numeric with as.numeric() because they could throw errors at some point. Keep in mind though that you are modelling a potentially discrete distribution that generated your integer values with a Gaussian distribution, and so you cannot expect it to return integer values when predicting with the model. It could give you decent results, because Gaussian distributions work well approximating other distributions, but it is worth noting that there are some strong assumptions there.
I'm sorry, but I don't know which example specifically are you reffering to. There are fundamentally 4 functions for inference purposes: mvn_inference, predict_dt, forecast_ts and smooth_ts. All of them have different purposes, but they use either the exact multivariate Gaussian inference or the approximate particle filter one to predict the value of the objective variables. This predicted value is the most likely value given the evidence, that is, the a posteriori mean of the Gaussian distributions of the objective variables. The only function that also returns the covariance matrix is mvn_inference, because it is the most general one and because the covariance matrix is independent of the mean. This means that no matter how far into the future your forecasting goes, the covariance matrix is always constant. If you wanted to perform further reasoning, you could use the forecast_ts function to obtain forecastings in combination with the mvn_inference to get the covariance matrix, so that you can have the mean value of your objective variable in each instant and the expected variance, to not base your predictions only on the expected mean.

I hope I could clear some of your doubts. Cheers!

1369959395 commented 2 years ago

Thank you for your answer. I think it is very useful. I will continue to think.

1369959395 commented 2 years ago

Hello, I would like to ask what is the difference between predict_dt and forecast_ts, and why the prediction accuracy of the two is so poor? How to set parameters if you want to predict unknown future values? For example, to predict a group of economic variables, I now have seven variables a, b, c, d, e, f, g and Y, with a total of 252 lines of data. After dividing the training set and the test set, I carry out structure learning and parameter learning to obtain the network structure. Here, my time slice size is set to 3. If my test set has 87 lines of data, is the prediction accuracy tested by the test set carried out by forecast_ts function? If the last line of all my data is the data value of January, how can I predict the data value of Y in February?

1369959395 commented 2 years ago

When trying out the example, I encountered a problem. The system prompted me that I did not have the function "as_named_vector":

ev_i <- f_dt_val[1, .SD, .SDcols = ev_vars] ev_i[1, pm_t_1 := 0.8] sprintf("We intervene the 'pm_t_1' variable, so that its new value is %.1f", ev_i[1, pm_t_1]) pred_i <- mvn_inference(attr(fit, "mu"), attr(fit, "sigma"), as_named_vector(ev_i)) pred_i$mu_p cat("\n") sprintf("The previous value for 'pm_t_0' is %f, and after the intervention it changes to %f", pred$mu_p[30], pred_i$mu_p[30])

Why?

1369959395 commented 2 years ago

What principle does fold_dt adopt? Is the corresponding relationship before and after folding and the predicted corresponding position given in the figure below correct?(size = 3) Finally, how to predict the value of ?'s location. I'm very sorry. I may have many questions. Thank you for your answer.

dkesada commented 2 years ago

Hi! It's ok, I'll try to answer your questions in order as best as I can.

Hello, I would like to ask what is the difference between predict_dt and forecast_ts, and why the prediction accuracy of the two is so poor?

The predict_dt function takes a dataset and calculates the objective variables for each row. That is, if you have a folded dataset of 50 instances with the variables [a_t_1, b_t_1, a_t_0, b_t_0] and use the predict_dt function setting 'a_t_0' as the objective variable, the DBN model will be used to predict the value of a_t_0 in each row given the values of the other variables. Only a single prediction is performed for each row of your dataset, and the error returned is the comparison of the real value of a_t_0 in each row with each prediction. The forecast_ts function is used to forecast up to a horizon based on some initial data. After you give it an initial point, the DBN will be used to forecast using a moving window for as many instants of time as you want. Afterwards, the forecasted time series will be compared with the real values in your data to evaluate the error. As for why the prediction accuracy is low, I'm afraid I do not know. It could be due to your dataset, to the preprocessing you are using, to the DBN you are learning or to many other reasons. I do not know what process are you trying to model, but I cannot assure you that DBNs will fit well any kind of data.

How to set parameters if you want to predict unknown future values? For example, to predict a group of economic variables, I now have seven variables a, b, c, d, e, f, g and Y, with a total of 252 lines of data. After dividing the training set and the test set, I carry out structure learning and parameter learning to obtain the network structure. Here, my time slice size is set to 3. If my test set has 87 lines of data, is the prediction accuracy tested by the test set carried out by forecast_ts function? If the last line of all my data is the data value of January, how can I predict the data value of Y in February?

Assuming I understand the problem correctly, you would want to use 'Y_t_0' as the objective of your forecast. If you used the fold_dt function correctly to shift your data to size 3, the last line of your data should contain all a, b, c, d, e, f, g and Y variables in all 't_2', 't_1' and 't_0' time-slices. In this case, you can use the forecast_ts function, setting the initial point of the prediction to the line that corresponds to the values in February and set the length to 1 to only predict that single instant. This will use the values of the variables in 't_1' (in January) and in 't_2' (in December, I assume) to predict the values of the variables in 't_0' (in February). You can also just use the mvn_inference function and provide it the values of the variables in January and December yourself so that it predicts the values for February.

When trying out the example, I encountered a problem. The system prompted me that I did not have the function "as_named_vector":

There was a mistake in that example. I recently changed the mvn_inference function so that as_named_vector does not need to be used from outside the package, and I forgot to change it in the markdown. it should just be: pred_i <- mvn_inference(attr(fit, "mu"), attr(fit, "sigma"), ev_i) I fixed it in the devel branch, it should be working as intened now. Thank you for letting me know!

What principle does fold_dt adopt?

I'll use a figure to illustrate its behaviour. It is simpler than it seems at first: imagen That figure represents exactly what the fold_dt function does. It simply shifts the columns in your dataset and generates new lagged columns that correspond to the variables in previous instants. The grey rows contain NAs and are removed inside the function.

Is the corresponding relationship before and after folding and the predicted corresponding position given in the figure below correct?(size = 3)

I think your drawing is indeed correct with the meanings of fold_dt, predict_dt and forecast_ts. Just a couple of comments:

You only show the 'a' variable, but keep in mind that every other variable is also present in the folded dataset.
If you set the size to 3, all t_0, t_1 and t_2 time-slices are present, so you would also have an additional a_t_1 column that you have not drawn in the figure.

Finally, how to predict the value of ?'s location.

For that, you have three options:

If you folded the dataset and that 7th row exists, you just have to use forecast_ts setting ini = 7 and len = 1. Or if you use the predict_dt function then the value predicted for the 7th row is the one you need.
If you do not have the 7th row, you can use either the mvn_inference function or the forecast_ts function with the values of the previous row. You need to set in t_1 and t_2 the values that in the 6th row were t_0 and t_1. To simplify this, I've added a new auxiliary function to the devel branch called shift_values that moves this values automatically for a single row. The code that does what you need would look something like:
```
size = 3
dt_train <- motor[1:2800]
dt_val <- motor[2801:2808]
net <- dbnR::learn_dbn_struc(dt_train, size)
f_dt_train <- fold_dt(dt_train, size)
f_dt_val <- fold_dt(dt_val, size)
fit <- dbnR::fit_dbn_params(net, f_dt_train, method = "mle")
dbnR::forecast_ts(dbnR::shift_values(f_dt_val, row = 6), fit, obj_vars = "pm_t_0", len = 1, print_res = F, plot_res = F)
```
In that small example I use the sample motor dataset to train a size 3 DBN, take a 6 instances test dataset and predict the value of the variable pm in what would be the 7th row, similarly to the figure you drew. Keep in mind that by shifting you add NAs to the values in t_0, so you cannot get MAE results.

To download the devel branch of dbnR, you can use devtools with the command: devtools::install_github("dkesada/dbnR", ref = "devel")

Whew, that was a long post. I hope I could help you with most of your doubts. Cheers!

1369959395 commented 2 years ago

Thank you for your patient answer. I think I understand. If there are any questions, I'll ask again. Thanks again!

1369959395 commented 2 years ago

I have gone to verify it. Your explanation is very helpful to me! I have one last question left: when we have data in rows 7 and 8, use forecast_ts to predict, ① Set ini = 7, len = 2 ② Set ini = 7, len = 1; Ini = 8, len = 1 and execute twice The results predicted by the two methods are different, which one is reasonable?

dkesada commented 2 years ago

① Set ini = 7, len = 2

With this option, you obtain the forecasting of the values of row 7 and 8, and the model only saw the values of the variables in what would be row 6. This is a true forecasting: the model only has information about past data and it predicts the future values based on that. If you want to perform forecasting over horizons longer than a single instant, this is the reasonable way to do them.

② Set ini = 7, len = 1; Ini = 8, len = 1 and execute twice

In this case, you are making two unrelated predictions: one for row 7 and one for row 8. In the previous case, the model uses the predicted values of the variables in row 7 to predict the values of what would be row 8. In this case, however, you ignore the results of the predictions for row 7 and take as evidence the real values in row 8, which the model should not have acces to. This cannot be done in a real world scenario, because you are using information in the future to predict the 8th row. Be careful with this, because this last option will most likely have better accuracy due to using as evidence values in the future that one should not know at the moment of forecasting.

1369959395 commented 2 years ago

OK, I see what you mean. It's very helpful to me.

dkesada / dbnR

Interpretation of predictive images #14