RKSKEKF closed this issue 7 months ago.
Hi, DBNs with time series data work a little differently from the way bnlearn handles static networks. With time series data, you usually do not have the objective variable in the "t_0" instant, that is, the future instant you want to predict. In this case, it is assumed that you have the values of "pm_t_1" and "pm_t_2" from the previous instants, and you want to predict the next "pm_t_0". In static BNs, you can simply remove the objective column, because you will never have the values of that variable. If you want the pm variable to be missing in "t_1" and "t_2" as well, then you have to set `obj <- c("pm_t_0", "pm_t_1", "pm_t_2")`, but I'll assume this is not your case in the following example:
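```r
library(data.table)
library(dbnR)

size <- 3  # number of time slices: t_0, t_1 and t_2
data(motor)
str(motor)
dt_train <- motor[200:900]
dt_val <- motor[901:1000]

# With a DBN
obj <- c("pm_t_0")  # objective node: pm in the current instant t_0
net <- learn_dbn_struc(dt_train, size)

# Fold the datasets so that each row holds the t_0, t_1 and t_2 values
f_dt_train <- fold_dt(dt_train, size)
f_dt_val <- fold_dt(dt_val, size)

# Copy of the validation set with the objective column replaced by NaNs
f_dt_val_R <- copy(f_dt_val)
f_dt_val_R[, pm_t_0 := NaN]
```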
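To make the "t_0", "t_1" and "t_2" naming concrete, here is a rough base-R sketch of what folding a series means. The helper `fold_sketch()` is hypothetical and written only for illustration; it is not how dbnR's `fold_dt()` is implemented, and its column order may differ, but it follows the same convention as above: `pm_t_1` is the value one instant before `pm_t_0`, and `pm_t_2` the value two instants before. In this sketch, the first `size - 1` rows are dropped because they lack enough history.

```r
# Hypothetical helper, for illustration only (NOT dbnR's fold_dt()):
# turns each variable into `size` lagged columns named <var>_t_<lag>.
fold_sketch <- function(df, size) {
  n <- nrow(df)
  stopifnot(size >= 1, n >= size)
  rows <- size:n  # rows that have `size - 1` previous observations available
  out <- list()
  for (var in names(df)) {
    for (lag in 0:(size - 1)) {
      # <var>_t_<lag> holds the value observed `lag` instants before t_0
      out[[paste0(var, "_t_", lag)]] <- df[[var]][rows - lag]
    }
  }
  as.data.frame(out)
}

toy <- data.frame(pm = c(10, 11, 12, 13, 14))
fold_sketch(toy, size = 3)
# pm_t_0 = 12, 13, 14; pm_t_1 = 11, 12, 13; pm_t_2 = 10, 11, 12
```

Each folded row therefore pairs one instant with its two predecessors, which is why predicting "pm_t_0" only needs the "t_1" and "t_2" columns.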
In dbnR, you need to provide datasets in which every variable of the DBN is present as a column, even if its values are missing. This is because I partition the dataset inside the function calls, and the variables in "t_0" are already removed from the dataset when the predictions are performed. In the above code, I fold the dataset and then replace the objective column "pm_t_0" with NaNs. Now I'll show that this has no effect on the results, because the objective column is not used during the predictions:
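```r
# Fit the parameters of the network with Gaussian maximum likelihood
fit <- fit_dbn_params(net, f_dt_train, method = "mle-g")
fit$pm_t_2  # inspect the fitted parameters of one node

# Predict pm_t_0 on the original and on the NaN-filled validation sets
res <- suppressWarnings(predict_dt(fit, f_dt_val, obj_nodes = obj, verbose = FALSE))
res_R <- suppressWarnings(predict_dt(fit, f_dt_val_R, obj_nodes = obj, verbose = FALSE))
all.equal(res$pm_t_0, res_R$pm_t_0)  # same result
```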
You obtain the same results whether the objective column holds the real values or NaNs. In fact, none of the variables in "t_0" are used for the predictions, because that would introduce information from the future into them, which is look-ahead bias. All in all, you do not have to worry about removing the objective variable from the dataset in dbnR: if you want to predict values in a real-world situation, just input a dataset with the objective column empty in "t_0".
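The same idea can be seen outside dbnR with a toy example. The sketch below is a stand-in built on stated assumptions, not dbnR code: a plain linear regression (`lm()`) on the lagged columns plays the role of the fitted DBN, and the series `pm` is simulated. It builds a real-case input for the next, unobserved instant with `pm_t_0` left empty (NaN) and checks that the placeholder stored in `pm_t_0` does not change the prediction, since the model only reads the past instants.

```r
# Toy illustration, NOT dbnR: lm() on lagged columns stands in for the DBN.
set.seed(1)
pm <- cumsum(rnorm(50))  # hypothetical simulated series
n <- length(pm)

# Folded training data: value at t, t - 1 and t - 2 in each row
train <- data.frame(pm_t_0 = pm[3:n],
                    pm_t_1 = pm[2:(n - 1)],
                    pm_t_2 = pm[1:(n - 2)])

# Only past instants (t_1, t_2) are predictors, so there is no look-ahead
model <- lm(pm_t_0 ~ pm_t_1 + pm_t_2, data = train)

# Real-case input for the next, unobserved instant: the last two observations
# become t_1 and t_2, and the objective column in t_0 is left empty
new_row <- data.frame(pm_t_0 = NaN, pm_t_1 = pm[n], pm_t_2 = pm[n - 1])
pred_nan <- predict(model, newdata = new_row)

# Any other placeholder in pm_t_0 gives the same prediction,
# because predict() never reads the response column
new_row_zero <- transform(new_row, pm_t_0 = 0)
pred_zero <- predict(model, newdata = new_row_zero)
all.equal(pred_nan, pred_zero)
```

The design choice mirrors the explanation above: the objective column only has to exist so the input has the expected shape; its content is never used.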
I tried an experiment keeping the variables in the dataset but replacing their values with zeros, and the results turned out weird. I didn't think of using NaN, haha. Thanks so much for your answer!
The above code should work the same way if you replace `f_dt_val_R[, pm_t_0 := NaN]` with `f_dt_val_R[, pm_t_0 := 0]`, but you need to load the data.table library for that code to work properly. Otherwise, you may see unexpected behaviour, because R treats the objects as data.frames while dbnR uses data.tables internally. Anyway, I'm glad that helped!
Hi, I have a question about prediction.
After building a model, we want to predict new data.
In the general case, we input the data without the target node, like this
but in this case, it does not work when I remove the target node.
Am I misunderstanding something and using it incorrectly?