Closed yfpeng1234 closed 2 weeks ago
Hi! I'm afraid I don't quite understand what you mean by "masked" nodes. Do you mean that you do not know the values of those variables and still want to compute the log-likelihood? If so, you can use the logLik() function from the bnlearn package with DBNs and datasets from dbnR. That function lets you compute the log-likelihood of a subset of the nodes rather than the whole network. Here's a reproducible example that computes the log-likelihood of two nodes in a network:
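library(dbnR)
library(bnlearn)

# Split the motor dataset bundled with dbnR into training and test rows
dt <- dbnR::motor
dt_train <- dt[1:2800]
dt_test <- dt[2801:3000]

# size = 2 gives two time slices per folded row (t_0 and t_1)
size <- 2
f_dt_train <- fold_dt(dt_train, size)
f_dt_test <- fold_dt(dt_test, size)

# Learn the structure and fit the parameters on the training data
net <- learn_dbn_struc(dt_train, size, method = "dmmhc", f_dt = f_dt_train)
fit <- fit_dbn_params(net, f_dt_train)

# Log-likelihood of the folded test data, restricted to two nodes
logLik(fit, f_dt_test, nodes = c("pm_t_0", "ambient_t_1"), debug = TRUE)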
The total log-likelihood is 1275.789. Because the log-likelihood is a decomposable score, the debug output also shows each node's contribution separately: 871.62 for pm_t_0 and 404.17 for ambient_t_1. If you do not know the value of some variable in t_0, predict it first and then compute the likelihood.
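To spell out what "decomposable" means here, the following is a short sketch in notation introduced only for illustration (the symbols $\mathcal{D}$, $x_i^{(n)}$, $\mathrm{pa}_i^{(n)}$ and $S$ are not taken from the bnlearn or dbnR documentation). The log-likelihood of a Bayesian network over $N$ rows is a sum of one term per node, each conditioned on that node's parents, so restricting the computation to a subset $S$ of nodes simply keeps the matching terms:

```latex
\log L(\mathcal{D})
  = \sum_{i} \sum_{n=1}^{N} \log p\!\left(x_i^{(n)} \mid \mathrm{pa}_i^{(n)}\right),
\qquad
\log L_S(\mathcal{D})
  = \sum_{i \in S} \sum_{n=1}^{N} \log p\!\left(x_i^{(n)} \mid \mathrm{pa}_i^{(n)}\right).

% With S = {pm_t_0, ambient_t_1} and the per-node values reported above:
\log L_S(\mathcal{D}) = 871.62 + 404.17 = 1275.79 \approx 1275.789
```

The small difference in the last digit comes from the per-node values being rounded to two decimals.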
Thanks so much for your reply. This is exactly what I mean. Ideally, I would like the log-likelihood computation to marginalize out the variables whose values I don't know.
This falls more on the side of the log-likelihood computation in bnlearn, but I'd say that you can just perform inference on the values that you do not know and then calculate the log-likelihood. After all, those inferred values are the most likely values for each variable given the evidence, so the result should be roughly equivalent to marginalizing the variables. You could use either the mvn_inference() or the predict_dt() function to get those missing values.
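For readers wondering how the plug-in score relates to a marginalized one, here is a brief sketch. It assumes a linear Gaussian network, so that the joint distribution is multivariate normal, and it assumes the imputed values $\hat{x}_M$ are the exact conditional means of the masked variables $x_M$ given all observed variables $x_O$ in the same row. The symbols $x_O$, $x_M$, $\hat{x}_M$ and $\Sigma_{M \mid O}$ are introduced here for illustration; predictions obtained by other means may not satisfy this assumption.

```latex
% Chain rule applied to the plug-in point (x_O, \hat{x}_M):
\log p(x_O, \hat{x}_M) = \log p(x_O) + \log p(\hat{x}_M \mid x_O)

% For a multivariate normal, the conditional density evaluated at its own mean is
\log p(\hat{x}_M \mid x_O) = -\tfrac{1}{2} \log \det\!\left(2\pi\, \Sigma_{M \mid O}\right),
\qquad
\Sigma_{M \mid O} = \Sigma_{MM} - \Sigma_{MO}\, \Sigma_{OO}^{-1}\, \Sigma_{OM}
```

Under these assumptions the plug-in log-likelihood equals the marginal log-likelihood $\log p(x_O)$ plus a term that depends on the fitted covariance and on which variables are masked, but not on the observed values. The two scores therefore rank test rows identically for a fixed model and mask, while comparisons across different models should account for that extra term.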
Sure, this is a really insightful suggestion. I will try it. Thanks again for your help!
Hi dear author, I encountered a problem with DBN evaluation. Suppose I learn a DBN from a training set in which all the values are observed, but at test time some of the variables are masked out. For example, we have "a_t_1,b_t_1,c_t_1,a_t_0,b_t_0,c_t_0", and the values of "a_t_1,c_t_0" are masked. If I want to evaluate my DBN on this test set, for example by computing its log-likelihood, should I use the learned DBN to infer the values of the masked nodes and then compute the likelihood of the whole network? Or could you please suggest another method?