dkesada / dbnR

Gaussian dynamic Bayesian networks structure learning and inference based on the bnlearn package
GNU General Public License v3.0

log likelihood for partial observed dataset #30

Closed yfpeng1234 closed 2 weeks ago

yfpeng1234 commented 2 weeks ago

Hi dear author, I ran into a problem with DBN evaluation. Suppose I learn a DBN from a training set in which all values are observed, but at test time some of the variables are masked out. For example, the variables are "a_t_1, b_t_1, c_t_1, a_t_0, b_t_0, c_t_0", and the values of "a_t_1" and "c_t_0" are masked. If I want to evaluate my DBN on this test set, for example by computing its log-likelihood, should I use the learned DBN to infer the values of the masked nodes and then compute the full likelihood? Or could you suggest another method?

dkesada commented 2 weeks ago

Hi! I'm afraid I don't quite understand what you mean by "masked" nodes. Do you perhaps mean that you do not know the values of those variables and you want to compute the log-likelihood? If so, then you can use the logLik() function from the bnlearn package with DBNs and datasets from dbnR. That function allows you to compute the log-likelihood of some of the nodes of the network and not the whole network. Here's a reproducible example that computes the log-likelihood of two nodes in a network:

library(dbnR)
library(bnlearn)

dt <- dbnR::motor
dt_train <- dt[1:2800]
dt_test <- dt[2801:3000]
size <- 2
f_dt_train <- fold_dt(dt_train, size)
f_dt_test <- fold_dt(dt_test, size)

net <- learn_dbn_struc(dt_train, size, method = "dmmhc", f_dt = f_dt_train)
fit <- fit_dbn_params(net, f_dt_train)

logLik(fit, f_dt_test, nodes = c("pm_t_0", "ambient_t_1"), debug = TRUE)

The total log-likelihood is 1275.789, and given that the log-likelihood is a decomposable score, you can see each node's score independently: 871.62 for pm_t_0 and 404.17 for ambient_t_1. If you do not know the value of some variable in t_0, then predict it first and then compute the likelihood.
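
As a quick check of the decomposition described above, here is a sketch that assumes the usual factorization of a Gaussian Bayesian network. The symbols S (the set of requested nodes), N (the number of test rows) and pa_i (the parents of node i) are notation introduced here and do not come from the thread:

```latex
\log L(S) \;=\; \sum_{i \in S} \sum_{n=1}^{N} \log p\!\left(x_i^{(n)} \,\middle|\, \mathrm{pa}_i^{(n)}\right),
\qquad
\underbrace{871.62}_{\texttt{pm\_t\_0}} + \underbrace{404.17}_{\texttt{ambient\_t\_1}} = 1275.79 \approx 1275.789
```

Each node contributes its own sum over the test rows, which is why the per-node scores add up to the reported total, up to rounding.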

yfpeng1234 commented 2 weeks ago

Thanks so much for your reply. This is exactly what I mean. Actually, I was hoping the log-likelihood computation could marginalize out the variables whose values I don't know.

dkesada commented 2 weeks ago

This falls more on the side of the log-likelihood computation in bnlearn, but I'd say that you can just perform inference on the values that you do not know and then calculate the log-likelihood. After all, those inferred values are the most likely values for each variable given the evidence, so plugging them in should be a practical substitute for marginalizing the variables out, even if it is not strictly identical. You could use either the mvn_inference() or the predict_dt() function to get those missing values.
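
For readers comparing the two approaches, the following sketch relates the plug-in log-likelihood to the marginal one. It assumes the fitted network defines a joint multivariate Gaussian over all nodes, split into observed variables x_o and masked variables x_m; the subscripts o and m and the block notation for the mean and covariance are introduced here for illustration only:

```latex
\begin{aligned}
\log p(x_o, \hat{x}_m) &= \log p(x_o) + \log p(\hat{x}_m \mid x_o),
  \qquad \hat{x}_m = \mu_{m \mid o} = \mu_m + \Sigma_{mo}\Sigma_{oo}^{-1}(x_o - \mu_o) \\
\log p(\hat{x}_m \mid x_o) &= -\tfrac{1}{2}\log\det\!\left(2\pi\,\Sigma_{m \mid o}\right),
  \qquad \Sigma_{m \mid o} = \Sigma_{mm} - \Sigma_{mo}\Sigma_{oo}^{-1}\Sigma_{om} \\
\log p(x_o) &= \log p(x_o, \hat{x}_m) + \tfrac{1}{2}\log\det\!\left(2\pi\,\Sigma_{m \mid o}\right)
\end{aligned}
```

Under these assumptions the plug-in value differs from the marginal log-likelihood by an offset that depends only on which variables are masked, not on the observed values. Rows that share the same mask therefore share the same offset.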

yfpeng1234 commented 2 weeks ago

Sure, this is a really insightful suggestion. I will try it. Thanks again for your help!