dkesada / dbnR

Gaussian dynamic Bayesian networks structure learning and inference based on the bnlearn package
GNU General Public License v3.0

Markovian order=1 but size=10 #31

Closed yfpeng1234 closed 1 month ago

yfpeng1234 commented 1 month ago

Hi, dear author, I have a problem when implementing DBN learning. The Markovian order is 1, but I have a dataset with 10 or more time slices. How should I deal with my dataset if I want my DBN's order to be 1? For example, if I have ['a_t_9','a_t_8',...'a_t_1','a_t_0'], should I shift the dataset to construct ['a_t_1','a_t_0'] with more observations?

Another interesting point I found is that if I increase the size of the DBN, for example with more variables and a higher order, the time cost of structure learning increases quadratically. If my DBN has (20 variables, 2 time slices), dmmhc only costs a few ms. But if I increase it to (20 variables, 10 time slices), it takes a few minutes.

Best regards

dkesada commented 1 month ago

Hi! In dbnR, structure learning and inference are performed following a moving window approach. If you set size=2, then your dataset will be formatted to learn a Markovian order 1 network. I don't think I fully understand the problem you are having, but I think you already have your data formatted to order 10 and you want to learn an order 1 network. If that's the case, then you should reformat your dataset so that all values of each variable are grouped inside single columns ordered from older to newer instances. This means that instead of having [a_t_9, a_t_8, ..., a_t_0] you should have a single column a with several rows ordered from the oldest (t_9 in the first row in this example) to the newest (t_0). Then, once all your variables are grouped into single columns, you can set any size parameter that you want.
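As a rough illustration of the reshaping described above, here is a minimal sketch in plain Python. It is not dbnR code. It assumes each wide row is an independent sequence of 10 instants, and the helper names `wide_to_long` and `fold` and the `seq_id` column are hypothetical, introduced only for this example. The first helper unfolds [a_t_9, ..., a_t_0] into a single column ordered from oldest to newest; the second mimics a moving window of a given size over that column, which is the step the package is said to perform itself once the size is set.

```python
from collections import defaultdict


def wide_to_long(wide_rows, var, n_slices):
    """Unfold rows like {a_t_9: ..., ..., a_t_0: ...} into one column.

    Assumption: each wide row is one independent sequence. The
    hypothetical seq_id column records which sequence a value came from.
    """
    long_rows = []
    for seq_id, row in enumerate(wide_rows):
        # t_{n-1} is the oldest instant and t_0 the newest
        for lag in range(n_slices - 1, -1, -1):
            long_rows.append({"seq_id": seq_id, var: row[f"{var}_t_{lag}"]})
    return long_rows


def fold(long_rows, var, size):
    """Slide a window of `size` consecutive rows inside each sequence."""
    by_seq = defaultdict(list)
    for row in long_rows:
        by_seq[row["seq_id"]].append(row[var])
    folded = []
    for values in by_seq.values():
        for end in range(size - 1, len(values)):
            # lag 0 is the newest value in the window
            folded.append(
                {f"{var}_t_{lag}": values[end - lag] for lag in range(size)}
            )
    return folded


# Two toy sequences already formatted to 10 time slices
wide = [
    {f"a_t_{lag}": 10 - lag for lag in range(10)},   # a_t_9 = 1, ..., a_t_0 = 10
    {f"a_t_{lag}": 110 - lag for lag in range(10)},  # a_t_9 = 101, ..., a_t_0 = 110
]

long_rows = wide_to_long(wide, "a", 10)
pairs = fold(long_rows, "a", size=2)

print(len(long_rows), len(pairs))  # 20 18
print(pairs[0])                    # {'a_t_0': 2, 'a_t_1': 1}
```

Each 10-instant sequence yields 9 order-1 windows instead of a single order-9 row, which is the "more observations" effect asked about. Keeping track of the sequence boundary matters: without it, a window could pair the last value of one sequence with the first value of the next.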

Regarding the complexity of the algorithms, yeah, the DMMHC algorithm can become quite infeasible with a high number of variables and a high order. I performed a complexity analysis of the algorithms inside dbnR once, and only the PSOHO and natPSOHO algorithms scale properly to a high Markovian order with several variables per time slice.