dkesada / dbnR

Gaussian dynamic Bayesian networks structure learning and inference based on the bnlearn package
GNU General Public License v3.0

Factorial Data Treatment #1

Closed: n8shadow closed this issue 4 years ago

n8shadow commented 4 years ago

Hi, thanks for the package; I really appreciate the work! Are there plans for an extension that handles non-numeric data as well? I very often find myself having to cope with mixed attribute types. If not, do you have a recommendation for a good way to work around this?

dkesada commented 4 years ago

Hello and thank you for the kind words!

For the moment I don't have plans to extend the package to deal with purely categorical datasets. It should be fairly straightforward to implement in a similar fashion to the Gaussian case, but exact inference with categorical data gets trickier. If you find yourself facing a purely categorical problem, there are many tools and packages in both Python and R that deal with discrete Bayesian networks. If you are in a hurry, I would advise an approach similar to the one in this package: divide the data into time slices, train a static network, train a transition network, combine them and learn the parameters. All of this can be done with the bnlearn package. If time is not such a pressing matter, I would advise you to look for more specialized packages for discrete dynamic Bayesian networks or even hidden Markov models.
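As a rough illustration of the "divide the data into time slices" step, here is a minimal Python sketch that folds a time series into rows holding consecutive instants. The function name `fold_time_slices` and the `_t_0` / `_t_1` column suffixes are assumptions made for this example, not the API of any package; the network learning itself would still be done with a tool such as bnlearn.

```python
def fold_time_slices(rows, size=2):
    """Fold per-instant records into rows that span `size` consecutive instants.

    A column "x" at the current instant becomes "x_t_0", the previous
    instant becomes "x_t_1", and so on (hypothetical naming convention).
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    folded = []
    # The first complete window ends at index size - 1.
    for end in range(size - 1, len(rows)):
        wide = {}
        for lag in range(size):
            for name, value in rows[end - lag].items():
                wide[f"{name}_t_{lag}"] = value
        folded.append(wide)
    return folded


# Toy series with two variables observed at four instants.
series = [
    {"temp": 20.0, "load": 0.5},
    {"temp": 21.0, "load": 0.6},
    {"temp": 23.0, "load": 0.4},
    {"temp": 22.0, "load": 0.7},
]
folded = fold_time_slices(series, size=2)
```

With `size=2`, each folded row pairs an instant with the one before it, so the four original rows yield three folded rows. A static network would be learned over the columns of a single slice and a transition network over arcs going from the `_t_1` columns to the `_t_0` columns.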

In the case of mixed categorical and numerical data, it gets even trickier because of the restrictions on which parents discrete nodes can have. For this case, I do have plans to make an extension that can deal with some discrete variables as well as numerical ones. The gist of it would be to first split the data on your categorical variables with a tree or a similar classification method and then train a dynamic Gaussian Bayesian network on the data in each of the leaves or classes. This is different from a mixed dynamic Bayesian network, but it can be useful to avoid some of the linearity of the Gaussian model and to process these non-numerical variables. Keep in mind, though, that this approach doesn't allow for many discrete variables, as that would divide the dataset into chunks that are far too small to train a robust DBN on each of them. It should also be possible to follow the same approach as this package for mixed DBNs, but then you again face the problem of implementing exact (or approximate) inference, plus the new problems that arise when learning the structure and combining the static network with the transition one.
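To make the chunk-size concern concrete, the following Python sketch partitions a toy dataset on its categorical columns and fits a simple model per group. It simplifies the idea in two ways that should be read as assumptions: it uses a flat split on the category values instead of a tree, and a univariate Gaussian (mean and standard deviation) stands in for the per-leaf dynamic Gaussian Bayesian network. All names and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def split_by_categories(rows, categorical):
    """Group rows by the combination of values in the categorical columns."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[name] for name in categorical)
        groups[key].append(row)
    return dict(groups)


def fit_gaussian_per_group(groups, numeric, min_rows=2):
    """Fit (mean, stdev) of one numeric column per group.

    Groups with fewer than `min_rows` rows get None: there is too little
    data to estimate anything, which mirrors the robustness problem.
    """
    params = {}
    for key, members in groups.items():
        if len(members) < min_rows:
            params[key] = None
            continue
        values = [member[numeric] for member in members]
        params[key] = (mean(values), stdev(values))
    return params


rows = [
    {"mode": "on", "site": "a", "temp": 20.0},
    {"mode": "on", "site": "a", "temp": 22.0},
    {"mode": "off", "site": "a", "temp": 10.0},
    {"mode": "off", "site": "a", "temp": 14.0},
    {"mode": "on", "site": "b", "temp": 30.0},
]
by_mode = split_by_categories(rows, ["mode"])
by_mode_site = split_by_categories(rows, ["mode", "site"])
params = fit_gaussian_per_group(by_mode_site, "temp")
```

Splitting on `mode` alone gives two groups, while adding `site` gives three, one of which holds a single row and cannot be fitted. Every extra discrete variable multiplies the number of groups in this way, so the data available to each model shrinks quickly.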

I hope I could be of help. Cheers!

n8shadow commented 4 years ago

Thanks a lot for your detailed answer!

I think I will try what you recommended. Since I have both numeric and categorical data, I will first try discretizing the numeric variables with bnlearn and then learn a static and a transition network (I have already done that for non-dynamic BNs).
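For readers unfamiliar with the discretization step, here is a minimal Python sketch of equal-width interval discretization, one of the simplest ways to turn a numeric column into categories. It is an illustration written for this thread, not bnlearn's implementation; bnlearn's `discretize()` function offers interval, quantile and Hartemink methods, whose details may differ. The function name and the toy data are hypothetical.

```python
def discretize_interval(values, breaks=3):
    """Map each value to the index of one of `breaks` equal-width bins."""
    if breaks < 1:
        raise ValueError("breaks must be at least 1")
    lo, hi = min(values), max(values)
    if lo == hi:
        # A constant column carries no information; put everything in bin 0.
        return [0 for _ in values]
    width = (hi - lo) / breaks
    labels = []
    for value in values:
        index = int((value - lo) / width)
        # The maximum lands exactly on the upper edge, so clamp it
        # into the last bin.
        labels.append(min(index, breaks - 1))
    return labels


temps = [10.0, 12.0, 15.0, 19.0, 20.0, 25.0]
bins = discretize_interval(temps, breaks=3)
```

Here the range 10 to 25 is cut into three bins of width 5, and values on an interior boundary (15 and 20) go to the upper bin. Equal-width bins are sensitive to outliers, so a quantile-based split may be preferable when the data are skewed.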

Regarding inference, I usually convert the network to gRain to run exact inference. But I am not sure whether everything (inference, filtering and prediction) can be done there.

There is also an R package called bnstruct, which I haven't looked into deeply yet. I am going to explore it further.

An extension for mixed attributes, as you described, would also be a great help.

Thanks again and kind regards!