danielkorzekwa / bayes-scala

Bayesian Networks in Scala

A (more) fluent API? #9

Closed: BertrandDechoux closed this issue 9 years ago

BertrandDechoux commented 10 years ago

I have started a proof of concept for a more fluent API, and I have already rewritten the test networks with it: SprinklerBN, StudentBN and even TennisDBN.

It is a first draft and it does not support everything (e.g. only discrete factors), but I wanted some early feedback. I clearly need to understand bayes-scala better before claiming it is even an improvement.

What's your opinion?

danielkorzekwa commented 10 years ago

bayes-scala doesn't provide any high-level API at the moment, so it would be an improvement. It provides implementations of various Bayesian techniques, e.g.:

I implemented these algorithms over time, as I needed them for my work, but I never really attempted to create a higher-level API. I do have some ideas about one, though, which I can share with you now.

I presume you are attempting to create an API for Bayesian networks that conceptually resembles the graphical SamIam tool: creating nodes and dependencies, hiding the details of the underlying implementation, removing varIds and explicit edges, and so on. If so, it looks good; however, consider the following:

1) Your API supports Bayesian networks only; it can't really create Markov networks, which are supported by cluster graphs. Transforming a Bayesian network into a cluster graph is not an obvious task, because multiple cluster graphs are possible for the same Bayesian network. Think about how the API will actually be implemented: do you want to support only discrete Bayesian networks, or probabilistic models in general?
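To make the "multiple cluster graphs" point concrete, here is one valid construction among several: the so-called Bethe cluster graph. It uses one large cluster per CPT family and one singleton cluster per variable. Each family cluster is linked to the singleton cluster of every variable in its scope, and the sepset of each edge is that single variable. This is a minimal sketch for illustration only. `Family`, `ClusterEdge`, `ClusterGraph` and `BetheClusterGraph` are hypothetical names, not bayes-scala API.

```scala
// Hypothetical names throughout; this is not part of bayes-scala.
// A family is a CPT's scope: a child variable plus its parents.
final case class Family(child: String, parents: List[String]) {
  def scope: Set[String] = parents.toSet + child
}

// Edge between factor cluster number `factorCluster` and the singleton cluster
// of `variable`; the sepset on that edge is just {variable}.
final case class ClusterEdge(factorCluster: Int, variable: String)

final case class ClusterGraph(
    factorClusters: Vector[Set[String]],
    variableClusters: Set[String],
    edges: List[ClusterEdge]
)

object BetheClusterGraph {
  // One large cluster per family, one singleton cluster per variable, and an edge
  // from each large cluster to the singleton cluster of every variable in its scope.
  def build(families: List[Family]): ClusterGraph = {
    val factorClusters = families.map(_.scope).toVector
    val variables = factorClusters.flatten.toSet
    val edges = for {
      (scope, i) <- factorClusters.zipWithIndex.toList
      v <- scope.toList.sorted
    } yield ClusterEdge(i, v)
    ClusterGraph(factorClusters, variables, edges)
  }

  def main(args: Array[String]): Unit = {
    val sprinklerNet = List(
      Family("winter", Nil),
      Family("sprinkler", List("winter")),
      Family("rain", List("winter"))
    )
    val g = build(sprinklerNet)
    g.edges.foreach(e => println(s"C${e.factorCluster} -- {${e.variable}}"))
  }
}
```

For this tiny network, a clique tree with clusters {winter, sprinkler} and {winter, rain} and sepset {winter} would be another valid choice. Deciding which structure a high-level API builds behind the scenes is exactly the design question raised above.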

2) If the purpose of this API is mostly learning and playing with small, simple Bayesian networks, then it doesn't really matter which form of cluster graph or factor graph you use to represent them; performance will still be fine. Creating such an API for large-scale, efficient inference in both discrete and continuous spaces, however, is a challenge, and designing and implementing it will likely take a lot of time.
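For the small-network case, even brute-force enumeration of the joint distribution is adequate. The sketch below assumes all variables are binary. It uses illustrative probabilities:

- P(winter) = 0.2, taken from the `Bern` examples further down.
- P(sprinkler | winter) = 0.2 and P(sprinkler | ¬winter) = 0.75, also from those examples.
- P(rain | winter) = 0.1 and P(rain | ¬winter) = 0.3. This is an assumed reading of the rain table that appears later in the thread.

`EnumerationInference` is a hypothetical name, not bayes-scala API.

```scala
// Brute-force inference by enumeration for a tiny, all-binary network.
// The probabilities below are illustrative assumptions, not taken from bayes-scala.
object EnumerationInference {
  val pWinter: Double = 0.2
  def pSprinkler(winter: Boolean): Double = if (winter) 0.2 else 0.75
  def pRain(winter: Boolean): Double = if (winter) 0.1 else 0.3

  // P(X = value) for a binary variable with P(X = true) = p.
  private def bern(p: Double, value: Boolean): Double = if (value) p else 1.0 - p

  // Joint probability of one full assignment: the product of the CPT entries.
  def joint(w: Boolean, s: Boolean, r: Boolean): Double =
    bern(pWinter, w) * bern(pSprinkler(w), s) * bern(pRain(w), r)

  private val bools = List(true, false)

  // P(winter = true | rain = true): sum the joint over the hidden variable
  // (sprinkler), then normalise by the probability of the evidence.
  def winterGivenRain: Double = {
    val numerator = bools.map(s => joint(true, s, true)).sum
    val evidence = (for (w <- bools; s <- bools) yield joint(w, s, true)).sum
    numerator / evidence
  }

  def main(args: Array[String]): Unit =
    println(f"P(winter | rain) = $winterGivenRain%.4f")
}
```

The number of joint assignments doubles with every added binary variable. That is why this approach only suits the small networks described in point 2. Larger or continuous models need the cluster-graph machinery.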

3) A different form of API that I have been thinking about myself follows the probabilistic programming paradigm more closely, e.g.:

val winter = Bern(0.2)
val sprinkler = Mixture(winter, Bern(0.2), Bern(0.75))

or differently:

val sprinkler = winter match {
  case true => Bern(0.2)
  case false => Bern(0.75)
}

That is, working purely with variables, as in ordinary programming, instead of explicitly using factors or nodes.
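Both forms above can be read as operations on a small discrete-distribution monad. `Mixture` picks a branch depending on a Boolean variable, and the `match` version is `flatMap` with a pattern-matching function. The following is a minimal sketch under that reading. `Dist`, `bern` and `mixture` are hypothetical names, not bayes-scala API, and the probabilities are those from the examples above.

```scala
// Hypothetical finite-distribution monad; names are illustrative, not bayes-scala API.
// Outcomes are not merged, so the same value may appear several times in `support`.
final case class Dist[A](support: List[(A, Double)]) {
  def map[B](f: A => B): Dist[B] =
    Dist(support.map { case (a, p) => (f(a), p) })

  // Chain a dependent choice: weight each outcome of f(a) by the probability of a.
  def flatMap[B](f: A => Dist[B]): Dist[B] =
    Dist(for {
      (a, p) <- support
      (b, q) <- f(a).support
    } yield (b, p * q))

  // Probability of an event: total weight of the outcomes that satisfy it.
  def prob(event: A => Boolean): Double =
    support.collect { case (a, p) if event(a) => p }.sum
}

object Dist {
  def bern(p: Double): Dist[Boolean] = Dist(List(true -> p, false -> (1.0 - p)))
}

object ProbProgDemo {
  import Dist.bern

  // Mixture(selector, ifTrue, ifFalse) expressed with flatMap (hypothetical helper).
  def mixture[A](selector: Dist[Boolean], ifTrue: Dist[A], ifFalse: Dist[A]): Dist[A] =
    selector.flatMap(s => if (s) ifTrue else ifFalse)

  val winter: Dist[Boolean] = bern(0.2)

  val sprinklerViaMixture: Dist[Boolean] = mixture(winter, bern(0.2), bern(0.75))

  // The pattern-matching form, expressed with flatMap.
  val sprinkler: Dist[Boolean] = winter.flatMap {
    case true  => bern(0.2)
    case false => bern(0.75)
  }

  // A for-comprehension that keeps both variables, so the joint stays available.
  val joint: Dist[(Boolean, Boolean)] = for {
    w <- winter
    s <- if (w) bern(0.2) else bern(0.75)
  } yield (w, s)

  def main(args: Array[String]): Unit = {
    println(sprinkler.prob(s => s))               // approximately 0.64
    println(joint.prob { case (w, s) => w && s }) // approximately 0.04
  }
}
```

The marginal is 0.2 × 0.2 + 0.8 × 0.75 = 0.64 either way. `joint` keeps `winter` and `sprinkler` together, so conditioning on an observed sprinkler remains possible. A `Mixture` that returned only the marginal would lose that dependency.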

BertrandDechoux commented 10 years ago

Thanks for your feedback.

I have reworked the POC a bit, as can be seen in SprinklerBN:

val rain = P('rain | winter) follows (0.1, 0.9, 0.3, 0.7)

This reads quite nicely for a simple example. The name of the Var/val is duplicated, but it is useful information later on, for generating an export and/or a visualisation of the network.
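One way such a `P(... | ...) follows (...)` DSL could be wired up in Scala is an implicit class that adds `|` to `Symbol`. This is a hypothetical sketch, not the POC's actual code, and it makes several assumptions:

- All variables are binary, and CPT entries are listed as P(true), P(false) for each parent state.
- `Symbol("rain")` stands in for the `'rain` literal, which is deprecated in Scala 2.13 and removed in Scala 3.
- A dot call replaces the infix `follows (...)` as a conservative choice.

The DOT export shows where the duplicated name pays off.

```scala
// Hypothetical sketch of the P(... | ...) follows (...) style; not the POC's actual code.
// Assumes binary variables, with entries ordered P(true), P(false) per parent state.
final case class Node(name: String, parents: List[Node], cpt: Vector[Double])

// The inside of P(...): a variable name plus the parents it is conditioned on.
final case class Query(name: String, parents: List[Node]) {
  def follows(probs: Double*): Node = {
    // Binary variables: two entries for each joint state of the parents.
    val expected = 2 * (1 << parents.size)
    require(probs.size == expected, s"$name expects $expected entries, got ${probs.size}")
    Node(name, parents, probs.toVector)
  }
}

object Dsl {
  implicit class SymbolOps(private val s: Symbol) extends AnyVal {
    def |(parents: Node*): Query = Query(s.name, parents.toList)
  }

  def P(s: Symbol): Query = Query(s.name, Nil)
  def P(q: Query): Query = q

  // The duplicated name is reused here to label nodes in a Graphviz DOT export.
  def toDot(nodes: Seq[Node]): String = {
    val decls = nodes.map(n => s"  ${n.name};")
    val edges = for (n <- nodes; p <- n.parents) yield s"  ${p.name} -> ${n.name};"
    (decls ++ edges).mkString("digraph bn {\n", "\n", "\n}")
  }
}

object DslDemo {
  import Dsl._

  val winter: Node = P(Symbol("winter")).follows(0.2, 0.8)
  val rain: Node = P(Symbol("rain") | winter).follows(0.1, 0.9, 0.3, 0.7)

  def main(args: Array[String]): Unit = println(toDot(Seq(winter, rain)))
}
```

The `require` in `follows` only catches a wrong number of entries at run time. Catching it at compile time would need the CPT shape to be encoded in types.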

I am comfortable with Bayesian networks, but I will clearly need to go back to my books for the details of cluster graphs and factor graphs. In that regard, your documentation seems really good, since I know the references you have been using.

My roadmap is:

Your examples are interesting. I can easily see a simple evolution of the API to support various factor types:

val rain = P('rain | winter) is (0.1, 0.9, 0.3, 0.7)
// same as above, but with normalization
val rain = P('rain | winter) is Raw(1, 9, 30, 70)
// same as above, but more explicit
val rain = P('rain | winter) is Mixture(Bern(0.1), Bern(0.3))
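The three forms above can all compile down to the same flat table, provided each form knows how to produce one. The sketch below assumes a layout of P(true), P(false) for each parent state. `Raw` normalises each pair of weights by its own sum, so (1, 9) becomes (0.1, 0.9) and (30, 70) becomes (0.3, 0.7).

All type names are hypothetical. `MixtureOf` is named that way to keep it distinct from the selector-based `Mixture` proposed earlier in the thread.

```scala
// Hypothetical sketch: three specifications that should yield the same rain CPT.
// Flat layout assumed: for each parent state, P(true) then P(false).
final case class Bern(p: Double)

sealed trait CptSpec {
  def table: Vector[Double]
}

// Probabilities given directly, as in `is (0.1, 0.9, 0.3, 0.7)`.
final case class Probs(values: Double*) extends CptSpec {
  def table: Vector[Double] = values.toVector
}

// Unnormalised weights: each (true, false) pair is divided by its own sum.
final case class Raw(weights: Double*) extends CptSpec {
  def table: Vector[Double] =
    weights.grouped(2).flatMap { row =>
      val total = row.sum
      row.map(_ / total)
    }.toVector
}

// One Bernoulli per parent state, expanded to (p, 1 - p) pairs.
final case class MixtureOf(rows: Bern*) extends CptSpec {
  def table: Vector[Double] = rows.toVector.flatMap(b => Vector(b.p, 1.0 - b.p))
}

object CptSpecDemo {
  val specs: List[CptSpec] =
    List(Probs(0.1, 0.9, 0.3, 0.7), Raw(1, 9, 30, 70), MixtureOf(Bern(0.1), Bern(0.3)))

  def main(args: Array[String]): Unit = specs.foreach(s => println(s.table))
}
```

A sealed trait keeps the set of factor types closed. Inference code can therefore match exhaustively on them, and still only ever needs to consume the flat `table`.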

I am not a fan of pattern matching/switch statements for the configuration, but I know a few APIs do that (in Python). I will see how it goes, step by step. Ideally, the compiler should check as much of the configuration's correctness as possible. That might be a secondary challenge.
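Part of that compiler checking could come from encoding the CPT shape in types rather than in a varargs list of numbers. A missing row then becomes a compile error instead of a run-time failure. This is a hypothetical sketch: `Given1` and `Given2` are illustrative names covering binary variables with one or two binary parents. Probability ranges can still only be validated at construction time.

```scala
// Hypothetical sketch: encode the CPT shape in types so that arity mistakes are
// compile errors. Probability ranges are still only checked at run time.
final case class Bern(p: Double) {
  require(p >= 0.0 && p <= 1.0, s"probability out of range: $p")
}

// A binary variable with one binary parent: exactly two rows, one per parent state.
final case class Given1(ifTrue: Bern, ifFalse: Bern)

// A binary variable with two binary parents: exactly four rows.
final case class Given2(tt: Bern, tf: Bern, ft: Bern, ff: Bern)

object TypedCptDemo {
  val rain: Given1 = Given1(ifTrue = Bern(0.1), ifFalse = Bern(0.3))

  // Does not compile (missing argument), unlike a varargs list of doubles:
  // val broken: Given1 = Given1(Bern(0.1))

  def main(args: Array[String]): Unit = println(rain)
}
```

The trade-off is that each shape needs its own type. Scaling this to many parents or multi-valued variables would require more type-level machinery, which fits the description of this as a secondary challenge.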