danielkorzekwa / bayes-scala

Bayesian Networks in Scala

Type of features #2

Closed danyaljj closed 10 years ago

danyaljj commented 10 years ago

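Hi, I see that in Bayes-Scala you can define the graphical model based on tables of probabilities of variables (CPDs). Is it possible to define factors with discriminative features, with arbitrary feature functions? For example, a factor with the following potential functions: [image: 111] https://f.cloud.github.com/assets/2441454/2120733/c76de34a-91c1-11e3-969b-c846792a48fa.png and the overall distribution will be of the following form: [image: 222] https://f.cloud.github.com/assets/2441454/2120738/115fd7e2-91c2-11e3-87a0-47a67fc9d834.png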

In fact, I am not sure how different this is, implementation-wise, from your examples (where the factors are proper probability distributions).

Thanks

danielkorzekwa commented 10 years ago

Hi Daniel,

When using bayes-scala for inference on cluster graphs, you specify each discrete CPD as an array object, so you can in fact use your potential function: iterate over all CPD indexes and store the value of the potential at each one.

Inference on cluster graphs is pretty much the same for Bayesian networks (normalised probabilities) and Markov networks (unnormalised potentials), so I believe it should work in both cases.
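
To illustrate the idea of filling a table from a potential function, here is a minimal sketch in plain Python. It does not use the bayes-scala API; the helper names (`potential_table`, `normalise`, `phi`), the weights, the two features, and the index order (last variable varying fastest) are all assumptions made for the example.

```python
from itertools import product
from math import exp

def potential_table(cardinalities, potential):
    """Evaluate potential(assignment) for every joint assignment.

    The result is a flat list in which the last variable varies fastest.
    This ordering is an assumption, not necessarily what a given library expects.
    """
    return [potential(assignment)
            for assignment in product(*(range(k) for k in cardinalities))]

def normalise(table):
    """Rescale a table of non-negative values so that it sums to one."""
    total = sum(table)
    return [value / total for value in table]

# Hypothetical log-linear potential over two binary variables (a, b):
# phi(a, b) = exp(w1 * [a == b] + w2 * a)
weights = (1.5, -0.5)

def phi(assignment):
    a, b = assignment
    features = (1.0 if a == b else 0.0, float(a))
    return exp(sum(w * f for w, f in zip(weights, features)))

# Unnormalised factor, in the order (0,0), (0,1), (1,0), (1,1)
table = potential_table([2, 2], phi)

# Normalised version, usable where a proper joint distribution is required
joint = normalise(table)
```

The unnormalised `table` corresponds to the Markov network case and `joint` to the normalised case; in both, the values are computed the same way and only the final rescaling differs.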

Let me know if you need more help.

Regards. Daniel

danyaljj commented 10 years ago

Hi Daniel, I really appreciate your explanations. Let me be a little more specific. Consider the model attached.

There is a function Phi(.) which takes some x and, given some 'w', gives a probability for 'y'. Now assume that we have a training set of pairs (x1, y1), (x2, y2), (x3, y3), ... and that, using this training set, we want to estimate w (the parameter of the prediction inside the Phi(.) function). The Phi(.) function could be any nonlinear function, e.g. exponential, probit, sigmoid (whichever is easier).

In order to estimate w, we need to be able to 'project' on Phi(w^T x) (right?). Do you have an example of such a projection for a nonlinear function among the examples? (I couldn't find anything.)

Thanks, Daniel

danielkorzekwa commented 10 years ago

I think that for the model you describe it's better to work in continuous space rather than with table CPDs, for which you would need to discretize your data {x, w, y}. In bayes-scala, I provide inference on cluster graphs in discrete space only, but I also support loopy belief propagation and expectation propagation on factor graphs, which are more suitable for your problem.

In your model you propose a non-linear likelihood function p(y|x,w), which often requires approximation techniques such as expectation propagation (EP). An example of an EP model: https://github.com/danielkorzekwa/bayes-scala/blob/master/doc/trueskill_in_tennis_factor_graph/trueskill_in_tennis_factor_graph.md

A different option is to use a linear Gaussian likelihood; the model you describe is then simply a Bayesian linear model, which is easy to work with. For this model you only need two operations on Gaussian distributions: product and sum (marginalisation). Note that you can still incorporate non-linearity into the linear model by working in a feature space (polynomials, sin/cos, exp, etc.) rather than in the original input space.

For the Bayesian linear model you can use the Gaussian functions from bayes-scala: https://github.com/danielkorzekwa/bayes-scala/blob/master/src/main/scala/dk/bayes/math/gaussian/CanonicalGaussian.scala

https://github.com/danielkorzekwa/bayes-scala/blob/master/src/main/scala/dk/bayes/math/gaussian/CanonicalGaussianOps.scala
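
As a rough illustration of the Bayesian linear model option, the sketch below works in canonical (information) form, where multiplying Gaussians amounts to adding their precision matrices and precision-weighted means. It is plain Python and does not call the bayes-scala classes linked above. The feature map `(1, x^2)`, the prior and noise variances, the toy data, and all function names are assumptions chosen for the example.

```python
def features(x):
    # Assumed feature map: a bias term plus a quadratic term, so the
    # model stays linear in w while being non-linear in x.
    return (1.0, x * x)

def posterior(data, noise_var=0.1, prior_var=100.0):
    """Posterior over w for y = w . features(x) + Gaussian noise.

    K is the precision matrix and h = K * mean (canonical form).
    The prior is N(0, prior_var * I).
    """
    k00 = k11 = 1.0 / prior_var
    k01 = 0.0
    h0 = h1 = 0.0
    for x, y in data:
        f0, f1 = features(x)
        # Product with one Gaussian likelihood factor: add its canonical parameters
        k00 += f0 * f0 / noise_var
        k01 += f0 * f1 / noise_var
        k11 += f1 * f1 / noise_var
        h0 += f0 * y / noise_var
        h1 += f1 * y / noise_var
    det = k00 * k11 - k01 * k01
    # Covariance is the inverse of K (closed form for a 2x2 matrix)
    cov = ((k11 / det, -k01 / det), (-k01 / det, k00 / det))
    mean = (cov[0][0] * h0 + cov[0][1] * h1,
            cov[1][0] * h0 + cov[1][1] * h1)
    return mean, cov

def predict(x, mean, cov, noise_var=0.1):
    """Predictive mean and variance of y at x, with w marginalised out."""
    f = features(x)
    m = f[0] * mean[0] + f[1] * mean[1]
    v = sum(f[i] * cov[i][j] * f[j] for i in range(2) for j in range(2))
    return m, v + noise_var

# Toy data generated from y = 1 + 2 * x^2 (no noise added)
data = [(-2.0, 9.0), (-1.0, 3.0), (0.0, 1.0), (1.0, 3.0), (2.0, 9.0)]
mean, cov = posterior(data)
pred_mean, pred_var = predict(3.0, mean, cov)
```

`posterior` is the product step and `predict` is the marginalisation step; swapping `features` for another basis (sin/cos, higher polynomials) changes the shape of the fit without changing either operation, although the closed-form 2x2 inverse would then need replacing for more than two features.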

For the non-linear model with e.g. a sigmoid likelihood, you would need to compute some approximate messages in order to use the EP inference from bayes-scala.
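
To give a feel for what such an approximate message involves, here is a minimal one-dimensional sketch of the EP projection step for a sigmoid likelihood. It is plain Python, not the bayes-scala EP API. It assumes a scalar latent variable t (standing in for w^T x), labels y in {-1, +1}, and a simple grid sum in place of a proper quadrature rule; all function names are hypothetical.

```python
from math import exp, pi, sqrt

def sigmoid(t):
    # Numerically stable logistic function
    if t >= 0.0:
        return 1.0 / (1.0 + exp(-t))
    e = exp(t)
    return e / (1.0 + e)

def gauss_pdf(t, mean, var):
    return exp(-0.5 * (t - mean) ** 2 / var) / sqrt(2.0 * pi * var)

def project(cavity_mean, cavity_var, y, n=2001, width=10.0):
    """Moment-match N(t; cavity) * sigmoid(y * t) with a Gaussian.

    Returns the mean and variance of the tilted distribution, computed
    with a grid sum over +/- width standard deviations of the cavity.
    """
    sd = sqrt(cavity_var)
    lo = cavity_mean - width * sd
    step = 2.0 * width * sd / (n - 1)
    z = m1 = m2 = 0.0
    for i in range(n):
        t = lo + i * step
        p = gauss_pdf(t, cavity_mean, cavity_var) * sigmoid(y * t)
        z += p
        m1 += p * t
        m2 += p * t * t
    mean = m1 / z
    return mean, m2 / z - mean * mean

def ep_message(cavity_mean, cavity_var, y):
    """Approximate factor message: projected marginal divided by the cavity.

    Division of Gaussians is a subtraction of natural parameters
    (precision and precision times mean).
    """
    mean, var = project(cavity_mean, cavity_var, y)
    precision = 1.0 / var - 1.0 / cavity_var
    precision_mean = mean / var - cavity_mean / cavity_var
    return precision, precision_mean

# Example: cavity N(0, 1) and an observed label y = +1
message_precision, message_precision_mean = ep_message(0.0, 1.0, 1)
```

In a full EP loop this step would be repeated for every likelihood factor until the messages stop changing; the sketch covers only the single projection that a non-linear factor needs.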

Hope this helps, let me know which option you prefer.