Questions on Learning (section 8 tutorial)

toncho11 commented 9 years ago

I have to admit I like Figaro. https://www.cra.com/pdf/Figaro-tutorial.pdf

In section 8 we are trying to estimate a Beta that we will use in Flip() to generate a Bernoulli distribution.

The Beta is the "parameter", also the "prior probability distribution" that we are trying to learn, right? I find the word "parameter" confusing because the Beta itself has parameters (alpha and beta).

Why do we first define the Beta with val fairness = Beta(1,1)? Why search for something that we define ourselves? Is this because EM is an iterative process and needs a starting point?

The Flip(fairness.MAPValue) defines the final Bernoulli distribution based on the provided data, right?

Also the code samples are different from the tutorial.
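For readers following the Beta question above, here is a minimal sketch of the standard conjugate Beta-Bernoulli update in plain Scala. It is not Figaro's EM implementation, and `BetaPrior`, `update`, `mapValue`, and the 7 heads / 3 tails data are hypothetical names and values chosen only for illustration. It shows why a starting Beta(1,1) is a prior belief rather than the answer: the observed counts are added to alpha and beta, and the MAP value is the mode of the updated Beta.

```scala
// Conjugate Beta-Bernoulli update in plain Scala (no Figaro dependency).
// BetaPrior and its methods are hypothetical names used for illustration.
final case class BetaPrior(alpha: Double, beta: Double) {
  // Observing heads and tails adds the counts to the hyperparameters.
  def update(heads: Int, tails: Int): BetaPrior =
    BetaPrior(alpha + heads, beta + tails)

  // Mode of the Beta density; unique only when alpha > 1 and beta > 1.
  def mapValue: Double = {
    require(alpha > 1 && beta > 1, "mode is only unique for alpha, beta > 1")
    (alpha - 1) / (alpha + beta - 2)
  }
}

object BetaDemo {
  def main(args: Array[String]): Unit = {
    val prior = BetaPrior(1, 1)                         // uniform starting belief
    val posterior = prior.update(heads = 7, tails = 3)  // hypothetical data
    // Posterior is Beta(8, 4); its mode is (8 - 1) / (8 + 4 - 2) = 7/10.
    println(posterior.mapValue)
  }
}
```

With a uniform Beta(1,1) prior the MAP value equals the observed fraction of heads; a stronger prior such as Beta(5,5) would pull the estimate toward 0.5.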

toncho11 commented 9 years ago

Another comment/question:

The tutorial is very quick on the Alarm scenario. It focuses too much on Figaro and does not provide the overall picture. It does not pose a question, for example "What is the chance that John calls given that the alarm went off?"

I think the following should also be elaborated:

private val alarm = CPD(burglary, earthquake, (false, false) -> Flip(0.001), (false, true) -> Flip(0.1), (true, false) -> Flip(0.9), (true, true) -> Flip(0.99))

So the real notation of the above is: P(alarm | burglary, earthquake). What is the probability that the alarm goes off given the states of burglary and earthquake? By just looking at the code this is not exactly clear. We create an "alarm" variable, while one would expect that the "alarm" variable already existed.

So Bayesian statistics tells us that if we condition alarm on all the values of burglary and earthquake then the resulting conditional distributions are enough to reconstruct the original alarm distribution (before conditioning). This explains why we create alarm and why all cases are required.

Lowest probability of Alarm is when there is no burglary and no earthquake. And biggest Alarm chance is when they both occur (last line). This explains the selected probabilities in the code above.

Then this means that in the direction of inference top to bottom the probabilities of burglary and earthquake are not really taken into account. This is because they are overridden by the above CPD. Is that correct?
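The relationship asked about above can be written with the law of total probability. This is a sketch that assumes burglary and earthquake are independent parents of alarm, which the thread does not state explicitly; $b$ and $e$ range over true and false.

```latex
P(\text{alarm}) \;=\; \sum_{b}\sum_{e} P(\text{alarm} \mid b, e)\, P(b)\, P(e)
```

Under that assumption the CPD does not override the distributions of burglary and earthquake. They appear as the weights $P(b)\,P(e)$ on each CPD row, so the four conditional rows alone are not enough to recover $P(\text{alarm})$.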

bruttenberg commented 9 years ago

Hi, thanks for your interest in Figaro!

First, thank you for pointing out some differences between the code samples and the tutorial. If you have a chance, we’d certainly appreciate hearing which ones are in error.

Second, you are asking a lot of good questions. The tutorial is meant to get someone started using Figaro who has some background in Bayesian modeling. I think you could benefit from Figaro more by checking out Avi Pfeffer’s book on probabilistic programming with Figaro (he created Figaro) here:

http://www.manning.com/pfeffer/

It has a lot of introductory material on probabilistic reasoning and modeling that may help you understand Figaro even more.

With regard to your comments below, I guess I’m having a hard time understanding your specific question. The alarm model has a conditional probability distribution for P(alarm | burglary, earthquake), but it also has probability distributions for burglary and earthquake. If one observes values for burglary and earthquake, then the probability of alarm is based on P(alarm | burglary=x, earthquake=y), and the prior probabilities of burglary and earthquake do not matter. More typically, one wants to compute P(burglary | alarm), in which case the distributions over burglary and earthquake are relevant.
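The two cases described above can be checked by brute-force enumeration. The following is a minimal sketch in plain Scala rather than Figaro's inference API. The CPD values come from the code quoted in this thread, while the priors `pBurglary` and `pEarthquake`, and the object and method names, are hypothetical and chosen only for illustration.

```scala
object AlarmEnumeration {
  // Hypothetical priors, chosen only for illustration (not from the thread).
  val pBurglary = 0.01
  val pEarthquake = 0.0001

  // CPD from the thread: P(alarm = true | burglary, earthquake).
  def pAlarm(b: Boolean, e: Boolean): Double = (b, e) match {
    case (false, false) => 0.001
    case (false, true)  => 0.1
    case (true, false)  => 0.9
    case (true, true)   => 0.99
  }

  // Probability that a Boolean variable with P(true) = p takes value v.
  private def prior(p: Double, v: Boolean): Double = if (v) p else 1 - p

  // Joint probability P(burglary = b, earthquake = e, alarm = true),
  // assuming burglary and earthquake are independent.
  def jointAlarm(b: Boolean, e: Boolean): Double =
    prior(pBurglary, b) * prior(pEarthquake, e) * pAlarm(b, e)

  private val bools = Seq(false, true)

  // Marginal P(alarm = true): sum the joint over all parent states.
  val marginalAlarm: Double =
    (for (b <- bools; e <- bools) yield jointAlarm(b, e)).sum

  // Posterior P(burglary = true | alarm = true) by Bayes' rule.
  val burglaryGivenAlarm: Double =
    bools.map(e => jointAlarm(true, e)).sum / marginalAlarm

  def main(args: Array[String]): Unit = {
    println(f"P(alarm) = $marginalAlarm%.6f")
    println(f"P(burglary | alarm) = $burglaryGivenAlarm%.4f")
  }
}
```

When both parents are observed, only `pAlarm(b, e)` is consulted. For the marginal and for the reverse query the priors enter every term, so changing `pBurglary` changes both results.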

Brian

Brian Ruttenberg, Charles River Analytics Inc., 617.491.3474 x730, http://www.cra.com/

From: toncho11 [mailto:notifications@github.com] Sent: Tuesday, April 28, 2015 6:29 AM To: p2t2/figaro Subject: Re: [figaro] Questions on Learning (section 8 tutorial) (#445)

