charles-river-analytics / figaro

Figaro Programming Language and Core Libraries

Questions/Feedback on Practical PP #592

Open toncho11 opened 8 years ago

toncho11 commented 8 years ago

Feedback 1: I would first like to note that chapters 1-3 seem very well written to me.

Feedback 2: Training the spam filter takes a lot of time for this number of documents. I suppose one also needs a 64-bit Java VM with the largest possible heap size, because I get heap size errors.

Question 1: When algorithm.start() is executed, I do not see where we explicitly pass the models to it. Somehow the algorithm knows about all the models we have previously created and applies inference to all of them. This is not clear to me.

Question 2: In learning we have:

val isSpam = Flip(parameters.spamProbability)
val spamProbability = Beta(2,3)

This is actually normal, but it is not explicitly explained. We are modelling a parameter of one distribution with another distribution: a parameter of a Bernoulli distribution is modelled with a Beta distribution. Right? And how do 100 training documents produce approximately 31005 elements?
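In symbols, the structure described in Question 2 can be written as below. This is only a sketch of the standard Beta-Bernoulli setup; the symbol p for the spam probability is a name introduced here, not one used in the thread.

```latex
\begin{align*}
p &\sim \mathrm{Beta}(2, 3) \\
\mathit{isSpam} \mid p &\sim \mathrm{Bernoulli}(p) \\
\mathbb{E}[p] &= \frac{2}{2 + 3} = 0.4
\end{align*}
```

So before any training documents are seen, the prior mean of the spam probability is 0.4.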

apfeffer commented 8 years ago

Hi,

Thanks for the feedback – I appreciate it.

Yes, I realize that training the spam filter can take a lot of time and memory. This is often true of probabilistic programming applications. The online EM method is more memory-friendly and efficient on this problem, but it is only described in chapter 13.

To answer your questions:

Question 1 – This is related to the concept of a universe. All elements in Figaro are placed in a universe. Most of the time you can ignore this, as elements are placed in a default universe. Algorithms also run on a universe; again, if you don't specify one explicitly, the default universe is used. So when you run an algorithm, the model consists of all the elements in that universe.
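A minimal sketch of that behaviour, assuming Figaro is on the classpath. The object name DefaultUniverseSketch and the element Flip(0.4) are hypothetical and chosen only for illustration; they are not the book's spam filter model.

```scala
import com.cra.figaro.language._
import com.cra.figaro.algorithm.factored.VariableElimination

object DefaultUniverseSketch extends App {
  // Replace the default universe with a fresh, empty one so that
  // elements created earlier in the session are not part of the model.
  Universe.createNew()

  // No universe is named, so this element is placed in the default universe.
  // (Hypothetical element, for illustration only.)
  val isSpam = Flip(0.4)

  // Check whether the default universe lists the element as active.
  println(Universe.universe.activeElements.contains(isSpam))

  // No universe is named here either; the algorithm runs on the default one.
  val algorithm = VariableElimination(isSpam)
  algorithm.start()
  println(algorithm.probability(isSpam, true))
  algorithm.kill()
}
```

Neither the element nor the algorithm mentions a universe, which is why algorithm.start() appears to "know" about every model created so far: they all share the default universe.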

Question 2 – You are correct. This is explained in detail in the following chapters. The reason you have so many elements is that every training document has one element for each word in the dictionary, indicating whether that word is present. The learned parameters of the model, however, occur only once.
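As a rough way to read the element count, the explanation above can be put in a formula. The symbols are introduced here for illustration and are not from the book: N is the total number of elements, D the number of training documents, W the number of dictionary words, c any other per-document elements, and P the number of shared parameter elements.

```latex
\begin{align*}
N &\approx D \cdot (W + c) + P \\
\frac{N}{D} &\approx \frac{31005}{100} \approx 310
\end{align*}
```

With 100 documents, the quoted total works out to about 310 elements per document on average. The thread does not say how that splits between W, c, and P, so no breakdown is assumed here.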

I hope this helps.

Avi

toncho11 commented 8 years ago

Hi,

I think there is an error in figure 4.16 on page 125. The density is marked as 0.1, but it should be 0.01, since it is 1/(150-50). The text also states 0.01.
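The arithmetic can be spelled out as follows, assuming the figure shows a continuous uniform distribution on the interval from 50 to 150 (an assumption inferred from the expression 1/(150-50) above):

```latex
\[
f(x) = \frac{1}{b - a} = \frac{1}{150 - 50} = \frac{1}{100} = 0.01,
\qquad 50 \le x \le 150 .
\]
```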

bruttenberg commented 8 years ago

Hi,

Can you please post this on the Manning forum for the Practical PP book? The forum is here:

https://forums.manning.com/forums/practical-probabilistic-programming

Brian

apfeffer commented 8 years ago

Thanks for this correction, I appreciate it.
