BasicProbability / LectureNotes

Lecture Notes (with exercises) for Basic Probability course at University of Amsterdam

Chapter 6 #26

Open philschulz opened 8 years ago

philschulz commented 8 years ago

Hey Christian,

I have completely reworked chapter 6, which is now only about EM. In doing so, I have fallen in love with the R integration for LaTeX. I have basically implemented the EM algorithm within the doc. I still need to fix a couple of dependencies, but the idea is that soon you'll be able to change the toy data set and all results in the doc will automatically be updated to the correct values. As before, please put any comments in the Rnw. The doc is complete except for the very last paragraph (the M-step in the example), where I still need to adjust some things.
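For anyone unfamiliar with the setup: knitr lets you embed R chunks and inline \Sexpr{} expressions in an .Rnw file, so numbers in the text are recomputed from the data at compile time. A made-up fragment to illustrate the idea (chunk name and values are hypothetical, not the actual chapter code):

```
% hypothetical fragment of a chapter .Rnw file
<<toy-data, echo=FALSE>>=
x <- c(6, 4, 7, 5)        # toy data: heads out of 10 tosses per sequence
total_heads <- sum(x)
@
In total we observe \Sexpr{total_heads} heads; change \texttt{x} above and
this number is recomputed on the next knit.
```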

cschaffner commented 8 years ago

Put the comments in the Rnw???

philschulz commented 8 years ago

The updated files are all in the philip_edits branch (I had mentioned this in the previous issues but forgot it in this one). In the chapter6 folder there are two files with the extension ".Rnw". For the purposes of inserting text/comments, they work just like regular .tex files. I am afraid that if you looked at the master branch, you just saw last year's version. Let's talk about this tomorrow.

cschaffner commented 8 years ago

I've left various comments in the file, I've not quite completed my checks yet. It's quite complicated...

philschulz commented 8 years ago

Hey Christian,

I have addressed the comments in chapter 6. Still need to correct the M-step. Please check whether you think the changes I've made are sufficient.

philschulz commented 8 years ago

Hey Christian,

I have updated the M-step of chapter 6. I reworked pretty much everything from page 9 onward. I realized that there are some numerical mistakes in the text, which result from the fact that I have not yet linked the text to knitr, only the tables. Thus, the tables are correct but some numbers in the text are still hard-coded and therefore wrong. If you are willing to skip over this for the time being, you can consider this a final draft.

philschulz commented 8 years ago

I will link everything to knitr before next Wednesday.

cschaffner commented 8 years ago

I've put some more comments in Chapter 6 now. The example steps are still not entirely clear to me; I'll have a look at the Collins paper now and get back to this script then... Maybe it would be nice to see what comes out in the example after a few more iterations, and how to interpret that?

philschulz commented 8 years ago

Hey Christian,

Seems like we are slowly converging on chapter 6. I incorporated your comments wherever I agreed; I addressed the remaining two inline.

I also linked all the numbers in the text, equations and tables to the R computations. There were still a lot of numerical mistakes in the previous version. The cool thing now is that if you change the data, everything(!) gets automatically updated.

Lastly, I have added a graphical model and some explanation to the mixture model section (6.1).

Let me know what you think.

philschulz commented 8 years ago

Hey Christian,

I have now also added an information-theoretic proof at the end of chapter 7 which shows that EM always increases the likelihood. This may or may not help you get a better feel for what's going on.
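For reference, the standard argument (which may differ in detail from the version in the notes) runs roughly as follows. For any distribution $q$ over the latent variable $Y$:

```latex
% sketch of the standard EM monotonicity argument
\log P(x \mid \theta)
  \;=\; \underbrace{\sum_{y} q(y)\,\log\frac{P(x, y \mid \theta)}{q(y)}}_{\mathcal{L}(q,\,\theta)}
  \;+\; \mathrm{KL}\!\left(q(Y) \,\big\|\, P(Y \mid x, \theta)\right)
```

The E-step sets $q(y) = P(y \mid x, \theta^{(t)})$, which makes the KL term zero, so $\mathcal{L}(q, \theta^{(t)}) = \log P(x \mid \theta^{(t)})$. The M-step picks $\theta^{(t+1)}$ maximising $\mathcal{L}(q, \theta)$, and since the KL term is never negative, $\log P(x \mid \theta^{(t+1)}) \ge \mathcal{L}(q, \theta^{(t+1)}) \ge \mathcal{L}(q, \theta^{(t)}) = \log P(x \mid \theta^{(t)})$.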

cschaffner commented 8 years ago

In the example of EM, $\theta^0$ should often be $\theta^{(0)}$, right?

cschaffner commented 8 years ago

In (6.9) we have to explain how to compute $P(X_1 = 6 \mid \Theta = \theta^{(0)})$! Namely http://www.wolframalpha.com/input/?i=(10+choose+6)+(1%2F4(+0.4%5E6+*+0.6%5E4+%2B+(0.65%5E6+*+0.35%5E4)+)+%2B+1%2F2(0.5%5E10))
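Spelled out in R, using the numbers from the Wolfram Alpha query above (mixture weights 1/4, 1/4, 1/2 and coin biases 0.4, 0.65, 0.5), this is:

```r
# P(X_1 = 6 | Theta = theta^(0)) for the three-coin mixture,
# with the values from the Wolfram Alpha query above
choose(10, 6) * (1/4 * (0.4^6 * 0.6^4 + 0.65^6 * 0.35^4) + 1/2 * 0.5^10)
```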

cschaffner commented 8 years ago

In general, the MLE for $\theta_j$ of a categorical is $\#c_j / n$. The script says that in our case n = 10. Really? We have 20 observations!
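For reference, and without taking a stance on whether $n$ should be 10 or 20 here, the general formula in question is:

```latex
% MLE of a categorical parameter: relative frequency of category c_j among the n observations
\hat{\theta}_j \;=\; \frac{\#c_j}{n} \;=\; \frac{\sum_{i=1}^{n} \mathbb{1}(y_i = c_j)}{n}
```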

cschaffner commented 8 years ago

In "Formally this means...": there is a subscript $i$ missing, i.e. $X = x_i$ should presumably be $X_i = x_i$.

cschaffner commented 8 years ago

Why did you have to include the actual .tex code in the full script and not just a link to the _forinclude part? This will make it really inconvenient to edit anything...

philschulz commented 8 years ago

Hey Christian,

In order to integrate knitr properly into the script, I had to turn fullscript.tex into fullscript.Rnw. The way knitr works internally is that it first converts the Rnw into tex and then compiles the tex. Thus, the tex that you refer to has been automatically generated. All future editing should take place only in the Rnw file. In there, you'll only find include statements (or their knitr equivalents).
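For anyone else compiling the script, the usual knitr workflow looks roughly like this:

```r
# Rnw -> tex -> pdf: knitr first weaves the Rnw into plain tex, then LaTeX compiles it
library(knitr)
knit("fullscript.Rnw")      # produces fullscript.tex (auto-generated; don't edit it)
knit2pdf("fullscript.Rnw")  # or: weave and run LaTeX on the result in one step
```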

cschaffner commented 8 years ago

OK. I've just checked Chapter 6 and it looks pretty good to me now. I've only changed a few things.

cschaffner commented 8 years ago

There's one thing I don't understand (see the attached screenshot "screen shot 2016-10-05 at 20 34 30"): how does the conditioning work there?

cschaffner commented 8 years ago

And I guess there are some indicator functions missing in (6.15) as well, right?

philschulz commented 8 years ago

1) The probability that they occur jointly IN THE DATA SET is the probability of x occurring together with y. x occurs IN THE DATA SET with probability 1 because it is observed in the data. Thus their joint probability depends only on y|x, which is given by the posterior.

2) 6.15 is wrong altogether (which is annoying because it's correct in the text around that formula). I'll fix this.

philschulz commented 8 years ago

Hey Christian,

After reading through the text again, I threw out 6.15 entirely. It was totally disconnected from the text, and the actual update is given in 6.16 (which is now the new 6.15).

philschulz commented 8 years ago

Could you announce tomorrow that we have made some minor changes?

philschulz commented 8 years ago

Uff,

I found yet another typo. Next time let's just tell them to implement it without giving them any example ;)

cschaffner commented 8 years ago

It's normal to make lots of typos. I think the example is really the only way to actually understand this...

cschaffner commented 8 years ago

I don't understand your comment:

1) The probability that they occur jointly IN THE DATA SET is the probability of x occurring together with y. x occurs IN THE DATA SET with probability 1 because it is observed in the data. Thus their joint probability depends only on y|x, which is given by the posterior.

It seems to me that it always holds that $E[\mathbb{1}(X_i = x_i, Y_i = c_j) \mid \Theta = \dots] = \Pr[X_i = x_i, Y_i = c_j \mid \Theta = \dots]$, so why is this the same as $\Pr[Y_i = c_j \mid X_i = x_i, \Theta = \dots]$??

philschulz commented 8 years ago

Hmmm,

So the idea is that you want to compute the expected number of occurrences of the pair (x, y). You already know the outcome x, thus the only random quantity is y. I guess another way of writing this would be to drop the upper-case letters in the expectation altogether and instead write

$E[\mathbb{1}(x, c) \mid X = x, \Theta = \theta]$

Do you like that one better? You could also leave it as is and put the information that $X = x$ in the conditioning context, but then things start to look weird because you'd have x on both sides of the bar. What I mean is this: $E[\mathbb{1}(X = x, Y = c) \mid X = x, \Theta = \theta]$

philschulz commented 8 years ago

Formally, however, I think that the last one is the most "correct", because then the equality that you were looking at follows immediately.
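Spelled out, with $X_i = x_i$ in the conditioning context the equality is immediate:

```latex
% conditioned on X_i = x_i, the event X_i = x_i has probability 1
\mathbb{E}\!\left[\mathbb{1}(X_i = x_i, Y_i = c_j) \,\middle|\, X_i = x_i, \Theta = \theta\right]
  = P(X_i = x_i, Y_i = c_j \mid X_i = x_i, \Theta = \theta)
  = P(Y_i = c_j \mid X_i = x_i, \Theta = \theta)
```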

cschaffner commented 8 years ago

Why do you even include X = x in the indicator variable then? Why isn't it just $E[\mathbb{1}(Y = c_j) \mid X = x, \Theta = \theta]$? I think that means: the expected occurrences of that coin $c_j$ in case we observed x.

philschulz commented 8 years ago

This is what it comes down to, but written like that it does not really become clear that in this case we are counting the expected number of times that they occur together. Doing it like this doesn't really bring out the difference between the sufficient statistics for the binomial per component and the categorical over components. What I am looking for is a good notation to say: when we are computing the sufficient stats for the categorical, we are only interested in how often $c_j$ occurs. When we compute the sufficient stats for the binomial distributions, we are interested in how often each pair $(x, c_j)$ occurs.

cschaffner commented 8 years ago

And they are related by the number of occurrences, I guess?

philschulz commented 8 years ago

Yes, the expected sufficient statistic for the categorical is essentially $\sum_x \mathbb{1}(x, c_j)$ for each $c_j$.

philschulz commented 8 years ago

Sorry, that should be $\sum_x E[\mathbb{1}(x, c_j)]$.

cschaffner commented 8 years ago

I guess you mean $\sum_x E[\mathbb{1}(X = x, Y = c_j)]$ by this short form above, right?

So, you are claiming that this is equal to (no. of occurrences of x in the data) * posterior $= \sum_{j=1}^m \mathbb{1}(x = x_j) \cdot P[Y = c_j \mid X = x, \Theta = \dots]$

Notation is really not good right now... I'm confused, I think I have to go to bed. But I think I see what you mean...

philschulz commented 8 years ago

Yes, that's exactly what I mean. I added a branch called suggestion where I added the alternative notation that I first suggested.

I am supertired as well. In this state there is not much we can do productively, I guess. Have a good night.
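For concreteness, a minimal R sketch of the two expected sufficient statistics the discussion is about; the data, parameter values, and variable names below are made up for illustration and are not the ones from the script:

```r
# Toy E-step: expected sufficient statistics for a coin-mixture example (made-up values)
x      <- c(6, 4, 7, 5)        # observed heads out of 10 tosses per sequence
size   <- 10
lambda <- c(1/4, 1/4, 1/2)     # mixture weights  P(Y = c_j)
p      <- c(0.4, 0.65, 0.5)    # per-coin head probabilities

# posterior responsibilities P(Y = c_j | X = x_i, theta): one row per data point
joint <- outer(x, seq_along(p), function(xi, j) lambda[j] * dbinom(xi, size, p[j]))
post  <- joint / rowSums(joint)

# expected count of component c_j (sufficient statistic for the categorical):
#   sum_i E[ 1(Y_i = c_j) | X_i = x_i, theta ]
exp_component_counts <- colSums(post)

# expected number of heads credited to coin c_j (sufficient statistic for the binomials):
#   sum_i x_i * P(Y_i = c_j | X_i = x_i, theta)
exp_heads_per_coin <- colSums(post * x)
```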