Open philschulz opened 8 years ago
Put the comments in the Rnw???
Sent from my iPhone
On 01.09.2016, at 17:15, philschulz notifications@github.com wrote:
Hey Christian,
I have completely reworked chapter 6, which is now only about EM. In doing so, I have fallen in love with the R integration for LaTeX. I have basically implemented the EM algorithm within the doc. I still need to fix a couple of dependencies, but the idea is that soon you'll be able to change the toy data set and all results in the doc will automatically be updated to the correct values. As before, please put any comments in the Rnw. The doc is complete except for the very last paragraph (the M-step in the example) where I still need to adjust some things.
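For reference, the core of such a doc-embedded EM loop is quite small. Here is a minimal sketch in Python (the chapter itself uses R inside the Rnw; the function name, toy data, and parameter choices here are my own illustration, not the chapter's code):

```python
# Hypothetical sketch of EM for a mixture of binomial "coins":
# each observation is a head count out of n tosses.
from math import comb

def em(xs, n, biases, weights, iters=10):
    """xs: observed head counts; biases/weights: initial parameters."""
    biases, weights = list(biases), list(weights)
    for _ in range(iters):
        # E-step: posterior responsibility of each coin for each draw.
        resp = []
        for x in xs:
            joint = [w * comb(n, x) * p**x * (1 - p)**(n - x)
                     for w, p in zip(weights, biases)]
            z = sum(joint)
            resp.append([j / z for j in joint])
        # M-step: re-estimate mixture weights and coin biases
        # from the expected sufficient statistics.
        counts = [sum(r[k] for r in resp) for k in range(len(biases))]
        weights = [c / len(xs) for c in counts]
        biases = [sum(r[k] * x for r, x in zip(resp, xs)) / (n * counts[k])
                  for k in range(len(biases))]
    return biases, weights
```

Changing `xs` and re-running updates every downstream number, which is exactly the appeal of wiring the computation into the document.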
The updated files are all in the philip_edits branch (I had mentioned this in the previous issues but forgot it in this one). In chapter6 folder there are two files with the extension ".Rnw". For the purposes of inserting text/comments, they work just like regular .tex files. I am afraid that if you looked at the master branch you just saw last year's version. Let's talk about this tomorrow.
I've left various comments in the file, I've not quite completed my checks yet. It's quite complicated...
Hey Christian,
I have addressed the comments in chapter 6. Still need to correct the M-step. Please check whether you think the changes I've made are sufficient.
Hey Christian,
I have updated the M-step of chapter 6. I reworked pretty much everything from page 9 onward. I realized that there are some numerical mistakes in the text, which result because I have not yet linked the text against knitr, only the tables. Thus, the tables are correct but some numbers in the text are still hard-coded and thus wrong. If you are willing to skip over this for the time being, you can consider this a final draft.
I will link everything to knitr before next Wednesday.
I've put some more comments in Chapter 6 now. The example steps are still not entirely clear to me, I'll have a look at the Collins paper now, and get back to this script then... Maybe it would be nice to see what comes out in the example after a few more iterations? and how to interpret that?
Hey Christian,
seems like we are slowly converging on chapter 6. I incorporated your comments whenever I agreed. I addressed the remaining two inline.
I also linked all the numbers in the text, equations and tables to the R computations. There were still a lot of numerical mistakes in the previous version. The cool thing now is that if you change the data, everything(!) gets automatically updated.
Lastly, I have added a graphical model and some explanation to the mixture model section (6.1).
Let me know what you think.
Hey Christian,
I now also added an information-theoretic proof at the end of ch7 which shows that EM always increases the likelihood. This may or may not help you to get a better feel for what's going on.
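For what it's worth, the usual form of that argument decomposes the log-likelihood into a lower bound plus a KL term; a sketch (not necessarily the exact presentation in ch7):

```latex
% For any distribution q over the latent Y:
\log P(x \mid \theta)
  = \underbrace{\sum_y q(y) \log \frac{P(x, y \mid \theta)}{q(y)}}_{\mathcal{L}(q,\,\theta)}
  \;+\; \underbrace{\mathrm{KL}\big(q(y) \,\|\, P(y \mid x, \theta)\big)}_{\geq\, 0}
% The E-step sets q(y) = P(y | x, theta^{(t)}), driving the KL term to zero;
% the M-step maximizes L(q, theta) over theta, so
%   log P(x | theta^{(t+1)}) >= L(q, theta^{(t+1)})
%                            >= L(q, theta^{(t)}) = log P(x | theta^{(t)}).
```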
In the example of EM, \theta^0 should often be \theta^{(0)} , right?
In (6.9) we have to explain how to compute $$P(X_1=6 \mid \Theta = \theta^{(0)})$$ ! namely http://www.wolframalpha.com/input/?i=(10+choose+6)+(1%2F4(+0.4%5E6+*+0.6%5E4+%2B+(0.65%5E6+*+0.35%5E4)+)+%2B+1%2F2(0.5%5E10))
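That WolframAlpha expression is easy to sanity-check in code. A quick sketch in Python (rather than the chapter's R), using the mixture weights and coin biases that appear in the query:

```python
from math import comb

# Mixture of three coins: biases and prior weights taken from the
# WolframAlpha query above (1/4, 1/4, 1/2 over coins 0.4, 0.65, 0.5).
biases  = [0.4, 0.65, 0.5]
weights = [0.25, 0.25, 0.5]
n, x = 10, 6

# Marginal probability of seeing 6 heads in 10 tosses under the mixture.
p = comb(n, x) * sum(w * b**x * (1 - b)**(n - x)
                     for w, b in zip(weights, biases))
print(round(p, 4))  # roughly 0.1898
```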
In general, the MLE for θ_j of a categorical is #c_j / n. In our case n = 10. ?? Really? We have 20 observations!
Formally this means... there is a subscript i missing in X=x_i
Why did you have to include the actual .tex code into the full script and not just a link to the _forinclude part? This will be really inconvenient to edit anything...
Hey Christian,
in order to integrate knitr properly into the script, I had to turn the fullscript.tex into fullscript.Rnw. The way knitr works internally is that it first converts Rnw into tex and then compiles the tex. Thus, the tex that you refer to has been automatically generated. All future editing should only take place in the Rnw file. In there, you'll only find include statements (or their knitr equivalents).
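The two-stage pipeline can be run from the command line, roughly like this (assuming R and the knitr package are installed; file names as in the repo):

```shell
# Weave the Rnw into plain tex (this runs the embedded R chunks) ...
Rscript -e 'knitr::knit("fullscript.Rnw")'
# ... then compile the generated tex as usual.
pdflatex fullscript.tex
```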
OK. I've just checked Chapter 6 and it looks pretty good to me now. I've only changed a few things.
There's one thing I don't understand: how does the conditioning work there?
and, I guess there are some indicator functions missing in (6.15) as well, right?
1) The probability that they occur jointly IN THE DATA SET is the probability that x occurs together with y. x occurs IN THE DATA SET with probability 1 because it is observed in the data. Thus their joint probability depends only on y|x, which is given by the posterior. 2) 6.15 is wrong altogether (which is annoying because it's correct in the text around that formula). I'll fix this.
Hey Christian,
after reading through the text again, I threw out 6.15 entirely. It was totally disconnected from the text and the actual update is given in 6.16 (which is now the new 6.15).
Could you announce tomorrow that we have made some minor changes?
Uff,
I found yet another typo. Next time let's just tell them to implement it without giving them any example ;)
it's normal to make lots of typos. I think the example is really the only way to actually understand this...
I don't understand your comment:
1) The probability that they occur jointly IN THE DATA SET is the probability that x occurs together with y. x occurs IN THE DATA SET with probability 1 because it is observed in the data. Thus their joint probability depends only on y|x, which is given by the posterior
It seems to me that it always holds that E[ \ind(X_i=x_i, Y_i=c_j) | \Theta = ... ] = Pr[ X_i=x_i, Y_i=c_j | \Theta = ... ], so why is this the same as Pr[ Y_i=c_j | X_i=x_i, \Theta = ... ] ??
Hmmm,
so the idea is you want to compute the expected number of occurrences of the pair (x,y). You already know the outcome x, thus the only random quantity is y. I guess another way of writing this would be to drop the upper case letters in expectation altogether and instead write
E[1(x,c) | X=x, \Theta=\theta]
Do you like that one better? You could also leave it as is and put the information that X=x in the conditioning context, but then things start to look weird because you'd have x on both sides of the bar. What I mean is this: E[1(X=x, Y=c) | X=x, \Theta=\theta]
Formally, however, I think that the last one is the most "correct" because then the equality that you were looking at follows immediately.
Why do you even include X=x in the indicator variable then? Why isn't it just E[1(Y=c_j) | X=x, \Theta=\theta] ? I think that means: the expected occurrences of that coin c_j in case we observed x.
This is what it comes down to, but from this it does not really become clear that in this case we are counting the expected number of times that they occur together. Doing it like this doesn't really bring out the difference between the sufficient statistics for the binomial per component and the categorical over components. What I am looking for is a good notation to say: when we are computing the sufficient stats for the categorical, we are only interested in how often c_j occurs. When we compute the ss for the binomial distributions we are interested in how often each pair (x,c_j) occurs.
and they are related by the number of occurrences, I guess?
Yes, the expected sufficient statistic for the categorical is essentially \sum_x 1(x,c_j) for each c_j
Sorry, should be \sum_x E[1(x,c_j)]
I guess you mean \sum_x E[ 1(X=x,Y=cj) ] by this short form above, right?
So, you are claiming that this is equal to no. of occurrences of x in the data * posterior = \sum_{i=1}^m 1(x_i = x) * P[Y=c_j | X=x, \Theta = ...]
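That identity is easy to check numerically. A small sketch with made-up data (Python rather than the chapter's R; the mixture parameters are the toy ones from the WolframAlpha link earlier in the thread):

```python
from math import comb
from collections import Counter

# Toy mixture from earlier in the thread (assumed).
biases  = [0.4, 0.65, 0.5]
weights = [0.25, 0.25, 0.5]
n = 10
data = [6, 6, 4, 6, 3, 4]  # made-up head counts

def posterior(x):
    """P(Y = c_k | X = x, theta) for each component k."""
    joint = [w * comb(n, x) * b**x * (1 - b)**(n - x)
             for w, b in zip(weights, biases)]
    z = sum(joint)
    return [j / z for j in joint]

k = 1  # pick one component, c_2 say
# Summing the posterior once per data point ...
per_point = sum(posterior(x)[k] for x in data)
# ... equals counting occurrences of each distinct x and
# multiplying by the posterior, which is the claim above.
grouped = sum(cnt * posterior(x)[k] for x, cnt in Counter(data).items())
assert abs(per_point - grouped) < 1e-12
```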
Notation is really not good right now... I'm confused, I think I have to go to bed. But I think I see what you mean...
Yes, that's exactly what I mean. I added a branch called suggestion where I added the alternative notation that I first suggested.
I am supertired as well. In this state there is not much we can do productively, I guess. Have a good night.