depthfirstlearning / depthfirstlearning.com


PR for solutions and optional reading material #21

Closed · MicPie closed this 5 years ago

MicPie commented 5 years ago

I just created my first PR for solutions and optional reading material. Please let me know if you have any suggestions.

suryabhupa commented 5 years ago

Hi MicPie,

Thanks for the PR! I have a few questions --

1) The solutions don't look very cleanly LaTeX'd; this is what I see:

[Screenshot: Screen Shot 2019-04-02 at 2 55 00 pm]

and this:

[Screenshot: Screen Shot 2019-04-02 at 2 55 48 pm]

Namely, I think it would help if you consistently LaTeX'd all the variables and used proper spacing between the equations. I also believe you can provide a bit more context to your second solution before diving into the first equation.

2) Things like log(), x, and y should all be LaTeX'd (see the snippet below for an example). There's also a stray tag that I see.
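
For example, a quick before/after sketch (assuming the site uses the usual inline `$...$` MathJax delimiters; the exact delimiter config may differ):

```latex
% Plain text (renders as prose, hard to read):
%   H(x) = - sum_x p(x) log p(x), where x is a discrete variable
% Wrapped in MathJax delimiters, with \log and \sum (renders cleanly):
$H(x) = -\sum_x p(x) \log p(x)$, where $x$ is a discrete variable
```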

When you resubmit this, seeing a screenshot of the output would be useful, and I could check it with my output as well.

suryabhupa commented 5 years ago

Also, our apologies for the very late response -- we were busy wrapping up our work on the Depth First Learning Fellowship. We'll be more diligent about keeping up with these :)

MicPie commented 5 years ago

Hello @suryabhupa,

Thank you for getting back to me.

I fixed the MathJax syntax so that it renders properly (sorry for the mess; this is my first time working with Jekyll) and revised the answers for readability.

Thank you for the great learning material!

Kind regards,
Michael

PS: Do I qualify for the "Thank you to ..." list (which I just saw)?

suryabhupa commented 5 years ago

Hi MicPie,

Thanks for the fixes -- the math looks cleaner now. Let's keep iterating on the solutions a bit; I think we can clean them up even further. In general, it would help your proofs if you added more exposition between the steps to spell out your reasoning, as that would make the logical jumps easier to follow.

For PRML 1.31, I don't think the question itself was answered -- you provided the proof for the equality case, but not for the general inequality. Also, since you defined independence between two random variables, I think it might be useful to define H(x|y), H(x), and I(x, y).
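
For reference, a sketch of the standard definitions (notation roughly following PRML; this is just one way to set them up):

```latex
\begin{aligned}
H(x)    &= -\sum_{x} p(x) \log p(x)                      && \text{(entropy)} \\
H(x|y)  &= -\sum_{x,y} p(x,y) \log p(x|y)                && \text{(conditional entropy)} \\
I(x, y) &= H(x) - H(x|y)
         = \mathrm{KL}\big(p(x,y) \,\|\, p(x)\,p(y)\big) && \text{(mutual information)}
\end{aligned}
```

Since KL is nonnegative, $I(x, y) \ge 0$, i.e. $H(x|y) \le H(x)$, with equality iff $x$ and $y$ are independent; combined with $H(x, y) = H(y) + H(x|y)$, this yields the general inequality $H(x, y) \le H(x) + H(y)$.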

For the next question, I'm not quite sure how you made the jump from stating that the KL divergence is largest when P is maximally different from Q to stating that the cross-entropy is infinite. Perhaps a thing to add is what it means for a probability distribution P to be maximally different from Q? (i.e. there are lots of ways to construct P and Q such that their KL is infinite, but they all have something in common, and I think that's what you mean by "maximally different"). As a nit, I think it'd be great if you replaced the $log$ with $\log$ -- it looks cleaner.
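
For context, the identity connecting the two quantities is the standard decomposition of cross-entropy:

```latex
H(P, Q) = H(P) + \mathrm{KL}(P \,\|\, Q)
```

so, assuming $H(P)$ is finite, the cross-entropy $H(P, Q)$ is infinite exactly when $\mathrm{KL}(P \,\|\, Q)$ is, which happens whenever $P$ puts mass somewhere $Q$ assigns none.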

Thanks again! Once we're happy with the changes and merge them in, I'd be more than happy to add you to the "Thanks to..." list for the InfoGAN page :)

MicPie commented 5 years ago

Hi @suryabhupa ,

I tried to incorporate all your suggestions. However, I am not sure whether my explanation of P being "maximally different" from Q, based on their supports, is the right one / the one you meant.

Thanks for helping me improve the commit -- I think it looks much better now! :-)

suryabhupa commented 5 years ago

Thanks again for continuing to iterate on these solutions! Yes, I think they're in a great place now.

What I was getting at with "maximally different" is that for objects like probability distributions, the notion of "different" is a bit vague, and is only grounded with respect to the metric we pick. For example, consider P = Uniform[0, 1] and Q_1 = Uniform[2, 3]. If we measure KL(P || Q_1), we get an unbounded value, so it seems like this pair is "maximally different".

However, we can pick any Q disjoint from P, even Q_2 = Uniform(1, 2], which still results in an infinite KL between P and Q_2, even though the distributions are, in some sense, "close". The Earth Mover's distance, by contrast, would capture the notion that P is "more different" from Q_1 than from Q_2. This is why I think it'd suffice to just say something like "KL is maximized when P and Q are disjoint because the log ratio P/Q becomes infinite..." instead of something like "...when P and Q are maximally different".
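
To make this concrete, here's a quick numerical sketch (the discretization of the three uniforms onto four unit-width bins is just for illustration):

```python
import numpy as np
from scipy.stats import entropy, wasserstein_distance

# Discretize P = Uniform[0, 1], Q1 = Uniform[2, 3], Q2 = Uniform(1, 2]
# onto four unit-width bins covering [0, 4).
bins = np.array([0.5, 1.5, 2.5, 3.5])  # bin centers
p  = np.array([1.0, 0.0, 0.0, 0.0])    # P's mass sits in bin 0
q1 = np.array([0.0, 0.0, 1.0, 0.0])    # Q1's mass sits in bin 2
q2 = np.array([0.0, 1.0, 0.0, 0.0])    # Q2's mass sits in bin 1

# KL cannot tell Q1 from Q2: both supports are disjoint from P's,
# so both divergences are infinite.
print(entropy(p, q1))  # inf
print(entropy(p, q2))  # inf

# The Earth Mover's distance does capture that Q2 is "closer" to P.
print(wasserstein_distance(bins, bins, p, q1))  # 2.0
print(wasserstein_distance(bins, bins, p, q2))  # 1.0
```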

It's really just a nit though :)

MicPie commented 5 years ago

Hi @suryabhupa ,

Thank you for your help! Your explanation is easier to grasp (and helped me view it from an additional angle). I incorporated it and adapted the sentence to make it easier to understand.

PS: One thing I am still very curious about is this issue: https://github.com/depthfirstlearning/depthfirstlearning.com/issues/22 -- maybe you can point me in the right direction?

suryabhupa commented 5 years ago

Hi there! I forgot to follow up on this -- the new edit looks good! I'll take a look at the other issue in a little bit.