CamDavidsonPilon / Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)
http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/
MIT License

Convergence issues in Chapter 2? Figs don't SEEM to agree with text. #228

Open xguse opened 10 years ago

xguse commented 10 years ago

I am wondering if this is an inconsistency or just my own confusion about the subject matter, but this does not look to me like the

> distribution puts most weight near the true value of p_A

Further, if I run the model over and over, sometimes it does look this way. Is this just a matter of time constraints on the simulation? Or is this actually "close enough"? Again, I am new to this kind of modeling.

It might be worth mentioning that while the "true" value may land in the simulated distribution's tail, on the full problem space it is still "very close"?

If nothing else, I would like clarification for my own illumination!

Thanks to everyone for this amazing resource!

Gus

CamDavidsonPilon commented 10 years ago

Hi @xguse, this is a common problem with writing a book that has interactive statistics exercises. When I first did the simulation, most of the probability mass was near the true value. But if you run the simulation again, new data is created that may change the inference.

Simple example: if we flip an unbiased coin 100 times, I might get 50 heads, and I would conclude (and write in the book) a 50% chance of heads. Then you flip the coin 100 times: you might get 50 heads, or you might get 40, and in the latter case you would conclude a 40% chance of heads.
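For concreteness, here is a minimal sketch of that run-to-run variation in code. It uses the exact conjugate Beta posterior as a stand-in for the book's MCMC, with a flat prior; the seed and sample sizes are purely illustrative:

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)  # fixed seed, purely for reproducibility
p_true = 0.5                    # the coin is unbiased
n_flips = 100

for run in range(5):
    heads = rng.binomial(n_flips, p_true)
    # With a flat Beta(1, 1) prior, the posterior after observing the flips
    # is Beta(1 + heads, 1 + tails).
    posterior = beta(1 + heads, 1 + n_flips - heads)
    low, high = posterior.interval(0.95)
    print(f"run {run}: {heads} heads -> posterior mean {posterior.mean():.3f}, "
          f"95% interval ({low:.3f}, {high:.3f})")
```

Every run produces a different posterior, and in any given run the true value can sit well away from the peak, even though the intervals cover it most of the time.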

Does this make sense?

xguse commented 10 years ago

Yes, of course; "runs" are part of real random data. The thing was that when I ran it 5 or so times, the true value was not in the tails of the histograms in only one of the iterations, so I was just curious how close to "true" is considered "close enough" in the Bayesian framework?


CamDavidsonPilon commented 10 years ago

Well, that's difficult to answer, as you don't know the true value in practice. Furthermore, "close enough" is a function of how much data you have: more data means you'll be closer (more often).
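To make "more data means closer" concrete, here is another small sketch, again using the conjugate Beta posterior in place of MCMC. The true rate of 0.05 and the sample sizes are illustrative, not taken from the book:

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(1)
p_true = 0.05  # illustrative small "conversion rate"

for n in (100, 1_000, 10_000, 100_000):
    successes = rng.binomial(n, p_true)
    # Flat Beta(1, 1) prior, so the posterior is Beta(1 + successes, 1 + failures).
    posterior = beta(1 + successes, 1 + n - successes)
    low, high = posterior.interval(0.95)
    print(f"n = {n:>6}: 95% posterior interval width = {high - low:.4f}")
```

The interval width shrinks roughly as 1/sqrt(n), so what counts as "close enough" tightens automatically as data accumulates.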

CamDavidsonPilon commented 10 years ago

W.r.t. my text, I probably should have seen this coming. To correct it, I would say:

If you re-run the simulation, the peak of the posterior will likely not equal the true value, but the posterior should still put a decent amount of weight on the true value (relative to putting none at all!).
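"A decent amount of weight" can be made quantitative as the posterior probability that p lies within some tolerance of the truth. A sketch, where p_samples stands in for whatever posterior trace the notebook produces (e.g. mcmc.trace('p')[:] with PyMC); the counts 70 and 1500 and the tolerance are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
p_true = 0.05

# Stand-in for an MCMC trace: draws from the Beta posterior you would get
# after observing 70 successes in 1500 trials under a flat prior.
p_samples = rng.beta(1 + 70, 1 + 1500 - 70, size=20_000)

tolerance = 0.01
weight = np.mean(np.abs(p_samples - p_true) < tolerance)
print(f"posterior probability that |p - p_true| < {tolerance}: {weight:.2f}")
```

Here the posterior peak sits near 0.047, not at 0.05, yet most of the posterior mass still lands within 0.01 of the true value.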