ericmjl / bayesian-analysis-recipes

A collection of Bayesian data analysis recipes using PyMC3
https://ericmjl.github.io/bayesian-analysis-recipes
MIT License
554 stars 111 forks

Roadmap of where to start with this repo? #1

Closed IanQS closed 6 years ago

IanQS commented 6 years ago

As in title -

1) Is there a specific notebook I should start with, or are they all pretty much independent and self-contained?

2) Also, any recommendations for further reading?

3) In multi-sample, it seems that one of the cells errored out?

ericmjl commented 6 years ago

@IanQS here are my responses:

Is there a specific notebook I should start with, or are they all pretty much independent and self-contained?

The latter is correct. I'd also love to get feedback from you on what you're finding easy and what you're finding difficult with the repository. If you could post them as individual GitHub issues, I'd really appreciate it.

Also, any recommendations for further reading?

Allen Downey's Think Bayes.

PyCon, SciPy and PyData talks are super useful. I learned from there as I was collating these examples together.

PyMC3 docs.

In multi-sample, it seems that one of the cells errored out?

Yeah, still debugging.


Pardon me if the answers appear curt; I'm due to leave my desk in a minute or so, so I just put up bullet points. Happy to discuss more in this thread if needed.

ericmjl commented 6 years ago

Feel free to close issue whenever.

IanQS commented 6 years ago

The latter is correct. I'd also love to get feedback from you on what you're finding easy and what you're finding difficult with the repository. If you could post them as individual GitHub issues, I'd really appreciate it.

For sure! This is really interesting to me - I just watched your talk and found it both informative and interesting enough that it made me want to explore more. I'll definitely post feedback.

Allen Downey's Think Bayes.

PyCon, SciPy and PyData talks are super useful. I learned from there as I was collating these examples together.

PyMC3 docs.

What do you think about Think Bayes vs. Probabilistic Programming and Bayesian Methods for Hackers (recommended on the PyMC3 GitHub page)?

Feel free to close issue whenever

I'll close it after I get a response (or now, if you'd prefer; I don't want to impose or come across as too demanding :) ).

Thank you once again for putting all of this up online, btw! I really appreciate it.

IanQS commented 6 years ago

I will say that it seems most of these notebooks require a GPU? Or maybe they were written with a GPU in mind, because they seem to error out on my machine, and I think I have the latest version of Theano.

ericmjl commented 6 years ago

Hmm... let's try to debug this.

(1) What's the error message you're receiving? Can you paste it here?

(2) Can you list your environment? At the terminal, if you run conda list with the environment activated, it'll list the packages installed in it. (conda env list only lists the environments themselves, not their packages.)
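
If it's easier to paste from inside Python, the versions of the relevant packages can also be collected there. This is only a minimal sketch, assuming Python 3.8+ (for importlib.metadata); report_versions is a hypothetical helper name, and the package list is a guess at what matters for these notebooks.

```python
import platform
from importlib import metadata


def report_versions(packages):
    """Return {package name: installed version, or 'not installed'}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    print("Python", platform.python_version())
    # Assumed list of packages worth reporting; adjust as needed.
    for name, version in report_versions(["pymc3", "theano", "numpy"]).items():
        print(f"{name}: {version}")
```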

ericmjl commented 6 years ago

What do you think about Think Bayes vs. Probabilistic Programming and Bayesian Methods for Hackers (recommended on the PyMC3 GitHub page)?

Also a good book! I think I'll create a "further reading" section in the README.md file.

IanQS commented 6 years ago

Also a good book! I think I'll create a "further reading" section in the README.md file

Definitely a good idea! In the spirit of helping a complete scrub in this subfield, I was wondering if you could tell me why I'd choose a Bayesian framework over a standard one. I only ask because I've already looked through the opening pages of both books as well as the PyMC3 docs, and none of them really gives me a 'this is when you'd want to use this' idea, y'know?

Do I use this because it helps me include my priors in the process more explicitly? Do I use it because I don't have much data, and thus want to be as informative as possible towards my model? I know one selling point is that it seems to output probabilities, but don't quite a few models already output probabilities (albeit about their final output and not over their parameters; but is getting probabilities over my parameters that important)?

I'm so sorry for making my problems your problems by posing this to you, but I feel like I don't understand the motivation behind it (although I will also try to spend more time reading to get the intuition).

ericmjl commented 6 years ago

Definitely a good idea! In the spirit of helping a complete scrub in this subfield, I was wondering if you could tell me why I'd choose a Bayesian framework over a standard one.

I'll assume the "standard" framework is the "frequentist" framework - what the vast majority of us were taught in our standard 2nd year undergraduate stats class. The only thing I took away from that course was the t-test, and I horrendously misapplied it everywhere. Stats is a funny beast!

Do I use this because it helps me include my priors in the process more explicitly? Do I use it because I don't have much data, and thus want to be as informative as possible towards my model?

Yes and yes! One thing about the Bayesian framework is that you can express what you believe about the state of the world explicitly and quantitatively before seeing the data, and rigorously update those beliefs after seeing the data.
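
To make "express a belief, then update it" concrete, here is a minimal sketch using the simplest case I know of: a Beta prior on a success rate, updated with binomial data. The prior and the counts are made up for illustration, and update_beta and beta_mean are hypothetical helper names, not anything from PyMC3.

```python
def update_beta(prior_a, prior_b, successes, failures):
    """Conjugate update: a Beta(a, b) prior plus binomial data gives
    a Beta(a + successes, b + failures) posterior."""
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical prior belief: the rate is probably near 0.5, held weakly.
prior = (2, 2)
# Hypothetical data: 7 successes and 3 failures.
posterior = update_beta(*prior, successes=7, failures=3)

print("prior mean:    ", beta_mean(*prior))
print("posterior mean:", round(beta_mean(*posterior), 3))
```

The posterior here is Beta(9, 5): the belief has moved from 0.5 towards the observed rate of 0.7, but not all the way, because the prior still counts for something.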

Now, one common objection to Bayesian statistics runs like this: "You can specify your priors any way you like to get the answer you want!" To which my response is this: If you disagree with my priors, bring your bullet point list of objections to the table, and provide an alternative prior with a bullet point list of supporting points. Let's have a rational debate about which priors are more suited to the problem at hand. And if we can't agree on a prior up front, then let's just compare the updated beliefs we get under each prior, and debate those.
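
Comparing updated beliefs under competing priors is cheap to do. A minimal sketch, again assuming a Beta prior on a success rate with binomial data; both priors and all the counts are invented for illustration, and posterior_mean is a hypothetical helper.

```python
def posterior_mean(prior_a, prior_b, successes, failures):
    """Posterior mean of a rate under a Beta(prior_a, prior_b) prior."""
    a = prior_a + successes
    b = prior_b + failures
    return a / (a + b)


# Two people who disagree up front (hypothetical priors).
priors = {"flat": (1, 1), "optimistic": (20, 5)}
# Hypothetical datasets of growing size, all with a 30% observed rate.
datasets = {"no data": (0, 0), "100 trials": (30, 70), "1000 trials": (300, 700)}

gaps = {}
for label, (successes, failures) in datasets.items():
    means = {
        name: posterior_mean(a, b, successes, failures)
        for name, (a, b) in priors.items()
    }
    gaps[label] = abs(means["flat"] - means["optimistic"])
    rounded = {name: round(value, 3) for name, value in means.items()}
    print(f"{label}: {rounded} gap={gaps[label]:.3f}")
```

In this toy setup the two posteriors start far apart and get closer as the data accumulates, so the disagreement about priors matters most when data is scarce.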

Of course, there are some people in the world whom no amount of argumentation will convince that this process is better than using the "defaults" of frequentism. To them, the words, "I beseech you, in the bowels of Christ, think it possible that you may be mistaken" carry no meaning. No need to debate with them :smile:.

Now, where else would you want to use Bayesian methods? It'd be in a place where quantifying the uncertainty can help with decision-making. For example, the uncertainty in the number of forecasted projects at various stages in a company's portfolio can help with resource allocation. Alternatively, knowing with certainty that engineering some mutation in a protein will be useless can help steer us away from even trying that mutation.
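
As a rough sketch of how quantified uncertainty feeds a decision, in the spirit of the mutation example: given a posterior over a success rate, estimate the probability that the rate clears some bar, and decide from that. Everything here is hypothetical (the counts, the 0.2 bar, the 5% cutoff, the helper name prob_rate_exceeds); it uses plain Monte Carlo draws from a Beta posterior rather than PyMC3.

```python
import random


def prob_rate_exceeds(a, b, threshold, draws=20_000, seed=0):
    """Monte Carlo estimate of P(rate > threshold) for rate ~ Beta(a, b)."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    hits = sum(rng.betavariate(a, b) > threshold for _ in range(draws))
    return hits / draws


# Hypothetical: flat Beta(1, 1) prior, then 1 success in 50 trials,
# which gives a Beta(1 + 1, 1 + 49) = Beta(2, 50) posterior.
posterior = (1 + 1, 1 + 49)

# Hypothetical decision rule: only pursue this if there is at least a
# 5% chance that the true success rate is above 0.2.
p_useful = prob_rate_exceeds(*posterior, threshold=0.2)
decision = "skip" if p_useful < 0.05 else "try it"
print(f"P(rate > 0.2) is about {p_useful:.4f} -> {decision}")
```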

I can also tell you where you can probably safely skip Bayesian methods: real-time streaming systems are one example. Bayesian methods are going to be super slow compared to simple summary statistics, due to the amount of computation needed. Note, though, that this is merely a practical reason not to use Bayesian methods, and not a philosophical one.

I just remembered a good video I watched, by John Kruschke, that compares the traditional t-test to Bayesian methods. I'll add that to the further reading list, but here it is for convenience too.

I'm so sorry for making my problems your problems by posing this to you, but I feel like I don't understand the motivation behind it (although I will also try to spend more time reading to get the intuition).

No worries, happy to help. In fact, discussing this with you has inspired some ideas for me for a blog post. I'd also encourage you to blog about what you're learning along the way and share it, I'd love to read about your learning journey too!