ericmjl / bayesian-stats-modelling-tutorial

How to do Bayesian statistical modelling using numpy and PyMC3
MIT License

Comparing Strong Priors to the Jeffreys Prior #29

Open mathmastin opened 6 years ago

mathmastin commented 6 years ago

This came out of some discussion with @ericmjl in the hallway track at SciPy 2018.

I had asked during the tutorial about choosing a highly informative prior that assumes a false belief. The widget below visualizes what happens in such a case. We model the highly informative prior with a Gaussian with a small standard deviation. It is modeled after the similar visualization in notebook 2.

import numpy as np
import matplotlib.pyplot as plt

def gaussian_pdf(x, mu, sd):
    """
    The Gaussian probability distribution function. We could import this, 
    but the visualation is smoother if we define the function here.
    """
    return np.exp(-np.power(x - mu, 2) / (2 * np.power(sd, 2)))

def plot_posteriors(p=0.6, N=0, mu=0.5, sd=1):
    np.random.seed(42)
    n_successes = np.random.binomial(N, p)
    x = np.linspace(0.01, 0.99, 100)
    prior1 = gaussian_pdf(x, mu, sd)
    likelihood = x**n_successes*(1-x)**(N-n_successes)
    posterior1 = likelihood * prior1
    posterior1 /= np.max(posterior1)  # so that peak always at 1
    plt.plot(x, posterior1, label='Gaussian prior')
    jp = np.sqrt(x*(1-x))**(-1)  # Jeffreys prior
    posterior2 = likelihood*jp  # w/ Jeffreys prior
    posterior2 /= np.max(posterior2)  # so that peak always at 1 (not quite correct to do; see below)
    plt.plot(x, posterior2, label='Jeffreys prior')
    plt.legend()
    plt.show()
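As the comment in the code notes, dividing by the maximum is a display convenience rather than a true normalization. If you want the curve on an actual density scale, a minimal sketch (with assumed example data of 12 heads in 20 flips) is to divide by the numerical integral instead:

```python
import numpy as np

x = np.linspace(0.01, 0.99, 100)
N, n_successes = 20, 12  # assumed example data: 12 heads in 20 flips
likelihood = x**n_successes * (1 - x)**(N - n_successes)
jp = np.sqrt(x * (1 - x))**(-1)  # Jeffreys prior (unnormalized)

posterior = likelihood * jp
dx = x[1] - x[0]
posterior /= posterior.sum() * dx  # Riemann-sum normalization: area under curve ~ 1
```

This changes only the vertical scale, not the shape or the location of the peak, so the qualitative behavior in the widget is the same either way.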

The visualization can be created with (`interact` comes from ipywidgets):

from ipywidgets import interact

interact(plot_posteriors, p=(0, 1, 0.01), N=(0, 100), mu=(0, 1, 0.1), sd=(0.01, 1, 0.01));

(screenshot of the interactive widget comparing the two posteriors)

I also wrote up a bit of explanation:

Here we are parameterizing the Gaussian pdf by (no surprise) the mean (mu) and standard deviation (sd). We can think of the standard deviation here as a measure of certainty in our prior belief that the probability of getting heads is the mean of the Gaussian.
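To make the "sd as certainty" point concrete, here is a small sketch (reusing the gaussian_pdf definition above, with an assumed evaluation point 0.1 away from the mean) comparing how much prior weight a narrow versus a wide Gaussian gives to values away from mu:

```python
import numpy as np

def gaussian_pdf(x, mu, sd):
    # Unnormalized Gaussian, as in the widget above.
    return np.exp(-np.power(x - mu, 2) / (2 * np.power(sd, 2)))

mu = 0.5
narrow = gaussian_pdf(0.6, mu, sd=0.01)  # 0.1 away from the mean, sd = 0.01
wide = gaussian_pdf(0.6, mu, sd=1.0)     # same point, sd = 1

# narrow is exp(-50), i.e. essentially zero, while wide is exp(-0.005), close to 1.
# A small sd says "values away from mu are nearly impossible a priori", which is
# why a narrow prior can overwhelm the likelihood.
```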

Interesting things to try:

  1. Set the mean (mu) to 0.2 and the standard deviation (sd) to 0.01. This specifies a prior encoding a strong belief that the probability of flipping heads is 0.2 (which is wrong, assuming p is still set to 0.6). Now change the number of trials (N) to 100. What happens?
  2. Now slowly move the standard deviation (sd) to 1. What happens now? Why does the posterior corresponding to the Gaussian prior converge to the posterior from the Jeffreys prior?
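Experiment 1 can also be checked numerically without the widget. A sketch (using the same assumed settings mu=0.2, sd=0.01, p=0.6, N=100 and the same seed as the widget code) compares where the two posteriors peak:

```python
import numpy as np

np.random.seed(42)
p, N, mu, sd = 0.6, 100, 0.2, 0.01
n_successes = np.random.binomial(N, p)
x = np.linspace(0.01, 0.99, 1000)

likelihood = x**n_successes * (1 - x)**(N - n_successes)
prior_gauss = np.exp(-(x - mu)**2 / (2 * sd**2))
prior_jeffreys = 1 / np.sqrt(x * (1 - x))

peak_gauss = x[np.argmax(likelihood * prior_gauss)]
peak_jeffreys = x[np.argmax(likelihood * prior_jeffreys)]

# The strong-but-wrong prior keeps its posterior peak pinned near 0.2 even after
# 100 flips, while the Jeffreys-prior posterior peaks near the observed frequency.
```

This is the quantitative version of what the widget shows: with sd=0.01, even 100 trials are not enough data to move the Gaussian-prior posterior away from the (false) prior belief.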