CamDavidsonPilon / Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)
http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/
MIT License
26.55k stars 7.85k forks source link

Chapter 5 - Implementing optimal solution for Dark Worlds problem #412

Open ysgit opened 5 years ago

ysgit commented 5 years ago

Going through chapter 5 I found it weird that there was no implementation of the optimal solution for the Dark Worlds problem, i.e. find the best guess that minimizes the estimated loss function over the posterior distribution. It also wasn't left as an exercise.

I had a go at doing it myself with this code

import sys, os
from DarkWorldsMetric import get_ref
def main_score_sample(guess, trace_all, mass_small_1=20, mass_small_2=20):
    n_halo = int(np.prod(guess.shape)/2)
    x_true_all = trace_all['halo_positions'][:,0].reshape(1,n_halo)
    y_true_all = trace_all['halo_positions'][:,1].reshape(1,n_halo)
    weights = [trace_all['mass_large'], mass_small_1, mass_small_2]
    x_ref, y_ref = get_ref(x_true_all, y_true_all, weights)
    return main_score([n_halo], x_true_all, y_true_all, x_ref.reshape(1,1), y_ref.reshape(1,1), guess.reshape(1,n_halo*2))

def expected_main_score(guess, all_traces, mass_small_1=20, mass_small_2=20):
    old_stdout = sys.stdout
    try:
        sys.stdout = open(os.devnull, 'w')
        res = list(map(lambda x: main_score_sample(guess, x, mass_small_1, mass_small_2), all_traces))
    finally:
        sys.stdout = old_stdout
    return np.mean(res)

best_guess = sop.fmin(expected_main_score, mean_posterior, args=(all_traces,))

main_score([3], x_true_all, y_true_all, \
            x_ref_all, y_ref_all, best_guess.reshape(1,6))

This solution does indeed lead to a much better score than using the posterior means for the halo positions. However, it's very slow so if anyone has a suggestion to optimize it so it could be included in the book that would be great.

This snippet only works after fixing the main_score function which is currently wrong for multiple halos see #411