aimalz opened this issue 7 years ago
The solution to this one might be to choose a different sampling scheme altogether. @drphilmarshall proposed using an importance sampler as an alternative to emcee. The first step is checking that the final simplex from the MMLE is appropriate.
For this to work, the multivariate Gaussian defined by the inverse hessian output from the optimizer needs to be a reasonable approximation to the posterior PDF - but you have the samples to test this.
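A minimal sketch of that check, using simulated stand-in data (the variable names `hess_inv` and `samples` are placeholders, not chippr's actual API): compare the per-bin variance implied by the optimizer's inverse Hessian to the variance of the MCMC samples.

```python
import numpy as np

# Hypothetical inputs: `hess_inv` is the inverse Hessian returned by the
# optimizer (the Laplace approximation's covariance at the MMLE), and
# `samples` is an (n_samples, n_bins) array of MCMC draws. Here both are
# simulated stand-ins so the comparison logic is runnable.
rng = np.random.default_rng(42)
n_bins = 5
hess_inv = np.diag(np.full(n_bins, 0.04))        # stand-in optimizer output
samples = rng.multivariate_normal(np.zeros(n_bins),
                                  np.diag(np.full(n_bins, 0.01)),
                                  size=10000)     # stand-in MCMC samples

# Ratio of Laplace-approximation variance to sampled variance, per bin.
sample_var = samples.var(axis=0)
laplace_var = np.diag(hess_inv)
ratio = laplace_var / sample_var
print(ratio)  # values near 1 would mean the Gaussian approximation is reasonable
```

In this synthetic example the ratio comes out near 4 by construction; the point is that a ratio far from 1 (or varying wildly across bins) is the warning sign discussed below.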
Answer: no, the optimizer's covariance is not similar to that of the MCMC samples. Does this kill the importance sampling idea?
Not necessarily, but it makes it riskier.
Is your plot showing the Laplace approximation Gaussian posterior estimate (from the optimizer inverse hessian) in cyan and the sampled posterior in blue? I will assume so and say:
If you could convince yourself and your readers that the optimizer consistently and reliably returned an inverse hessian matrix that gave a Laplace approximation to the posterior that was always 5 times broader than the true posterior, you could rescale the inverse hessian by a suitable factor (like sqrt(5) or perhaps a bit less) and then use it as a biased distribution for importance sampling. (You don't want your biased distribution to be many times broader than your target posterior.) In each of your tests, what is the ratio of the variance of the Laplace approximation to the variance of the true posterior, in each bin? (You could plot this ratio in the z bins, one line per test).
Your assumptions about the color scheme are correct. Other important information I failed to include: in order to obtain the inverse Hessian, I had to change the optimizer algorithm from Nelder-Mead to BFGS. It reports failure due to loss of precision but nonetheless consistently returns the same result for the parameters, the posterior probability, and the inverse Hessian.
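(For context, the Nelder-Mead/BFGS distinction comes down to what the optimizer tracks internally: BFGS builds an inverse-Hessian approximation as it runs, while Nelder-Mead only moves a simplex and never forms one. A minimal sketch with `scipy.optimize.minimize` on a toy quadratic objective, standing in for the real negative log-hyperposterior:)

```python
import numpy as np
from scipy.optimize import minimize

# Toy stand-in for the negative log-hyperposterior; the true Hessian here
# is I/0.25, so the exact inverse Hessian is 0.25 * I.
def neg_log_post(x):
    return 0.5 * np.sum((x - 1.0)**2 / 0.25)

res = minimize(neg_log_post, x0=np.zeros(3), method='BFGS')
print(res.x)         # near the optimum at [1, 1, 1]
print(res.hess_inv)  # BFGS's inverse-Hessian approximation; Nelder-Mead
                     # ('Nelder-Mead' method) provides no such attribute
```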
I'll add in plotting the variance ratio over all the bins for each test case and get back to you with the results!
Good, thanks! One last thought: I don't know what the status of emcee's multiprocessing/multithreading support is (or whether you are exploiting it if it exists), but you should be able to use multiprocessing pretty effectively when importance sampling, because each sample's importance evaluation can be done in an embarrassingly parallel way.
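That parallel structure can be sketched with the standard-library `multiprocessing.Pool`; the `log_posterior` below is a cheap stand-in for the real (expensive) chippr hyperposterior evaluation:

```python
import numpy as np
from multiprocessing import Pool

# Hypothetical stand-in for the expensive hyperposterior evaluation;
# the real chippr log-posterior would go here.
def log_posterior(x):
    return -0.5 * float(np.dot(x, x))

def parallel_log_posteriors(draws, n_workers=4):
    """Evaluate the log-posterior at each proposal draw in parallel.

    Each evaluation depends only on its own draw, so the map is
    embarrassingly parallel across worker processes."""
    with Pool(n_workers) as pool:
        return pool.map(log_posterior, draws)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    draws = rng.normal(size=(1000, 5))   # one proposal draw per row
    log_post = parallel_log_posteriors(draws)
    print(len(log_post))                 # one log-posterior per draw
```

This is exactly where importance sampling has an edge over an MCMC chain: the draws are independent up front, so there is no serial dependence between evaluations to work around.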
Unfortunately, the per-bin variance ratios between different test cases are drastically different. It's not even consistently >1. The figures are now updated here.
I'm not using emcee's multithreading right now because it ends up being slower than not using it. There's a fair bit of overhead, so it only pays off when the function evaluation (the posterior probability) is very slow, which apparently isn't chippr's limiting factor. I do parallelize running the different test cases and could probably parallelize some other functions (under the likely wrong assumption that nothing bad happens when you run multiprocessing on an already multithreaded function).
Hmm - looks like the optimizer is not returning a very reliable inverse hessian at all! I guess I am not very surprised. I notice from your figures that the per-bin hyperposterior width does look fairly predictable though, which makes me wonder whether you could make a rough prediction of it using simple error propagation arguments, that you could then use to seed an importance sampling. What if you used constant X% uncertainty to define the diagonal elements of a covariance matrix of your assumed Gaussian biased distribution? I'm not sure if this will work well in the cases where the hyperposterior is very narrow - but then, I'm not sure I understand those yet. Do you?
I'm not sure I agree that the per-bin hyperposterior width is so predictable. Most of what I see is that error bars tend to be larger at low ln[n(z)] values because there aren't many galaxies there, and error bars tend to be smaller at the highest ln[n(z)] values because there are many galaxies there. This behavior depends on knowing the shape of the true ln[n(z)] and could very well change as I scale up the number of galaxies. What other predictions did you confirm?
I was thinking that the dependence of the hyper posterior width on the number of galaxies in the bin could be captured with some simple Poisson error model. You have the observed number of galaxies in the bin, N; this could be used as an input. A scatter plot of hyper posterior width vs sqrt(N) could help guide the design of such a model.
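A sketch of that fit on simulated stand-in widths (the data here are fabricated for illustration; only the scaling argument is from the thread): for Poisson counts, Var[ln N] is roughly Var[N]/N^2 = 1/N, so the width of ln n(z) in a bin should scale like 1/sqrt(N).

```python
import numpy as np

# Simulated stand-in data: galaxies per z bin and measured hyperposterior
# widths that happen to follow the 1/sqrt(N) law plus a little noise.
rng = np.random.default_rng(3)
n_gal = np.array([10, 50, 200, 1000, 5000])
widths = 1.0 / np.sqrt(n_gal) * (1 + 0.05 * rng.normal(size=n_gal.size))

# Least-squares fit of width = a / sqrt(N); with x = 1/sqrt(N) the
# closed-form slope is sum(w*x) / sum(x^2) = sum(w/sqrt(N)) / sum(1/N).
a = np.sum(widths / np.sqrt(n_gal)) / np.sum(1.0 / n_gal)
print(a)  # a near 1 would support the simple Poisson error model
```

In practice the scatter plot of width vs. 1/sqrt(N) (or sqrt(N)) is the diagnostic: strong curvature or bin-dependent offsets would mean the Poisson model needs extra terms.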
Travis tests are taking a long time with the current demo notebook. I'm going to simplify the tests in the notebook and use separate, local scripts to run more meaningful tests. Similar to #16, I'm going to return to the same branch