I'm about halfway through implementing catalog generation for all the tests I outlined here, but so far I've only run them with 10,000 galaxies, 10-bin PDFs, a lax MCMC convergence criterion, and only 1,000 post-convergence samples per walker (among other limitations). That said, figures are being added to the paper as they are produced.
Great! Thanks Alex :-) Good to ship early, to get feedback. My comments on presentation:
Can you please make all lines thicker so they are more visible?
Since we have it, have you tried visualizing the posterior PDF for the bin heights as a low-alpha color band showing the 68% or 95% credible region? It might be too much given all your estimators, but it would be nice to show that we are coping with uncertainty properly - and it would give you an indication of how many more galaxies you should be using.
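Something like this untested sketch is what I have in mind - the names are made up rather than chippr's actual interface, and it assumes the flattened chain gives you an `(n_samples, n_bins)` array of bin-height samples plus `bin_edges` of length `n_bins + 1`:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_credible_band(ax, bin_edges, samples, level=68., **kwargs):
    """Shade the central `level`% credible region of the bin heights.

    `samples` is (n_samples, n_bins); `bin_edges` has length n_bins + 1.
    """
    half = (100. - level) / 2.
    lo = np.percentile(samples, half, axis=0)
    hi = np.percentile(samples, 100. - half, axis=0)
    # repeat the last value so the step-shaped band spans the final bin
    lo = np.append(lo, lo[-1])
    hi = np.append(hi, hi[-1])
    ax.fill_between(bin_edges, lo, hi, step='post', alpha=0.2, **kwargs)

# e.g. one band per level, drawn under the point-estimate curves:
# plot_credible_band(ax, bin_edges, samples, level=95., color='C0')
# plot_credible_band(ax, bin_edges, samples, level=68., color='C0')
```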
Typically you want your main result to stand out, so choose darker colors, more solid linestyles, and thicker lines for the hierarchical Bayesian inference (HBI) result and the truth curve. The stacking estimator should get some intermediate choice (and not yellow!), while the other estimators should be downplayed even more. They are there mainly for context, right? The primary test is HBI (correct) vs stacking (industry standard) vs truth (no escape).
Did you try the extra panel, showing ratios of estimators to the truth, to allow statements about percentage accuracy?
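To be concrete, something like this untested mock-up is the layout I mean - the n(z) arrays below are dummy stand-ins just so the snippet runs:

```python
import numpy as np
import matplotlib.pyplot as plt

# dummy stand-ins so the snippet runs; swap in the real estimates
z_grid = np.linspace(0.01, 2., 100)
n_true = np.exp(-0.5 * ((z_grid - 0.8) / 0.3) ** 2)
n_hbi, n_stack = 1.02 * n_true, 0.9 * n_true

fig, (ax_nz, ax_ratio) = plt.subplots(
    2, 1, sharex=True, gridspec_kw={'height_ratios': [3, 1]})

ax_nz.plot(z_grid, n_true, color='black', lw=3, label='truth')
ax_nz.plot(z_grid, n_hbi, color='C0', lw=3, label='HBI')
ax_nz.plot(z_grid, n_stack, color='C1', lw=2, ls='--', label='stacked')
ax_nz.set_ylabel(r'$n(z)$')
ax_nz.legend()

# ratio panel: deviations read off directly as percentage accuracy
ax_ratio.axhline(1., color='black', lw=1)
ax_ratio.plot(z_grid, n_hbi / n_true, color='C0', lw=3)
ax_ratio.plot(z_grid, n_stack / n_true, color='C1', lw=2, ls='--')
ax_ratio.set_ylabel('estimate / truth')
ax_ratio.set_xlabel(r'$z$')
```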
Let's see what the error band looks like, but in general I (and many readers) would not be surprised if the improvement given by HBI is somewhat subtle: one reason stacking is so popular is that it gives results for n(z) that are close to what the community expects. But we are in the high-accuracy cosmology business, so we should expect to be studying small effects. Nice work!
Good points, thanks! I just updated the paper to show the new figures.
Re: colors, I once got advice to keep colors consistent throughout a paper, but it seems that what works for the top panel does not work for the bottom panel. I changed the bottom plot back to the RGB-based color scheme I'd been using before, with different line styles and widths to hopefully make it decipherable when printed in black and white (and for colorblind readers). Maybe the answer is to use only one color in the top plot.
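Concretely, I'm thinking of centralizing the styles in one dictionary so every panel pulls from the same place; an untested sketch, with placeholder estimator names and colorblind-safe placeholder colors:

```python
# one shared lookup so each estimator looks the same in every panel;
# the estimator names and style choices here are placeholders
plot_styles = {
    'truth':   dict(color='black',   lw=3, ls='-'),
    'HBI':     dict(color='#0072B2', lw=3, ls='-'),   # emphasized
    'stacked': dict(color='#D55E00', lw=2, ls='--'),  # intermediate
    'MAP':     dict(color='#999999', lw=1, ls=':'),   # context only
    'mean':    dict(color='#999999', lw=1, ls='-.'),  # context only
}

# usage: ax.plot(z_grid, n_est, label=name, **plot_styles[name])
```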
I've implemented some rudimentary form of all the test cases except the one that tries to emulate the catastrophic outliers seen with machine learning approaches to photo-zs, which look like vertical bars in the standard z_spec vs. z_phot scatterplot. I know I'm not thinking about it correctly in terms of a forward model and could use a hand.
Re: machine learning-like outlier model, @drphilmarshall thinks the only way to do it would be to basically generate the mock data in reverse by starting from a large set of multimodal posteriors and sampling them to produce a very large set of true redshifts, then subsampling those until they match the true n(z). @davidwhogg do you think there's any other option when the outlier population is correlated with the true redshifts?
To be clear: it's the only way I could think of doing it right then. I would not be surprised if there was a faster way to do this :-) BTW the multimodal posteriors (including their implied interim prior) would be generated with some simple model that we'd need to learn about from real data, but two equal-width Gaussians offset in z and probability mass seems like it might capture the effect.
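In numbers, the mechanics could look like this untested sketch - every parameter here (mode width, offset, outlier fraction, target n(z)) is a placeholder that we'd need to tune against real data:

```python
import numpy as np

rng = np.random.default_rng(42)
n_draws = 10 ** 6
sigma, dz, f_out = 0.05, 0.8, 0.1  # mode width, mode offset, outlier mass

# 1. each mock posterior is two equal-width Gaussians: a primary mode at
#    z_phot and a secondary mode offset by dz carrying mass f_out;
#    sampling it once per galaxy yields a candidate true redshift
z_phot = rng.uniform(0.1, 1.5, n_draws)
is_outlier = rng.random(n_draws) < f_out
z_true = rng.normal(np.where(is_outlier, z_phot + dz, z_phot), sigma)

def target_nz(z):
    """Placeholder for the true n(z) of the test case."""
    return np.exp(-0.5 * ((z - 0.7) / 0.35) ** 2)

# 2. subsample so the retained z_true values follow target_nz: estimate
#    the current density with a histogram and accept with probability
#    proportional to target/current (standard rejection resampling)
dens, edges = np.histogram(z_true, bins=100, density=True)
idx = np.clip(np.digitize(z_true, edges) - 1, 0, len(dens) - 1)
weights = target_nz(z_true) / np.maximum(dens[idx], 1e-12)
keep = rng.random(n_draws) < weights / weights.max()
z_true, z_phot = z_true[keep], z_phot[keep]
```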
I don't understand the question. Perhaps a true problem statement would help me.
Sorry, here's some context: there are real outlier populations that create features in the z_true vs. z_phot plane and imprint a signature on the photo-z posteriors along lines of constant z_phot. Currently, the most compelling test case is an outlier population like those seen in template-based methods, which maps galaxies from a large range of z_true to a small range of z_phot. Another important test case is an outlier population like those seen in machine learning-based methods, which maps galaxies from a small range of z_true to a large range of z_phot. Simulating these in a fully self-consistent way (i.e. one that does not cause the hierarchical inference to fail outright) is proving difficult.
Still don't quite get it -- is hierarchical inference failing even if you input correct likelihood functions? It really shouldn't; provably even.
The problem is that I know I'm inputting incorrect likelihood functions. I don't know how to make correct likelihoods for this systematic and am looking for help in doing that.
This should have been closed by #63; all the cases have now been implemented.
Great! Well done - let's look at the paper draft on Wednesday together.
The old version of the code supported different physically-motivated test cases that must be re-implemented. EDIT: I will only implement the tests outlined in #55, but there are still some choices that must be made!