The right-hand plot has this information: p1(z) is labeled "stack" (cyan) and p2(z) is labeled "MAP" (magenta). The legend gives the KLD values (relative to the truth) for each of them in both directions.
What I want is a plot of p1(z) vs z, p2(z) vs z, and the truth vs z, plus a caption stating the four KLD values (which are scalars).
Ah, I misunderstood what you were asking for. I think the top plot here is what you want: the dashed and dash-dotted lines are the ones you asked for. There is indeed a bug in the KLD that's calculated after MCMC. (I'm pretty sure I wrote it for different inputs than I gave it; I will fix that immediately. EDIT: Nope, that's not the problem. Although the KLD is invariant under a change of variables, it is sensitive to whether I take the difference between N(z) parameters or p(z) parameters, which differ only by a scaling by the survey size.)
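To illustrate why that scaling matters (a minimal standalone sketch on a toy grid, not the repo's code): the discrete KLD only behaves as intended when both inputs integrate to one, and feeding it N(z) amplitudes, which differ from p(z) only by the survey size, multiplies the result by that size.

```python
import numpy as np

def kld(p, q, dz):
    """Discrete KL divergence, sum_k p_k ln(p_k / q_k) dz, for binned densities."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    good = (p > 0) & (q > 0)
    return np.sum(p[good] * np.log(p[good] / q[good])) * dz

dz = 0.1
z_mid = np.arange(0.05, 1.0, dz)                      # toy bin centers
p_true = np.exp(-0.5 * ((z_mid - 0.50) / 0.15) ** 2)  # toy "true" distribution
p_est = np.exp(-0.5 * ((z_mid - 0.55) / 0.20) ** 2)   # toy estimate
p_true /= np.sum(p_true) * dz                         # normalize -> p(z)
p_est /= np.sum(p_est) * dz

n_gals = 10000                                        # hypothetical survey size
print(kld(p_true, p_est, dz))                         # KLD between the p(z)'s
print(kld(n_gals * p_true, n_gals * p_est, dz))       # same curves as N(z): larger by a factor n_gals
```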
Very interesting! Shouldn't the MAP converge to the true N(z)?
If all the individual zPDFs peaked at their corresponding true values, then the MAP would converge to the true N(z). However, the zPDFs are noisified, so the MAP z bin is not necessarily the true z bin.
Thanks for clarifying, this makes sense. Is this effect included in your N(z) model?
I'm not sure what you mean by my model, but it is described in the paper. Last week, I worked out the likelihood of N(z) given the hyperparameters that go into it (the Dirichlet distribution we talked about), and that effect is included as a Gaussian convolution.
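As a rough illustration of what "included as a Gaussian convolution" can look like for a binned N(z) (a sketch only, with a made-up grid and scatter width, not the paper's actual likelihood):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

dz = 0.01
z = np.arange(0.0, 3.0, dz)                     # toy redshift grid
n_true = np.exp(-0.5 * ((z - 0.8) / 0.3) ** 2)  # toy true N(z) shape
sigma_z = 0.05                                  # assumed scatter of the point estimates

# smear the true N(z) by the point-estimate scatter: a Gaussian convolution in z
n_smeared = gaussian_filter1d(n_true, sigma=sigma_z / dz, mode="constant")
```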
OK I will look at the paper draft at some point soon.
The KLD works on p(z), not N(z), so normalize.
Also why are the numbers different here than in the other issue?
Also, I am concerned that the MAP value looks far worse than the histogram of best-fit values, which seems bad. Or is N(E(z)) the histogram of best-fit values? By the way, I don't think that is the right thing to call it.
Also, asymptotically, the maximum marginalized likelihood (NOT the MAP but similar) MUST approach the true N(z). If it doesn't, then there is a bug.
OK so this relates to my previous question: I would also expect the MAP to be closer to the true N(z) if you are using the same likelihood as in your sampler (can you confirm that?).
I agree with Hogg that the max marginalised likelihood should approach the true N(z). In fact, wouldn't that be a strong requirement for this approach?
The "KLD" in the above figure was mistakenly calculated for an incorrectly normalized p(z). A properly calculated KLD is given in the last figure in https://github.com/aimalz/prob-z/issues/33. The log likelihoods for all curves are also given there, and they're all quite similar to one another (and dominated by the Poisson term in the likelihood). Perhaps this is why the KLD is failing to discriminate between them.
The MAP there is about halfway in quality between stacking and using the expected-value redshifts. It surprises me more that the MAP and E(z) results are not more similar than that the MAP isn't as good as E(z). Also, is there something else I should be calling the N(z) based on z = E[z]?
I may be misunderstanding what you mean by maximum marginalized likelihood as an estimator of N(z). Can someone clarify?
first: what do you mean by MAP?
By MAP I mean the N(z) calculated from point estimates of redshift equal to the maximum a posteriori values of the zPDFs, i.e., the centers of the bins j in which p(z_j) is maximized.
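In code, assuming each galaxy's zPDF is stored as a row of an array `pdfs` evaluated on shared bin centers `z_mid` (hypothetical names, not the repo's), the two N(z) estimators under discussion would be roughly:

```python
import numpy as np

def nz_map(pdfs, z_mid, bin_edges):
    """Histogram of per-galaxy MAP redshifts: the centers of the bins where each zPDF peaks."""
    z_map = z_mid[np.argmax(pdfs, axis=1)]   # pdfs has shape (n_gals, n_bins)
    counts, _ = np.histogram(z_map, bins=bin_edges)
    return counts

def nz_stack(pdfs):
    """Stacked estimator: sum the individual zPDFs bin by bin."""
    return np.sum(pdfs, axis=0)
```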
Ah! This is not what I understood by MAP. For me, the MAP N(z) is the maximum a posteriori value of the full N(z) likelihood function (the one you sample when you do the MCMC).
Of course what you described is also very instructive and should be part of the tests.
Your MCMC code is running on a function which is a sum of a ln_likelihood() function output and a ln_prior() output, is that correct? If so, the ln_likelihood() function is something we can optimize, in principle, right?
Oh wait, move that conversation to the "maximize the marginalized likelihood" issue thread.
On this thread: Can you take the average of the true p(z) and the interim prior on p(z) and compute the KLDs for that? I just want to see that it has the good properties we want.
Yes, I'm doing MCMC on a posterior probability that includes a likelihood term and a prior probability term.
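Schematically, that objective looks like the sketch below; `ln_prior` and `ln_likelihood` here are toy placeholders standing in for the actual terms, which are the ones described in the paper.

```python
import numpy as np

def ln_prior(theta):
    # toy placeholder: flat within wide bounds, -inf outside
    return 0.0 if np.all(np.abs(theta) < 10.0) else -np.inf

def ln_likelihood(theta, data):
    # toy placeholder standing in for the marginalized N(z) likelihood
    return -0.5 * np.sum((np.asarray(data) - theta) ** 2)

def ln_posterior(theta, data):
    """The function the sampler walks on: ln_prior(theta) + ln_likelihood(theta, data)."""
    lp = ln_prior(theta)
    if not np.isfinite(lp):
        return -np.inf                      # reject points outside the prior support
    return lp + ln_likelihood(theta, data)
```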
Since I have the likelihoods hanging around anyway, I now do a maximum likelihood fit as well. I'm using the coarse grid @ixkael proposed because the scipy.optimize functions don't converge to reasonable values on their own. I'll keep working on getting a good maximum likelihood estimate; I have not yet tried specifying a larger number of iterations for the optimizers, which should obviously help.
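A sketch of that coarse-grid-then-refine strategy (hypothetical helper, not the repo's code): scan a coarse grid of starting points, then polish the best one with an optimizer given a generous iteration budget.

```python
import numpy as np
from itertools import product
from scipy.optimize import minimize

def max_likelihood(neg_ln_like, coarse_values, n_dim):
    """Evaluate on a coarse grid of starting points, then refine the best one locally."""
    best_x, best_f = None, np.inf
    for start in product(coarse_values, repeat=n_dim):
        f = neg_ln_like(np.array(start))
        if f < best_f:
            best_x, best_f = np.array(start), f
    result = minimize(neg_ln_like, best_x, method="Nelder-Mead",
                      options={"maxiter": 20000, "maxfev": 20000})
    return result.x, -result.fun
```

The grid grows exponentially with dimension, so this only makes sense for a coarse grid over a handful of parameters.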
The average of the interim prior and the true N(z) is shown in the figure below. Indeed, the KLD is lower than for the interim prior alone, and the log likelihood is higher than for the interim prior alone. Is there other behavior you're expecting?
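For completeness, the check itself is just the following, reusing the `kld` helper sketched above, with a toy flat interim prior standing in for the real one:

```python
# reuses kld(), z_mid, dz, and p_true from the earlier sketch
p_interim = np.ones_like(p_true)
p_interim /= np.sum(p_interim) * dz                     # toy flat interim prior, normalized
p_avg = 0.5 * (p_true + p_interim)                      # still integrates to one

print(kld(p_true, p_interim, dz), kld(p_interim, p_true, dz))
print(kld(p_true, p_avg, dz), kld(p_avg, p_true, dz))   # expect these to be smaller
```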
Let's discuss optimizing the likelihood on the maximum likelihood issue thread.
Aside from the issue that the formula for the KL divergence is wrong (it needs a dz in there), this doesn't look obviously wrong at this point.
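For reference, the corrected discrete form with the bin width included (assuming both inputs are normalized to unit integral) is:

```latex
\mathrm{KLD}(p\,\|\,q) \;=\; \int p(z)\,\ln\!\frac{p(z)}{q(z)}\,dz
\;\approx\; \sum_k p_k \ln\!\frac{p_k}{q_k}\,\Delta z,
\qquad \sum_k p_k\,\Delta z \;=\; \sum_k q_k\,\Delta z \;=\; 1 .
```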
Compute p(z) by taking a histogram of maximum-a-posteriori redshifts (best-fit redshifts). Call this p1(z).
Compute p(z) by adding up the posterior pdfs. Call this p2(z).
Compute the KLD between p1(z) and the truth (there are two such KLDs, compute both).
Compute the KLD between p2(z) and the truth (there are two such KLDs, compute both).
Show me p1(z), p2(z), and the truth vs z on one plot.
Label the curves.
Tell me (in a caption) what the KLDs are.
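A sketch of those steps end to end (toy data throughout; the catalog arrays, bin grid, and zPDF widths below are all made up for illustration, not the repo's actual inputs):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# --- toy stand-ins for the real catalog; everything after this block is the recipe ---
dz = 0.05
bin_edges = np.arange(0.0, 1.0 + dz, dz)
z_mid = 0.5 * (bin_edges[:-1] + bin_edges[1:])
n_gals = 2000
z_true = rng.beta(2.0, 3.0, size=n_gals)                          # toy true redshifts
z_obs = np.clip(z_true + rng.normal(0.0, 0.05, size=n_gals), 0.0, 1.0)
pdfs = np.exp(-0.5 * ((z_mid[None, :] - z_obs[:, None]) / 0.05) ** 2)
pdfs /= np.sum(pdfs, axis=1, keepdims=True) * dz                  # each row a normalized zPDF

def kld(p, q, dz):
    """Discrete KL divergence sum_k p_k ln(p_k / q_k) dz over bins where both are positive."""
    good = (p > 0) & (q > 0)
    return np.sum(p[good] * np.log(p[good] / q[good])) * dz

def normalize(n_z, dz):
    return n_z / (np.sum(n_z) * dz)

# p1(z): histogram of per-galaxy MAP (best-fit) redshifts
z_map = z_mid[np.argmax(pdfs, axis=1)]
p1 = normalize(np.histogram(z_map, bins=bin_edges)[0], dz)

# p2(z): stack (sum) of the posterior zPDFs
p2 = normalize(np.sum(pdfs, axis=0), dz)

# truth on the same binning
p_true = normalize(np.histogram(z_true, bins=bin_edges)[0], dz)

# the four KLDs: both directions for each estimator
klds = {"KLD(p1||T)": kld(p1, p_true, dz), "KLD(T||p1)": kld(p_true, p1, dz),
        "KLD(p2||T)": kld(p2, p_true, dz), "KLD(T||p2)": kld(p_true, p2, dz)}

plt.step(z_mid, p1, where="mid", label="p1(z): MAP histogram")
plt.step(z_mid, p2, where="mid", label="p2(z): stacked zPDFs")
plt.step(z_mid, p_true, where="mid", label="truth")
plt.xlabel("z"); plt.ylabel("p(z)"); plt.legend()
plt.title(", ".join(f"{k}={v:.3f}" for k, v in klds.items()), fontsize=8)
plt.show()
```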