DoubleGaussian photo-z doesn't quite work as anticipated. This is an extract from an email to Johann Cohen-Tanugi:
One is not completely free with functional forms for p(z). For a start,
say there is a bump at z_true and a lesser bump at z_catastrophic: one
cannot simply associate the same pdf with every galaxy at z_true; one
also needs to associate the same pdf with the correct number of galaxies
at z_catastrophic. In other words, for every galaxy that scatters from
z_true to z_cat, there must be the right number of galaxies scattering
from z_cat to z_true to make those probabilities meaningful. Moreover,
if I take realizations of p(z_true) for a large number of galaxies
around z_true and add them, I need to correctly enter the central limit
theorem regime, where the total likelihood collapses into a Gaussian around
z_true (most codes will, in effect, rely on this). I'm trying to write
down some math to formalize these requirements.
If you do your photo-zs with a full simulation, i.e. generate ugrizy from
z_true, fit it and get p(z_true), then these requirements will be
satisfied automatically (because ugrizy generated from z_catastrophic
will presumably be very similar). But if you try to cheat and go
straight from z_true -> p(z), which we are doing here, you need to be
careful. I tried to do this with my DoubleGaussian photo-zs toy model,
see these slides
but I'm not sure if this is actually correct. In fact, one can do a
couple of self-consistency tests. One, implemented in
./validate_fastcat/check_pz_sanity.py
works as follows: if p(z) indeed describes the proper, true
p(z), and you take one galaxy, then the cumulative p(z) evaluated at
z_true should be a random number uniformly distributed between 0 and 1.
So I just calculate this for all galaxies and plot a histogram, which
should be flat. And in fact, my DoubleGaussian fails this test (but a
normal Gaussian passes it). I think you should make sure your code
passes this. In fact, given that you have a rather limited number of
p(z) shapes (i.e. many fewer than we have galaxies), I suggest the following algorithm:
take a random p(z) from the library and draw a z from it. Then, find
a galaxy at this z_true (+/- epsilon z) and associate this pdf with that
particular galaxy.
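A minimal sketch of this assignment loop, assuming the library is stored as pdfs tabulated on a common redshift grid. The names assign_pdfs, z_grid, pdf_library, eps and max_tries are hypothetical and are not taken from fastcat; each galaxy is used at most once, and galaxies that never receive a pdf are marked with -1.

```python
import numpy as np

def assign_pdfs(z_true, z_grid, pdf_library, eps=0.02, max_tries=None, seed=0):
    """Return, for each galaxy, the index of the library pdf assigned to it (-1 if none)."""
    rng = np.random.default_rng(seed)
    z_true = np.asarray(z_true, dtype=float)
    pdf_library = np.asarray(pdf_library, dtype=float)

    # Tabulated CDF of every library pdf, used for inverse-CDF sampling.
    cdfs = np.cumsum(pdf_library, axis=1)
    cdfs = cdfs / cdfs[:, -1:]

    n_gal = z_true.size
    order = np.argsort(z_true)           # sorted view for fast range lookups
    z_sorted = z_true[order]
    free = np.ones(n_gal, dtype=bool)    # galaxies (in sorted order) still without a pdf
    n_free = n_gal
    assigned = np.full(n_gal, -1)
    if max_tries is None:
        max_tries = 10 * n_gal           # arbitrary cap so the loop always terminates

    for _ in range(max_tries):
        if n_free == 0:
            break
        k = rng.integers(len(cdfs))                          # random pdf from the library
        z = z_grid[np.searchsorted(cdfs[k], rng.uniform())]  # draw a z from it
        lo, hi = np.searchsorted(z_sorted, [z - eps, z + eps])
        candidates = np.flatnonzero(free[lo:hi])             # free galaxies within +/- eps
        if candidates.size == 0:
            continue                                         # nobody left at this redshift
        j = lo + rng.choice(candidates)
        free[j] = False
        n_free -= 1
        assigned[order[j]] = k
    return assigned

# Toy usage: three double-bump shapes and a flat galaxy distribution (both made up).
z_grid = np.linspace(0.0, 2.0, 2001)

def bump(mu, sig):
    return np.exp(-0.5 * ((z_grid - mu) / sig) ** 2)

pdf_library = np.array([bump(m, 0.1) + 0.2 * bump(m + 0.5, 0.1) for m in (0.6, 1.0, 1.4)])
z_true = np.random.default_rng(1).uniform(0.2, 1.8, 2000)
assigned = assign_pdfs(z_true, z_grid, pdf_library)
print("fraction of galaxies with a library pdf:", np.mean(assigned >= 0))
```

Drawing by grid index limits the sampled z to the grid resolution, which is fine as long as the grid spacing is well below eps.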
This means that many galaxies will have the same p(z), but this is not
an issue. The problem with the above is that towards the end, you will
start running out of galaxies at the right places. At that point you can
just associate normal, perfect Gaussians with the last 10% or so.
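The cumulative-p(z) check described earlier can be illustrated with a small toy. This is a sketch under stated assumptions, not the code in check_pz_sanity.py and not fastcat's DoubleGaussian model: each pdf peaks at z_true plus Gaussian scatter, and the "toy double Gaussian" adds a second bump at a made-up z_cat with weight f_cat to every pdf, even though no galaxy in the sample is an outlier. The names gauss_cdf, ks_uniform, f_cat and z_cat are hypothetical.

```python
import math
import numpy as np

# Vectorised error function built from math.erf (avoids a SciPy dependency).
_erf = np.vectorize(math.erf)

def gauss_cdf(z, mu, sigma):
    """CDF of a Gaussian with mean mu and width sigma, evaluated at z."""
    return 0.5 * (1.0 + _erf((z - mu) / (sigma * math.sqrt(2.0))))

def ks_uniform(values):
    """Kolmogorov-Smirnov distance between the sample and a uniform on [0, 1]."""
    x = np.sort(values)
    n = x.size
    i = np.arange(1, n + 1)
    return max(np.max(i / n - x), np.max(x - (i - 1) / n))

rng = np.random.default_rng(42)
n_gal, sigma = 20000, 0.05
z_true = rng.uniform(0.5, 1.0, n_gal)
# The peak of each galaxy's p(z) scatters around its true redshift.
z_peak = z_true + sigma * rng.standard_normal(n_gal)

# Single Gaussian: cumulative p(z) evaluated at z_true.
cum_gauss = gauss_cdf(z_true, z_peak, sigma)

# Toy double Gaussian: same main peak plus an outlier bump nobody scatters into.
f_cat, z_cat = 0.2, 2.5
cum_double = ((1.0 - f_cat) * gauss_cdf(z_true, z_peak, sigma)
              + f_cat * gauss_cdf(z_true, z_cat, sigma))

for name, cum in [("single Gaussian", cum_gauss), ("toy double Gaussian", cum_double)]:
    counts, _ = np.histogram(cum, bins=10, range=(0.0, 1.0))
    print(f"{name:20s} KS distance = {ks_uniform(cum):.3f}  histogram = {counts}")
```

In this toy the single Gaussian gives a flat histogram, while the double Gaussian's cumulative values never exceed 1 - f_cat, so the top bins stay empty. The same two functions can be applied to any catalogue for which the cumulative p(z) at z_true is available.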