Open EiffL opened 5 years ago
Here are some cool-looking first results using scarlet (Peter and Fred have done awesome work at making it super easy to use). This is for a real COSMOS galaxy, for the parametric fit to that galaxy, and for a galaxy generated from the generative model (conditioned on the same size and magnitude as above). Images are drawn at native COSMOS resolution and noise (it looks like I screwed up the noise padding, but the idea is there).
So, it looks good :-) but that's very much what one would expect ¯\_(ツ)_/¯. The scarlet paper looks at the correlation between the recovered and true image, which gives them this figure: If we were to do a similar test with our mock and parametric galaxies, we would get a similar plot: the parametric galaxies would sit pretty much at 1, and it would be interesting to see where the mock galaxies end up, but that wouldn't be super instructive.
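For reference, I think that metric boils down to a Pearson correlation between the flattened true and recovered images; a minimal numpy sketch (the function name is mine, not from the scarlet paper):

```python
import numpy as np

def image_correlation(true_img, recovered_img):
    """Pearson correlation between flattened true and recovered images."""
    t = true_img.ravel() - true_img.mean()
    r = recovered_img.ravel() - recovered_img.mean()
    return float(np.dot(t, r) / np.sqrt(np.dot(t, t) * np.dot(r, r)))

# A morphologically perfect recovery (even up to a flux rescaling)
# gives a correlation of exactly 1.
rng = np.random.default_rng(0)
true_img = rng.random((64, 64))
assert np.isclose(image_correlation(true_img, 2.0 * true_img), 1.0)
```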
We could also build blended scenes with real, parametric, or mock galaxies, and see how well scarlet recovers, for instance, the flux across the three different sets.
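A quick sketch of the per-source flux-recovery comparison I have in mind (helper name is mine, assuming we have matched lists of true and deblended stamps as numpy arrays):

```python
import numpy as np

def flux_recovery(true_images, recovered_images):
    """Fraction of each source's true flux recovered by the deblender.

    Both arguments are sequences of per-source image stamps; a value of
    1.0 means the total flux was recovered perfectly.
    """
    return np.array([rec.sum() / tru.sum()
                     for tru, rec in zip(true_images, recovered_images)])

# Toy check: the first source loses half its flux, the second is perfect.
tru = [np.full((8, 8), 2.0), np.full((8, 8), 1.0)]
rec = [np.full((8, 8), 1.0), np.full((8, 8), 1.0)]
assert np.allclose(flux_recovery(tru, rec), [0.5, 1.0])
```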
That also leads to the question: do we want to test these things under LSST observing conditions, or stick to COSMOS resolution?
This is very cool.
A few comments:
> If we were to do a similar test with our mock galaxies and parametric galaxies, we would get a similar plot, with the parametric galaxies pretty much at 1, and then it would be interesting to see where the mock galaxies end up, but that wouldn't be super instructive.
I kind of feel like we wouldn't gain anything from it that tells us about the impact on science, so even though it results in some numbers/plots, it isn't actually more informative than e.g. the residual plots you showed above (which are similar in the sense of giving a qualitative answer without telling you about scientific impact). Though I guess the histogram would at least tell you more about the statistics of the sample overall, i.e., surely for some of the mock galaxies the result will be near 1, and it's a question of how often is/isn't that the case... so maybe that actually is useful in some sense?
The blended scene idea sounds interesting but as usual the devil is in the details. For example: even if you pick a very limited/constrained scenario in terms of flux ratio and separation, what exactly do you learn if the flux is/isn't recovered well? For large-scale structure statistics, things like photo-z may be more interesting than just flux recovery, but these are currently monochromatic models, so we can't say anything about that. Hmmm.
I think we would benefit from testing this at a ground-based resolution as well. For example, if we were to show that with 0.7 arcsec seeing, these details don't matter as much, that's kind of interesting...
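To make the seeing test concrete, here is a minimal sketch of degrading a stamp with a Gaussian PSF of a given FWHM (the helper is mine, assuming scipy; 0.2 arcsec/pixel is roughly the LSST pixel scale, and a Gaussian is only a crude stand-in for a real PSF):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def apply_seeing(image, fwhm_arcsec=0.7, pixel_scale=0.2):
    """Convolve an image with a Gaussian PSF of the given seeing FWHM.

    pixel_scale is in arcsec/pixel; FWHM = 2.355 * sigma for a Gaussian.
    """
    sigma_pix = fwhm_arcsec / 2.355 / pixel_scale
    return gaussian_filter(image, sigma_pix)

# A point source spreads into a blob; total flux is preserved
# (the source is far enough from the stamp edges).
img = np.zeros((32, 32))
img[16, 16] = 1.0
blurred = apply_seeing(img)
assert np.isclose(blurred.sum(), 1.0)
```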
So, I went ahead and interfaced with David's WLDeblending package and the BlendingToolkit developed by Sowmya. These are the very first results I'm getting. This is scarlet deblending with standard bulge+disk model (if I'm not wrong, these blends are artificial but the observing conditions and galaxies come from some DC1 era catalogs):
And now with the generative model: by eye, on a single example at LSST resolution, you can see that there are some unsightly residuals when using the generative model, whereas scarlet works perfectly on Sersic profiles. Pretty nice :-) I have also seen examples where scarlet fails (as in, some objects remain blended) with the generative model but works fine with a normal Sersic profile.
In order to do this, I have trained a generative model conditioned on redshift + bulgefit parameters from the COSMOS catalog, so I can now draw galaxy images for any catalog describing galaxies in terms of a bulge and a disk component. Here the galaxy image is drawn in the i band, and the flux in each band is then set to the flux provided by the extragalactic catalog, which is how I can make pretty colour pictures :-)
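That band-scaling step is simple; a minimal sketch (the function name is mine, assuming numpy):

```python
import numpy as np

def colorize(i_band_image, band_fluxes):
    """Turn a single-band (i-band) morphology into a multi-band cube.

    The profile is drawn once, normalized, and each band of the output
    is the same profile rescaled to that band's catalog flux.
    """
    profile = i_band_image / i_band_image.sum()
    return np.stack([f * profile for f in band_fluxes])

rng = np.random.default_rng(1)
img = rng.random((16, 16))
cube = colorize(img, [1.0, 2.5, 4.0])
# Each band integrates to exactly the requested flux.
assert np.allclose(cube.sum(axis=(1, 2)), [1.0, 2.5, 4.0])
```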
I agree with your comments above that it will be difficult to get scientifically relevant deblending results for this paper, but maybe we can restrict our scope to demonstrating that the generative model can be used in the same situations as the usual parametric profiles (at no extra cost), yet leads to measurably different results (likely closer to reality), and will therefore be a valuable tool for further studies of the impact of blending. It's pretty neat (and problematic) that even at ground-based seeing the impact of morphology can be seen like this.
I saw a talk by Sowmya a couple of weeks ago where she showed some results using a neural network to detect deblending failures (from undetected or mis-identified sources), but the model was trained on parametric models, so pretty much any residual in the scarlet deblending was an indication that the detection step was wrong. If instead scarlet is run on simulations based on this generative model, there can be residuals despite correct detection. The point being that this work will, for instance, be useful to Pat and Sowmya in their deblending work.
That is very cool! I agree with your take on how this could be described in the paper (and how it could be useful to others).
We want to compare the performance of a deblender, for instance scarlet, on parametric light profiles versus light profiles generated with our model, hopefully demonstrating that the results on the generative model are worse, i.e. closer to what one would obtain with real galaxy images. This can be quantified, for instance, by looking at the residuals.
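As a starting point, the residual comparison could be as simple as the following (names are mine, assuming numpy and a known per-pixel noise level):

```python
import numpy as np

def residual_stats(true_img, model_img, noise_sigma=1.0):
    """Summarize deblending residuals: RMS and reduced chi^2 per pixel.

    noise_sigma is the per-pixel noise standard deviation; for a
    successful fit, chi2 should be close to 1 (pure noise residuals).
    """
    res = true_img - model_img
    rms = float(np.sqrt(np.mean(res ** 2)))
    chi2 = float(np.mean((res / noise_sigma) ** 2))
    return rms, chi2

# A perfect model leaves zero residuals.
rms, chi2 = residual_stats(np.ones((8, 8)), np.ones((8, 8)))
assert rms == 0.0 and chi2 == 0.0
```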
We can follow a procedure similar to section 3.2.1 of https://arxiv.org/pdf/1802.10157.pdf, which uses the COSMOS sample to look at performance on individual galaxies.
We could play the same game on blended scenes.