LSSTDESC / SSim_DC1

Configuration, production, validation specifications and tools for the DC1 Data Set.

Validate single chip output #11

Closed: cwwalter closed this issue 7 years ago

cwwalter commented 8 years ago

Look at one chip with 5 to 10 visits to validate the output after processing through DM. We should do this for a central and edge chip to see the effect of vignetting.

slosar commented 7 years ago

Hi @cwwalter, @jchiang87

Javier has started to look at some of the output that Jim gave him. Things are basically OK at zeroth order. What does he need to do to declare this part a victory?

I personally would like to see a plot of measured magnitude vs input magnitude and then some formal (i.e. beyond a simple "looks fine") determination of the effective depth and confirmation that it is what we expect. Does this sound good?

jchiang87 commented 7 years ago

> I personally would like to see a plot of measured magnitude vs input magnitude and then some formal (i.e. beyond a simple "looks fine") determination of the effective depth and confirmation that it is what we expect. Does this sound good?

I agree with this sentiment. I think Javier said he already has some code that will ask catsim to return the in-band magnitudes for the input sources. I know I sort of promised that we would make those available, but I personally don't know how to do it. Can we see this code? (Otherwise I'll just ask someone in the sims group about it.)

fjaviersanchez commented 7 years ago

I don't have that code in particular. I have something similar to query CatSim catalogs given a pointing position and a size, and I could try to match them, but I don't know exactly which zero points are used to go from measured fluxes (the stack doesn't give you the magnitude, it gives you the flux) to magnitudes. My guess is that there should be something in the stack that performs this conversion, but I don't know for sure.

I performed tests using CatSim + GalSim + the DM stack on a 1 sq. deg. catalog centered at RA=0 deg, DEC=0 deg, and I got the same density of detected sources; in terms of the WL Gold Sample, this means that you recover ~80% of the total simulated Gold-like sources with the default DM stack parameters. That's why I said that it looked fine. I will plot the measured fluxes for both and see if they are comparable. I can try to download the same circle and run it through the same process.

Also, I have another request for DC1: would it be possible to run the DM stack with the HSM extension activated to get the sizes, or would that delay the processing a lot? (For a single-frame measurement it doesn't impact performance very much, but that was for one chip's output on my laptop, so I don't know how things scale to a bigger problem.) The reason is that when I try to get the sizes from the output, I can't make a clean comparison, since the stack gives us the PSF-convolved moments XX, YY, and XY; to get the input a, b from the catalog, I need either the unconvolved moments or the PSF that was used.

jchiang87 commented 7 years ago

You can get the zero point info from calexp via code like this:

calexp = dataref.get('calexp')                   # dataref: Butler data reference for the processed CCD
zero_point = calexp.getCalib().getFluxMag0()[0]  # flux (ADU) corresponding to magnitude 0

That will give you the conversion from flux (in ADU) to maggies.
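
For completeness, a minimal sketch of the conversion itself (this is just the standard flux-to-magnitude formula; flux_to_mag is an illustrative helper, not something from the Stack):

import numpy as np

def flux_to_mag(flux, fluxmag0):
    # flux in ADU; fluxmag0 from getFluxMag0()[0] above; flux/fluxmag0 is in maggies
    return -2.5 * np.log10(flux / fluxmag0)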

Regarding the HSM extension question: I'm not familiar with that option. Do you know how to enable it? If not, could you post the question to community.lsst.org?

fjaviersanchez commented 7 years ago

Thanks @jchiang87!

I know how to do it for a single frame measurement. I guess it should be similar for processEimage.

First you set up the package:

git clone https://github.com/lsst/meas_extensions_shapeHSM.git
cd meas_extensions_shapeHSM
setup -r . -j        # declare the package to EUPS
scons -Q opt=3 -j 4  # build

And then you add the plugins into the measurement config class:

import lsst.afw.table
import lsst.meas.base
import lsst.meas.extensions.shapeHSM  # importing the extension registers the ext_shapeHSM_* plugins

schema = lsst.afw.table.SourceTable.makeMinimalSchema()
config1 = lsst.meas.base.SingleFrameMeasurementConfig()
# enable the HSM shape and moments plugins
config1.plugins.names.add('ext_shapeHSM_HsmShapeBj')
config1.plugins.names.add('ext_shapeHSM_HsmShapeLinear')
config1.plugins.names.add('ext_shapeHSM_HsmShapeKsb')
config1.plugins.names.add('ext_shapeHSM_HsmShapeRegauss')
config1.plugins.names.add('ext_shapeHSM_HsmSourceMoments')
config1.plugins.names.add('ext_shapeHSM_HsmPsfMoments')
measure = lsst.meas.base.SingleFrameMeasurementTask(schema=schema, config=config1)

jchiang87 commented 7 years ago

OK, I'll follow up on that. Seems straightforward.

fjaviersanchez commented 7 years ago

This is the first part of the test. The observed magnitude histogram looks OK.

[figure: maghist_r (measured r-band magnitude histogram)]

I still have to compare to the input. I am downloading the data right now.

egawiser commented 7 years ago

That histogram looks ok to me for a single visit. The small tail to high magnitudes is weird for observed magnitudes and is worth asking DM about, but it shouldn't cause us trouble.

fjaviersanchez commented 7 years ago

I think there was a problem with reddening, which they already solved in CatSim, that made that tail appear. Is that right, @SimonKrughoff?

fjaviersanchez commented 7 years ago

@dkirkby suggested that they also could be "junk" detections (noise fluctuations detected as objects). I will test that possibility as well.

egawiser commented 7 years ago

That should happen - a small fraction of detected sources should be purely noise fluctuations - but then the detected magnitude histogram should still have a reasonably sharp cutoff. Just to double-check, the histogram you plotted is of the magnitudes reported by the DM stack, not the input magnitudes for sources that get detected, right?

fjaviersanchez commented 7 years ago

Yes, that's correct: the histogram is of the magnitudes as measured by the DM stack.

dkirkby commented 7 years ago

A source can have a measured magnitude fainter than the nominal SNR threshold if it is smaller than the PSF, e.g., from a noise fluctuation. I believe the default DM source detection allows single-pixel sources, but @fjaviersanchez should confirm.

fjaviersanchez commented 7 years ago

That's correct. In the processEimage.py file from the repository that @jchiang87 made available at NERSC, the minimum size for a source is 1 pixel.

config.charImage.detectAndMeasure.detection.minPixels=1
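
(If we wanted to suppress these single-pixel detections at the detection stage instead, the override would presumably just raise that value, e.g. in a processEimage.py config file; the number below is only illustrative:)

# Illustrative only: require a larger detection footprint so isolated
# single-pixel noise spikes are not kept as sources.
config.charImage.detectAndMeasure.detection.minPixels = 4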

egawiser commented 7 years ago

OK - that would explain the tail. I don't understand why the DM source detection works that way; the usual approach is to convolve an image with the PSF and then search for significant individual pixels in the convolved image; if you don't convolve, then you usually require many contiguous pixels of modest significance. Perhaps there is a flag stating that these sources appear to be spurious?

slosar commented 7 years ago

Yeah, I agree this is a really weird way of doing things -- if I see a 5-sigma spike in one pixel where I know the PSF size is, say, 4 pixels, then I can be pretty sure this is NOT a real source. But we should follow up on some of these faint sources (see how they actually look in the data).

fjaviersanchez commented 7 years ago

I was playing with the different flags that come with the stack results, and there is an interesting one for keeping track of these faint sources. It turns out that all of these sources were detected in only one of the 5 exposures of the coadd (not necessarily the same exposure). If I require more than a single detection, the magnitude histogram changes to this:

[figure: multiple_detections_rhist (r-band magnitude histogram, requiring detection in more than one exposure)]

The problem is that we lose ~25% of the sources. The magnitude histogram for the sources detected only once is the following:

[figure: single_detections_rhist (r-band magnitude histogram of sources detected in only one exposure)]

There are more flags that I can try to combine to get rid of some of these faint objects (there are 95 flags in total), but this is the most effective one (flag number 3 for the r-band). Also, some of these faint sources are detected in masked regions or have interpolated pixels (62 of 317 -- flags 68 or 69 equal to True):

[figure: faint_example (cutout with input galaxies and faint detections marked)]

In the picture above, the input galaxies are blue x's, whereas the faint detected objects (psfmag > 25.5) are cyan +'s. I didn't put markers on the input stars.

I guess that at the end of the day we will also want the objects that have been detected only once, right?
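
For reference, a rough sketch of the selection described above, assuming the flag bits are exposed as a packed boolean 'flags' column when the measurement catalog is read with astropy (the file path is assumed, the indices follow the numbering above, and the sense of each bit should be checked against the schema):

import numpy as np
import astropy.io.fits as fits

data = fits.open('meas-r-0-1,1.fits')[1].data   # coadd measurement catalog (path assumed)
flags = data.field('flags')                     # packed flag bits, shape (n_sources, n_flags)

multi = flags[:, 3]                             # flag 3: detection in more than one exposure
clean = ~(flags[:, 68] | flags[:, 69])          # drop masked-region / interpolated-pixel sources

selected = data[multi & clean]
print('%d of %d sources pass the cuts' % (len(selected), len(data)))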

I am working on getting the input magnitudes from CatSim. I downloaded the input galaxies and stars, but I got fewer objects than the Twinkles input. I queried StarObj and GalaxySearch2015. @jchiang87: did you also query for easter-eggs in the catalog or any other objects?

jchiang87 commented 7 years ago

@fjaviersanchez Here are the objects I queried for in making those instance catalogs:

    star_objs = ['msstars', 'bhbstars', 'wdstars', 'rrlystars', 'cepheidstars']
    gal_objs = ['galaxyBulge', 'galaxyDisk']

I ran an earlier version of this code to generate the instance catalogs that were used in the phosim runs and which I put at NERSC. The acceptance cone I used was 2.5 deg radius, which is a lot bigger than the focal plane, so that may be why there are a lot more sources in that file than you may expect.

BTW, I have a script that calculates apparent magnitudes for the sources in an instance catalog. It uses the sims_photUtils package. The code isn't ready for general use, but if you are interested, it is in that same repository. See the compute_apparent_mags.py script in the bin.src directory. I can generate the input magnitudes for the sources in R22_S11 and put them at NERSC if you want.

fjaviersanchez commented 7 years ago

@jchiang87 Thanks for this. I made the query for stars and galaxies in the 2.5 deg radius. I got the same number of stars (212902), but I got fewer galaxies (16M vs 21M). I will try your code and see if I get the same number of objects. In the meantime, I would really appreciate it if you could generate the input magnitudes for the sources in R22_S11. I would just make a quick comparison and see how it looks.

jchiang87 commented 7 years ago

@fjaviersanchez I uploaded a file to NERSC with the r band magnitudes for the sources within 0.17 deg of the center of R22_S11:

/global/project/projectdirs/lsst/phosim_deep/feasibility_study/single_raft/phosim_inputs/catsim_mags/R22_S11_r.pkl

This is a pickled pandas data frame:

In [2]: import pickle

In [3]: mags = pickle.load((open('R22_S11_r.pkl')))

In [4]: mags.head()
Out[4]: 
       objectID         ra        dec          r galSimType
0  6.145475e+13  53.096023 -27.438950  27.406412     sersic
1  6.145768e+13  53.092250 -27.449645  28.013324     sersic
2  6.145533e+13  53.119502 -27.453747  27.480317     sersic
3  6.145463e+13  53.139391 -27.558649  27.243323     sersic
4  6.145242e+13  53.012776 -27.500717  26.860429     sersic

The galSimType values are either pointSource for stars or sersic for galaxies. Here are the histograms of the magnitudes measured by the Stack in deepCoadd-results/r/0/1,1 and the magnitudes in that pickle file from the catsim code:

[figure: r22_s11_mag_hists (coadd vs. catsim magnitude histograms)]
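
As an aside, splitting that data frame into stars and galaxies is just a pandas selection on galSimType (a trivial example using the mags frame loaded above):

stars = mags[mags['galSimType'] == 'pointSource']
galaxies = mags[mags['galSimType'] == 'sersic']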

fjaviersanchez commented 7 years ago

I have some questions:

jchiang87 commented 7 years ago

I'm not making any cuts. Here is the full code for the plot:

import pickle
import numpy as np
import matplotlib.pyplot as plt
import astropy.io.fits as fits
plt.ion()

# zero point: flux (ADU) corresponding to mag 0, i.e. calexp.getCalib().getFluxMag0()[0]
fluxmag0 = 63095734448.0194

# coadd measurement catalog for deepCoadd-results/r/0/1,1
coadd_fluxes = fits.open('meas-r-0-1,1.fits')

# measured PSF fluxes -> magnitudes
mags = -2.5*np.log10(coadd_fluxes[1].data.field('base_PsfFlux_Flux')/fluxmag0)

plt.hist(mags, bins=50, range=(14, 25), histtype='step', label='coadd')

# input (truth) magnitudes from the catsim pickle file
truth = pickle.load(open('R22_S11_r.pkl'))
plt.hist(truth['r'].values, bins=50, range=(14, 25), histtype='step',
         label='catsim')
plt.yscale('log')
plt.legend(loc=2)
plt.savefig('R22_S11_mag_hists.png')

jchiang87 commented 7 years ago

Actually, the range option in the histogramming of the truth values is making implicit cuts:

In [11]: min(truth['r'].values), max(truth['r'].values)
Out[11]: (9.5252954696519314, 41.125372614774761)

egawiser commented 7 years ago

OK - those cuts don't seem to be affecting things, but they're good to know about. I'm confused by the hard cut at r>25 in both the input (catsim) and output (coadd) - why is that occurring? My hypothesis for the excess of r~16.5 sources is that these are overestimated magnitudes assigned to saturated stars - but that's just a guess.

fjaviersanchez commented 7 years ago

The hard cut at 25 is just because the histogram range is 14 to 25.

jchiang87 commented 7 years ago

Here's a better plot with the histogram range from 7 to 40:

[figure: r22_s11_mag_hists (range 7 to 40)]

I'm fairly certain that the bump in the coadd distribution at r~16.5 comprises artifacts from incompletely subtracted saturation wings for bright stars. Because of the rotational dithering, these artifacts have different locations in the different exposures. See https://github.com/DarkEnergyScienceCollaboration/SSim_DC1_Roadmap/issues/7#issuecomment-241904808

egawiser commented 7 years ago

Sorry - should've seen that range parameter, but this is a better plot - thank you! Now I can see the presence of r<15 input stars, and I can believe those are causing the r~16.5 artifacts. I'm guessing that the bimodality in CatSim inputs is caused by galaxies not being assigned at r>~28 but the stellar model going arbitrarily dim; nothing that dim should affect us here, so that's just a curiosity. Looks good overall!

fjaviersanchez commented 7 years ago

Are there any other tests that you want to see in this?

slosar commented 7 years ago

Can we see the scatter plot of coadd vs. catsim magnitudes for objects present in both? But otherwise this looks pretty sane! Thanks @fjaviersanchez for doing these sanity checks!

fjaviersanchez commented 7 years ago

Sure!

[figure: mag_scatter (measured vs. input magnitude)]

The x-axis is the input magnitude, the y-axis is the measured magnitude. There is an offset between the two, small enough not to be very perceptible in the projected magnitude histograms. This plot shows all the detected objects. I matched by position using a kd-tree, associating each detected source with the closest input source from CatSim. In general this should work, but there will be some mismatched sources (junk and some blended sources).
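
Roughly, the matching looks like this (a sketch with scipy's cKDTree under a flat-sky approximation with a cos(dec) stretch; truth is the pickled data frame from above, and det_ra, det_dec are hypothetical arrays of detected-source coordinates in degrees):

import numpy as np
from scipy.spatial import cKDTree

# Flat-sky positions, stretched by cos(dec) so separations are roughly isotropic.
dec0 = np.radians(truth['dec'].values.mean())
input_xy = np.column_stack([truth['ra'].values * np.cos(dec0), truth['dec'].values])
det_xy = np.column_stack([det_ra * np.cos(dec0), det_dec])

# Associate each detected source with its nearest CatSim input.
tree = cKDTree(input_xy)
dist, idx = tree.query(det_xy, k=1)

input_mag = truth['r'].values[idx]   # input magnitude of the matched source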

slosar commented 7 years ago

OK, this is great for now. I think for the real DC1 we should really try to understand all the features of this plot, but for now I think we can go ahead. Perhaps repeat the same sanity plots for the full focal plane output and then we are ready for DC1. Ready to close this issue?

egawiser commented 7 years ago

I'm happy enough. @fjaviersanchez if it's easy, you could re-make that plot by matching each detected source to the brightest input source within 1" (or 0.5"). That should clean things up.
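
For illustration, one way to do that with the kd-tree from the sketch above (untested; reuses the input_xy, det_xy, and truth names defined there):

import numpy as np
from scipy.spatial import cKDTree

tree = cKDTree(input_xy)
radius = 1.0 / 3600.0                        # 1 arcsec in degrees
neighbors = tree.query_ball_point(det_xy, r=radius)

# For each detection, keep the brightest (smallest r magnitude) input within the radius.
match_idx = np.full(len(det_xy), -1, dtype=int)
for i, nbrs in enumerate(neighbors):
    if nbrs:
        match_idx[i] = nbrs[int(np.argmin(truth['r'].values[nbrs]))]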

fjaviersanchez commented 7 years ago

I tried to make it for the whole focal plane, but I don't have the magnitudes for the whole catalog yet. I repeated the plot for the central chip following @egawiser's suggestion. It cleans things up a bit.

[figure: mag_scatter_one_chip (measured vs. input magnitude, central chip, matched to brightest input within 1 arcsec)]

And I guess that with this we can close this issue.

cwwalter commented 7 years ago

@fjaviersanchez Please discuss full focal plane validation in Issue #12.