LSSTDESC / DC2-production

Configuration, production, validation specifications and tools for the DC2 Data Set.
BSD 3-Clause "New" or "Revised" License

1.2i validation updates #259

Closed fjaviersanchez closed 5 years ago

fjaviersanchez commented 6 years ago

In this issue I wanted to keep track of the status of the validation of 1.2i. (Note added by Rachel: a summary of PSF ellipticity investigations from this very long thread can be found here.)

1) Exposure checker: @kadrlica found some issues with the sky level gradients. More details at the #desc-dc2-eyeballs channel. Follow up needed for run 2.0.

2) Preliminary comparisons of calexps between imSim KNL and imSim Haswell show perfect agreement (only g-band available for now):

Here I am plotting the difference between the positions of the detected objects in 100 imSim KNL visits (src catalogs) and the corresponding imSim Haswell visits; the detected objects lie in exactly the same positions:

test_astrometry_imsim_knl_vs_phosim

I am showing a cutout to demonstrate that the centroids lie in the same positions:

test_imsim_knl_2018_09_04_v_imsim_haswell

And here I am plotting the difference of Kron measured magnitude for the same catalogs. Again the agreement is perfect:

test_magnitudes_imsim_knl_vs_imsim_haswell

3) Preliminary comparisons between imSim and PhoSim show good agreement. PhoSim's sources appear brighter, given the missing extinction. The comparison has been made in g-band only (for now):

Here I compare the position of PhoSim detected objects with matched imSim detected objects:

test_astrometry_imsim_knl_vs_phosim

And here I compare their Kron measured magnitudes:

test_magnitudes_imsim_knl_vs_phosim

Finally I show a cutout with imSim image and the measured centroids for PhoSim (+) and imSim (x): test_imsim_knl_2018_09_04_v_phosim
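
For reference, the positional matching behind these comparisons can be sketched as a nearest-neighbor match. This is a minimal flat-sky illustration, not the actual matching code used here; the function name and the 1-arcsec tolerance are my own choices:

```python
import numpy as np

def match_by_position(ra1, dec1, ra2, dec2, max_sep_arcsec=1.0):
    """Match catalog 2 to catalog 1 by nearest neighbor on the sky.

    Flat-sky approximation (adequate within a single sensor/visit);
    returns indices into catalog 2 and a boolean mask of good matches.
    """
    # Scale RA by cos(dec) so separations are roughly isotropic
    cosd = np.cos(np.deg2rad(np.median(dec1)))
    dx = (ra1[:, None] - ra2[None, :]) * cosd
    dy = dec1[:, None] - dec2[None, :]
    sep = np.hypot(dx, dy) * 3600.0  # degrees -> arcsec
    idx = sep.argmin(axis=1)
    good = sep[np.arange(len(ra1)), idx] < max_sep_arcsec
    return idx, good

# Toy example: identical positions should match one-to-one
ra = np.array([10.000, 10.001, 10.002])
dec = np.array([-30.000, -30.001, -30.002])
idx, good = match_by_position(ra, dec, ra, dec)
```

A k-d tree (e.g. scipy's cKDTree) would be the idiomatic choice for large catalogs; the brute-force version above just keeps the example self-contained.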

4) Preliminary comparisons between imSim and the truth catalog show good agreement.

4.1: i-band (using 100 sensor/visits)

Astrometry residuals for the calibration objects (are we still using some galaxies or are these matching mistakes??)

image

Photometric residuals for calibration objects only

image

Photometric residuals for stars (extendedness==0 and matched to a star in the input catalog):

image

Astrometric residuals for stars (defined as above) image

Detection efficiency of stars:

image

4.2 g-band (300 visits for now)

Photometric residuals for calibration objects only:

image

Astrometric residuals for calibration objects only:

image

Photometric residuals for stars (defined as above):

image

Astrometric residuals for stars:

image

Detection efficiency for stars:

image

5) Background power-spectra checks show compatibility between PhoSim and imSim (see DC2 presentation on August 31st)

Should we be worried about these astrometric residuals in g-band? @RobertLuptonTheGood @wmwv @jchiang87 @cwwalter

For g-band I am using visits 159494 and 183811.

rmjarvis commented 6 years ago

Great stuff, Javier. Thanks!

My best guess for the g-band astrometry errors is differential chromatic refraction (DCR). That's stronger in g-band than in redder bands.

The direction of the effect is to push away from zenith, so it's not natively an RA effect. But it's possible that for the 100 visits here, the mean effect is in the RA direction primarily. Especially if the observations are near Dec = -30 and are not symmetrical between positive and negative HA.

RobertLuptonTheGood commented 6 years ago

That data is surprisingly shallow. Is it taken under bad conditions?

What reference catalogue are you using for astrometry? Is there an effect as a function of colour?

jchiang87 commented 6 years ago

It looks like there are just 2 visits in g-band and 1 visit in i-band being considered here. The numbers ~100 and ~300 must refer to sensor-visits.

RobertLuptonTheGood commented 6 years ago

I'd still expect to go a good deal deeper than 20--22 per visit

fjaviersanchez commented 6 years ago

@RobertLuptonTheGood yes, I am surprised too. I checked the background levels and, for the i-band visit that I tested, the mean background is ~1650 while the fiducial sky level is 1150. Using equation 6 in Ivezic et al. 2008, this implies a limiting magnitude ~0.2 mag brighter. The fiducial limiting magnitude according to this link is 23.9 (so in our case it should be ~23.7), and I think we are still far away from that. For the g-band visits, both have ~400 ADU for sky, which is around the fiducial value. The fiducial depth is 24.8, and that does not seem to be what we are seeing here... Any ideas @rmjarvis @cwwalter @jchiang87?
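
For concreteness, the depth scaling used in that estimate can be written out. In the background-dominated regime the noise scales as sqrt(sky), so the 5-sigma depth shifts by -1.25 log10(sky/sky_fiducial) (cf. eq. 6 of Ivezic et al. 2008); this is a sketch of the arithmetic, not pipeline code:

```python
import math

def depth_change(sky, sky_fiducial):
    """Shift in 5-sigma limiting magnitude for a sky-limited exposure:
    noise ~ sqrt(sky), so m5 shifts by -1.25*log10(sky/sky_fiducial)."""
    return -1.25 * math.log10(sky / sky_fiducial)

# i-band numbers from the comment above: ~0.2 mag shallower than fiducial
dm = depth_change(1650.0, 1150.0)
```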

fjaviersanchez commented 6 years ago

I'm providing links to the DESCQA runs on these visits for the image 1D histograms and power spectra. Please ignore the plot titles, since I didn't update them: tests for g-band 183811 tests for g-band 159494 tests for i-band 174549

rmjarvis commented 6 years ago

I'm quite sure I don't understand magnitudes or zero points or possibly the difference between completeness magnitudes and limiting magnitudes. But, that said, it looks like we are complete to 22 and probably limiting somewhere around 25 in g-band according to the "Detection efficiency for stars" plot.

Is that inconsistent with the fiducial values?

fjaviersanchez commented 6 years ago

Good eye @rmjarvis. Yes, the g-band limiting magnitude is ~25 (consistent with the fiducial values) for one visit. This is the 1D histogram:

image

And this is the HEALPix map:

image

fjaviersanchez commented 6 years ago

Nevermind... I added a vertical line at 24.8 (the fiducial value) and I would say that we are a little bit short (the median of the histogram is 24.46)

image

rmjarvis commented 6 years ago

I can easily imagine 0.34 magnitudes shallow being due to less than optimal choices for various parameters in the object detection step. I'm more familiar with SExtractor's parameters (which I don't claim to really understand -- just that I have more familiarity with them), but I assume there are analogous parameters to its N sigma above the noise and how many contiguous detected pixels are required. With SExtractor at least, poor choices for these can lead to significantly fewer detections.

So, do we know what values were used here? And whether these can be tweaked to maybe probe slightly fainter objects?

fjaviersanchez commented 6 years ago

Thanks @rmjarvis, that sounds reasonable. That is probably a question for @heather999 and @boutigny.

I believe that the configuration lives here

I think that we are using the default minimum size (1 pixel according to here) and SNR threshold (5.0 according to here).

rmjarvis commented 6 years ago

If I read that right, we require at least 1 pixel at 5 sigma. This won't ever detect a 5-sigma object then: since the typical PSF spans several pixels, a 5-sigma source always has its signal spread over more than 1 pixel, so no single pixel reaches 5 sigma on its own. I suspect the "fiducial limiting magnitude" is in terms of 5-sigma point sources, so we're probably not hitting that.

In DES, we usually set the threshold to around 1.4 sigma and require something like 6 contiguous pixels. This ends up detecting quite a few spurious objects, which we throw out downstream. But better to "detect" some noise fluctuations and remove them later than not detect some real but faint objects.
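
To make the trade-off concrete, here is a toy version of that detection rule (a per-pixel threshold plus a contiguous-pixel cut). This is an illustration only, not the DES or LSST stack implementation, and the parameter values just mirror the numbers quoted above:

```python
import numpy as np

def detect(image, sigma, thresh_nsig=1.4, min_pix=6):
    """Toy DES-style detection: threshold at thresh_nsig*sigma and keep
    only connected regions (4-connectivity) with >= min_pix pixels.
    Returns a list of detections, each a list of (y, x) pixel coords."""
    above = image > thresh_nsig * sigma
    seen = np.zeros_like(above)
    detections = []
    ny, nx = image.shape
    for y0 in range(ny):
        for x0 in range(nx):
            if above[y0, x0] and not seen[y0, x0]:
                # flood-fill the connected region starting at (y0, x0)
                stack, blob = [(y0, x0)], []
                seen[y0, x0] = True
                while stack:
                    y, x = stack.pop()
                    blob.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        yy, xx = y + dy, x + dx
                        if (0 <= yy < ny and 0 <= xx < nx
                                and above[yy, xx] and not seen[yy, xx]):
                            seen[yy, xx] = True
                            stack.append((yy, xx))
                if len(blob) >= min_pix:
                    detections.append(blob)
    return detections

# A faint 3x3 source (9 pixels at 2 sigma) is kept;
# a single 10-sigma hot pixel fails the contiguity cut.
image = np.zeros((10, 10))
image[2:5, 2:5] = 2.0
image[8, 8] = 10.0
dets = detect(image, sigma=1.0)
```

The point of the low threshold is exactly as described: accept spurious single-pixel noise regions being considered, then let the contiguity requirement (and downstream cuts) reject them, rather than losing real faint sources outright.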

egawiser commented 6 years ago

Interesting. In MUSYC, which had similar depth to LSST but somewhat worse seeing, we ran Source Extractor in a reasonably common mode where you first convolve the image with the PSF and then detect as objects any single pixel with a significance level of ~1.5, which corresponded to a 5 sigma single pixel pre-convolution. That's a good method for detecting point sources while reducing contamination from detector noise. If no such convolution is being performed, it makes much more sense to require several contiguous pixels at individually modest significance levels like @rmjarvis mentioned for DES. It wasn't clear to me from this discussion if we're doing the convolution or not, but if we are, the single pixel requirement should be much lower than 5 sigma in units of the post-convolution S/N.

rmjarvis commented 6 years ago

Actually, I forgot about the PSF convolution. We do that too in DES. I guess that means a 5-sigma point source would usually have a single pixel ending up at least near 5 sigma. (Maybe only at 5 sigma if the star was centered on the center of a pixel? Not sure.)

Anyway, I guess the low-threshold, multiple contiguous pixels bit is probably more for detecting galaxies then. For stars, you could probably get away with only a little lower than 5 sigma in a single pixel.

fjaviersanchez commented 6 years ago

@jchiang87 made a good point about this in the data access telecon this morning. I didn't check the airmass or the seeing in those visits (and one of them has pretty high airmass), so the ~0.4 mag difference in limiting magnitude is possibly due to this. I'll make the calculation including all observing conditions to make sure that it makes sense.

RobertLuptonTheGood commented 6 years ago

A single pixel over threshold in the likelihood image (i.e. the PSF-convolved image) is enough to detect point sources, hence the choice. I agree with you that there is a slight bias towards detecting things centred in a pixel; I do not think that a lower threshold over multiple pixels helps with this problem. If you want to find more extended sources you should use a larger smoothing kernel. That's an option, but @rearmstr did some HSC studies and concluded that it didn't help. More of a problem is correlated noise (e.g. in coadds), but I don't think that is an issue here. There is code that adjusts thresholds based on sky objects, which seems to handle this quite well in e.g. the HSC ultra-deep COSMOS data.

fjaviersanchez commented 6 years ago

@rmjarvis the expected depth for the g-band visit above is ~24.51 (159494) so what we get (24.46) is not that far from this as you first said. So the question left is about the stellar density/completeness.

rmjarvis commented 6 years ago

Is 3 magnitudes between limiting magnitude and completeness magnitude reasonable? Sounds like a lot to me. 3 mag brighter than 5 sigma should be 80 sigma (16 x 5). I would have thought we would be pretty complete for point sources at significantly lower S/N than this.
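
The arithmetic behind that estimate, spelled out:

```python
# 3 magnitudes brighter than the limit means a flux larger by
flux_ratio = 10 ** (0.4 * 3)  # ~15.8x (the "16x" quoted above)

# so an object 3 mag above a 5-sigma limit is detected at roughly
snr = 5 * flux_ratio          # ~79 sigma
```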

egawiser commented 6 years ago

I think there is indeed a disconnect here worth further discussion. I think @rmjarvis is referring to the plot early in this issue of "Detection efficiency for stars" which starts to roll down at mag~22 and drops below 50% at mag~24, whereas the reported limiting magnitude plot in g-band (first one scrolling up from here) implies ~0.3 mags shallower than a typical visit. The latter is quite reasonable if conditions were poor (even in g-band, that would take a lot of airmass, so sky brightness or sky transparency or seeing seems a more likely culprit). Does anyone have a good sense of the expected curve for detection efficiency of stars as a function of limiting depth? I know that binaries always prevent this from reaching 100%, but that might be the cause of the plateau at 95%, and it's not obvious to me why it should drop so far so early (as Mike noted). Could we check detection efficiency of galaxies, which should be worse, in case it actually appears higher near the limiting mag? It would also be worth checking the number counts of detections to make sure they also peak near the reported limit of 24.5.

fjaviersanchez commented 6 years ago

Apologies to all. I discovered that I was testing the purity of the S/G classifier convolved with the detection efficiency instead of the pure detection efficiency since I was adding the requirement that extendedness==0. Once I remove that requirement the "Detection efficiency" plot looks more reasonable:

image

fjaviersanchez commented 6 years ago

Below, I am attaching a set of slides with plots similar to those above: imSim_1.2i_validation_09_06_18.pdf

In general, 1.2i looks good. For the z and y bands, there are many exposures with the Moon above the horizon and high background levels. For those images, the objects seem to leak into the background (or be over-subtracted), biasing the photometry a bit and lowering the depth with respect to what's expected. From the discussion in #desc-dm-dc2 it seems that this is not a showstopper and may be fine-tuned in later stages.

I will compare these visits to their PhoSim counterparts to see if there's something useful that we can learn from it. I haven't checked galaxy shapes yet.

wmwv commented 6 years ago

For those images, the objects seem to leak into the background

Do you mean something beyond the degradation in the S/N ratio from the increased background level?

fjaviersanchez commented 6 years ago

@wmwv I don't really know. The problem that I am seeing is that the histogram of the magnitude difference between input (reference catalog) and output is not centered at zero for these images in the z and y bands, not even for the objects used for photometric calibration. This bias is small (<~ 20 mmags). What I think is going on is that the background is over-subtracted and the zeropoint is slightly brighter than it should be. However, I could be completely wrong since I am no expert on this. Maybe this is a different effect kicking in. Any insights are welcome.

fjaviersanchez commented 5 years ago

I made some plots using r-band visit 181900:

Whisker plot (using base_SdssShape_psf_xx,yy,xy and translated into e1, e2 (e1 parallel to the x axis, e2 parallel to y axis) for PhoSim (1.2p):

image

Same plot for imSim (1.2i):

image

Comparison between e1 and e2 for matched objects (detected in both PhoSim and imSim) using HSM e1,e2 (regauss):

image Here I plot the difference between imSim and PhoSim's measured e1 and e2 (from HSM_regauss):

image

And comparing the distribution of the module of the measured ellipticity for all (matched) objects:

image

I will repeat these plots for other bands. @rmjarvis, is there anything else you'd like to see? @jmeyers314 do these whisker plots make sense?

cwwalter commented 5 years ago

@fjaviersanchez It looks like there is an overall scale-factor difference in the whisker plots that isn't in the histograms. The scale says that the arrow lengths are the same. Is that correct?

rmandelb commented 5 years ago

@cwwalter - I believe the whisker plot is the PSF ellipticity, while the histograms are for galaxies, so they have rather different information in them.

rmjarvis commented 5 years ago

Great. These look good Javi.

I think it would also be useful to make the same histogram plots for the stars as you did for the galaxies.

I am a little surprised the PhoSim PSF whiskers are so large. The ImSim whiskers look more consistent with what I would expect from DES data, so I think our atmospheric model there is pretty reasonable. Although admittedly, the exposure times are very different, so my intuition might not be right here. (@jmeyers314 may have comments as well about this, since he's looked at more of these than I have, I suspect.)

Another one that would be nice is a size/magnitude diagram: x-axis is magnitude, y-axis is T = Ixx + Iyy. Maybe color-code the stars vs galaxies. We should see a nice flat locus for the stars that tips up at the bright end. The latter effect is first brighter-fatter, then saturation. I think we should be able to see the B/F effect even on single exposures, but it will be subtle.

fjaviersanchez commented 5 years ago

@cwwalter the scale is the same in both, yes. I didn't expect 1.2p's PSF and 1.2i's PSF to be the same, however, I expected them to be closer to each other. I still have to check more visits.

fjaviersanchez commented 5 years ago

@rmjarvis here's the plot that you suggested:

image

And below I am subtracting the average T between mag-r=20 and 22 and zooming in to see the brighter-fatter effect better:

image

rmjarvis commented 5 years ago

Awesome! Those look great!

I'm not exactly sure why there seem to be two saturated branches for the stars. That's a bit odd.

The second one is bang on what I would expect for the B/F effect. If you want to make a pass/fail test from this, you could do something like check that the mean delta T for 16 < m < 17 is at least 1.e-3 more than the mean for 18 < m < 20. But by eye, I'll say it looks right to me.
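
That suggested pass/fail criterion could look something like the sketch below; the function name, array inputs, and toy numbers are placeholders, not the DESCQA implementation:

```python
import numpy as np

def bf_check(mag, delta_T, bright=(16, 17), faint=(18, 20), min_excess=1e-3):
    """Brighter-fatter sanity check: mean delta_T in the bright bin
    should exceed the faint bin by at least min_excess."""
    b = (mag > bright[0]) & (mag < bright[1])
    f = (mag > faint[0]) & (mag < faint[1])
    return delta_T[b].mean() - delta_T[f].mean() >= min_excess

# Toy data: bright stars puffed up by 2e-3, faint stars flat -> pass
mag = np.array([16.5, 16.6, 18.5, 19.5])
dT = np.array([2e-3, 2e-3, 0.0, 0.0])
```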

The rise at faint magnitudes is probably a noise bias in the measurements kicking in. So not unexpected, nor something to worry about. (But a feature of the measurement code, rather than the sims.)

jmeyers314 commented 5 years ago

I am a little surprised the PhoSim PSF whiskers are so large. The ImSim whiskers look more consistent with what I would expect from DES data, so I think our atmospheric model there is pretty reasonable. Although admittedly, the exposure times are very different, so my intuition might not be right here.

Just seeing this... My intuition is that the increased exposure time of DES should be offset somewhat by the decreased aperture, so probably both telescopes see similar amounts of phase variance, and would end up with similar amounts of PSF ellipticity (at least as far as the PSF ellipticity is atmosphere-dominated). I guess this is something I can check fairly quickly with a quick-and-dirty simulation keeping the phase screens the same and varying the aperture. I'll report back in a bit...

jmeyers314 commented 5 years ago

Also, what's the increase in \Delta T with increasing magnitude beyond r~22? Is that what one expects for noise bias in HSM?

rmjarvis commented 5 years ago

Also, what's the increase in \Delta T with increasing magnitude beyond r~22? Is that what one expects for noise bias in HSM?

I often see this in dT vs mag plots. I'm pretty sure it is noise bias in the measurement. I've seen this for example in T_star - T_piff, where the Piff rendering is noiseless (well much less noise at least) and there is a positive bias in the size of stars with both HSM and ngmix.

Now I generally add comparable noise to the Piff rendering to match the data, which helps somewhat, but it doesn't completely eliminate the faint end bias, so there might be something else going on as well. E.g. more non-stars leaking in at the faint end perhaps? That latter hypothesis wouldn't apply here, since Javi knows the true star/galaxy separation, so there shouldn't be any non-stars in the sample.

So probably just the noise bias in this case.

RobertLuptonTheGood commented 5 years ago

There should be no noise or model bias in the psf or aperture fluxes. The only way I know to do this is to get the sky level wrong, and this seems pretty large for that (but I haven't done the numbers). Is it possible to plot the difference between the measured and expected PSF and aperture fluxes? If it's sky errors they should be independent of flux level (modulo BF)

rmjarvis commented 5 years ago

These are sizes, not fluxes.

rmjarvis commented 5 years ago

Well, the x-axis is magnitude, so kind of flux. But the y-axis is size, T = Ixx + Iyy, which is where I think there is some noise bias.

RobertLuptonTheGood commented 5 years ago

Sorry, I finally got a moment to reply to:

The problem that I am seeing is that the histogram of the magnitude difference between input and output is not centered at zero for these images in the z and y bands, not even for the objects used for photometric calibration. This bias is small (<~ 20 mmags). What I think is going on is that the background is over-subtracted and the zeropoint is slightly brighter than it should be. However, I could be completely wrong since I am no expert on this. Maybe this is a different effect kicking in. Any insights are welcome.

rmjarvis commented 5 years ago

Ah, ok. I think this was in reference to the very bright sky exposures. These are also the ones where there were tons of "cosmic rays" detected I think. So I'm going to go out on a limb and say there are strange and complicated interplays between high sky levels, noise, and masking lots of things that shouldn't be masked. I suspect these exposures will need a bit of directed attention and TLC in general; 20 mmag bias in the photometry is probably the least of their problems.

jmeyers314 commented 5 years ago

unknown

Here's what I get for LSST vs DECam apertures, and 30s vs 90s of integration time for the PSF ellipticity magnitude distribution (holding all other simulation variables fixed). Almost the same.

rmjarvis commented 5 years ago

Nice. Good to know that my DES intuition will translate relatively well to LSST PSFs then.

Do you have a guess why the PhoSim ones are so much more elliptical than this? I know you played with their PSFs some a while back in developing the atm PSF model for GalSim. Is there some configuration parameter we should dial down maybe?

jmeyers314 commented 5 years ago

One possibility is the screen velocities. Smaller velocities will mean less variation during the exposure time and hence less chance for the PSF to isotropize. Are the velocities recorded by chance? I think we could probably quickly regenerate them for imSim given the seed. I'm not sure about phoSim.

cwwalter commented 5 years ago

They could be in the FITS headers (but I'm not sure)

cwwalter commented 5 years ago

@fjaviersanchez The atmospheric models really are different with no common input parameters. So, for a given visit we don't expect those to be the same. Could you look at something like an average psf size over many visits and see if the distribution looks the same?

You might just be seeing two extremes from the distribution here.

fjaviersanchez commented 5 years ago

@RobertLuptonTheGood, thanks a lot for your response. The bias in the photometry, indeed, seems to be independent of the flux. As @rmjarvis mentioned, the sky levels are very high for these visits (and there's the problem with excessive CR masking). This bias gets worse the higher the background level is:

This is the measured (PSF) magnitude minus input magnitude, as a function of input magnitude, for a visit with ~2300 ADU median sky level in z-band (the fiducial background is ~1700); note that a bias is present, but it is <10 mmags:

image

And this is the same plot for a visit with ~3100 median sky level in z-band (it has a pretty perceptible bias):

image

I am confident that I used PSF mags vs true input magnitude. For visits with lower background levels there's no bias (even in different bands). I agree with @rmjarvis that this is probably related to the issues that he mentioned and that the solution might be paying special attention to these visits.

@RobertLuptonTheGood, for future reprocessing, would it make sense to use different stack settings exclusively for these bright visits to circumvent these issues, or would that cause other problems? (For example, at the time of coadding). Thanks!

fjaviersanchez commented 5 years ago

@cwwalter @jmeyers314 the plot below shows the PSF ellipticity distribution accumulating 695 sensors (4 visits with some sensors missing in each) in r-band and PhoSim's distribution seems to be substantially wider than imSim's:

image

Reading more visits is kind of slow but I could select one or two rafts each visit if that's useful to speed up the analysis.

cwwalter commented 5 years ago

Thanks @fjaviersanchez! We will need an expert to comment on what we expect/what is reasonable. Is there a DES or HSC plot we could compare with?

rearmstr commented 5 years ago

The imSim values seem pretty small. @fjaviersanchez, what definition of ellipticity do you use?

Here is the corresponding ellipticity plot for HSC (where e is defined as a shear), where I have plotted the ellipticity of all the stars used in PSF modeling on the single visits from the first data release.

image
fjaviersanchez commented 5 years ago

Thanks @rearmstr! I am using the following definition:

detQ = I_psf_xx * I_psf_yy - I_psf_xy**2
e_psf = (I_psf_xx - I_psf_yy + 2j * I_psf_xy) / (I_psf_xx + I_psf_yy + 2 * detQ**0.5)

Where I_psf_xx/xy/yy = base_SdssShape_psf_xx/xy/yy. Should I be using different values?
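
For anyone reproducing this, here is a runnable version of that definition (with the det(Q) term in the denominator it is the shear-type ellipticity, so |e| = (a-b)/(a+b) for an elliptical Gaussian); the function name is mine and the inputs are just scalar second moments:

```python
import numpy as np

def psf_ellipticity(ixx, iyy, ixy):
    """e1, e2 from second moments, using the shear-type definition
    quoted above: |e| = (a-b)/(a+b) for an elliptical Gaussian."""
    det_q = ixx * iyy - ixy ** 2
    denom = ixx + iyy + 2.0 * np.sqrt(det_q)
    e = (ixx - iyy + 2j * ixy) / denom
    return e.real, e.imag

# Sanity check: Ixx = a^2 = 4, Iyy = b^2 = 1, Ixy = 0
# gives e1 = (a - b)/(a + b) = 1/3 and e2 = 0
e1, e2 = psf_ellipticity(4.0, 1.0, 0.0)
```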

rmjarvis commented 5 years ago

Here is the DES p(e) histogram: image Smaller than HSC, but larger than imsim. FWIW.

rearmstr commented 5 years ago

@fjaviersanchez, those values are fine. I just wanted to make sure I was comparing the same quantity as there are a couple of different definitions.

I should also add that the HSC data are taken in better seeing (FWHM ~0.6") and with longer exposures (300 sec).