Closed jchiang87 closed 4 years ago
To lay the groundwork for the problem this issue attempts to address, I'll re-post a couple of images from the #desc-dc2-agn channel. The first is a gif that blinks between part of a Run2.1i sensor visit (v398414-r R22_S11) and the same image with the AGNs subtracted:
This shows that the added AGNs are fairly bright and ubiquitous, especially for the fainter galaxies.
Next I posted a somewhat misleading image that blinks between two difference images: the first is the Run2.1i image minus the Run2.1.1i image, where the latter was simulated without the AGNs, and the second is my version of the Run2.1.1i image minus the AGN-subtracted image (= the Run2.1i image minus an AGN-only simulation):
The reason that this is misleading is that I neglected to set the random seed in my version of the Run2.1.1i image so that the sky backgrounds are not pixel-wise identical and the residuals from the AGN subtraction are hidden in the noise.
Here is a gif that blinks between the mosaicked Run2.1i raw image and the difference between the calexp for the production Run2.1.1i image (i.e., with the correct seed) and the calexp run on the difference image of Run2.1i minus the AGN-only simulation:
Since the same seed was used in the Run2.1i and Run2.1.1i images, the sky backgrounds match and most of the pixels in the residual image are near zero (the calexps have slightly different fitted background levels).
I generated AGN-only raw images for all CCDs in r-band visit 398414, subtracted them from the Run2.1i images, and then ran processCcd.py on these AGN-subtracted sensor-visits (hereafter agn_sub). I also ran processCcd.py on the corresponding Run2.1.1i data, and found all positional matches within 10 arcsec between the two source catalogs over the entire focal plane.
Unfortunately, the numbers of sources identified as point or extended sources in the agn_sub data versus the Run2.1.1i data differ substantially:
dataset | point source | extended | total |
---|---|---|---|
agn_sub | 33289 | 295388 | 328677 |
Run2.1.1i | 38810 | 289867 | 328677 |
Looking at the psfFlux pulls vs psfFlux shows some puzzling behavior.
For the blue points, I've selected all matched sources that are identified as point sources in both datasets, and for the red, I've similarly selected all sources identified as extended in both as well.
The pull distributions show that most sources actually do lie along the zero-pull line, but the widths of Gaussian functions fitted to the central parts of the distributions, while less than 1, are probably still larger than we would hope for.
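The thread does not spell out how the pulls are computed, so the following is only a minimal sketch under one common convention: the flux difference divided by the two quoted errors added in quadrature, which assumes the errors are independent. The function name and the sample values are hypothetical.

```python
import math
import statistics


def pull(flux_a, err_a, flux_b, err_b):
    """Flux difference in units of the combined error.

    ASSUMPTION: the two errors are independent and add in quadrature.
    The thread does not state the definition actually used.
    """
    return (flux_a - flux_b) / math.hypot(err_a, err_b)


# Hypothetical matched (flux, flux_err) pairs, for illustration only.
agn_sub = [(1000.0, 50.0), (5200.0, 80.0), (310.0, 40.0)]
run211i = [(1010.0, 50.0), (5150.0, 80.0), (305.0, 40.0)]

pulls = [pull(fa, ea, fb, eb) for (fa, ea), (fb, eb) in zip(agn_sub, run211i)]

# Spread of the pulls; a fitted Gaussian width plays this role in the plots.
width = statistics.pstdev(pulls)
```

If this is close to the definition used, a width below 1 would not be surprising here: the two images share a sky noise realization, so the errors are not independent and the quadrature sum likely overestimates the scatter of the difference.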
Here are corresponding plots for the gaussianFlux values:
Plotting the correlated distributions of the shapeHSM_e1 and shapeHSM_e2 parameters for the matched extended sources does show a strong correlation, but there are still significant tails off of the main diagonal:
Is there a reason not to blink the difference between raw images of 2.1.1 and 2.1-agn? On the second gif you seem to have opted for blinking the difference in calexp, but first what is the situation of the raws when the same seed is used? Sorry if I missed this .... I presume that it is impossible to have exactly the same simulation as 2.1 when separating the sequence of random numbers in 2.1.1 and agn only, right?
I presume that it is impossible to have exactly the same simulation as 2.1 when separating the sequence of random numbers in 2.1.1 and agn only, right?
That's right. There are two issues: 1) we don't have individual seeds per object, so the random sequences diverge immediately; 2) the AGN-only sim does not have the same brighter-fatter (B/F) effects, since there are far fewer e-/pixel without the other objects and sky bg being rendered.
Hi Jim - thanks for sharing these plots of the impact of the subtraction on source counts/fluxes/shapes. I have some questions:
I was wondering what selection criteria you imposed to identify sources before doing the positional matching? In particular, what I'm wondering is if differences could arise due to the many objects that are near the detection limit, versus arising for objects that are robustly detected?
Have you compared the PSF model shapes/sizes? My concern is that oddities in the PSF model photometry could indicate that the PSF models themselves are somehow getting messed up and are confounding the comparison. I'm not sure why that would occur, but the plots of pull versus PSF flux look quite strange, so I thought this might be worth checking.
Can you please remind me of the flux zero point? I'm looking at the plots as a function of flux and trying to mentally re-interpret them in terms of magnitudes.
You know the fluxes of the injected AGN component, so I was wondering about correlating the pull against the injected AGN flux for a given galaxy, and/or against the ratio of injected AGN flux vs. true galaxy flux? That could be useful in understanding failure modes for the subtraction. I'm very curious what is causing the power-law tails in the pull distribution, for example.
Do you have a sense of the injected AGN flux level at which the AGN flux + sky would produce significant b/f, whereas the AGN flux alone would not? I was wondering about doing some kind of trick like simulating a noise-free sky (i.e., literally lay down the sky level into the image without shooting photons, so as to avoid adding sky noise), simulating the AGN, and then subtracting the originally-added sky to get a better version of an "AGN only" image compared to just simulating the AGN. Is that easily doable? Would it give a higher fidelity subtraction, or are there too many objects where all of sky+galaxy+AGN are needed?
If any of this is non-trivial, don't worry about it, but I thought understanding the above would help us draw some conclusions about the viability of subtraction (not looking promising based on these plots, I admit).
(oops, I somehow ended up deleting my original post. Here it is again:)
Hi Rachel, I can provide a couple of answers right now:
I was wondering what selection criteria you imposed to identify sources before doing the positional matching?
I just made some basic cuts based on the various flags in the source catalog:
'deblend_skipped == False',
'base_PixelFlags_flag_edge == False',
'base_PixelFlags_flag_interpolatedCenter == False',
'base_PixelFlags_flag_saturatedCenter == False',
'base_PixelFlags_flag_crCenter == False',
'base_PixelFlags_flag_bad == False',
'base_PixelFlags_flag_suspectCenter == False',
'ext_shapeHSM_HsmShapeRegauss_flag == False'
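As a rough illustration of how cuts like these can be applied, here is a minimal sketch that keeps a source only if every listed flag is False. The flag names are the ones listed above; the dict-per-source representation and the `passes_cuts` helper are assumptions made for the example, not the code actually used.

```python
# Flags that must all be False for a source to be kept (names from the list above).
REQUIRED_FALSE = [
    'deblend_skipped',
    'base_PixelFlags_flag_edge',
    'base_PixelFlags_flag_interpolatedCenter',
    'base_PixelFlags_flag_saturatedCenter',
    'base_PixelFlags_flag_crCenter',
    'base_PixelFlags_flag_bad',
    'base_PixelFlags_flag_suspectCenter',
    'ext_shapeHSM_HsmShapeRegauss_flag',
]


def passes_cuts(source):
    """Return True if none of the required-False flags is set.

    `source` is a hypothetical dict mapping column name to value; a missing
    flag raises KeyError rather than silently passing.
    """
    return not any(source[flag] for flag in REQUIRED_FALSE)


# Hypothetical sources: one clean, one with a saturated center.
clean = {flag: False for flag in REQUIRED_FALSE}
saturated = dict(clean, base_PixelFlags_flag_saturatedCenter=True)

selected = [s for s in (clean, saturated) if passes_cuts(s)]
```

Note that none of these flags involves signal-to-noise, so faint detections near the noise floor pass the selection.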
Can you please remind me of the flux zero point? I'm looking at the plots as a function of flux and trying to mentally re-interpret them in terms of magnitudes.
For these r-band observations, the zero-point is ~32.17. I can remake the plots in magnitude.
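For reading the flux axes in magnitudes, a minimal sketch of the conversion, assuming the usual relation m = ZP - 2.5 log10(flux) with flux in counts and the approximate zero point quoted above. The function names are hypothetical.

```python
import math

ZERO_POINT = 32.17  # approximate r-band zero point quoted above


def flux_to_mag(flux, zero_point=ZERO_POINT):
    """Convert a flux in counts to a magnitude (assumes m = ZP - 2.5 log10 f)."""
    return zero_point - 2.5 * math.log10(flux)


def mag_to_flux(mag, zero_point=ZERO_POINT):
    """Inverse of flux_to_mag."""
    return 10.0 ** (-0.4 * (mag - zero_point))


mag_at_1e4_counts = flux_to_mag(1.0e4)
```

Under this assumption, 10^4 counts corresponds to a magnitude of about 22.2.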
I'll try to address your other questions over the next couple of days.
Hmmm, based on these cuts, you could be digging into the noise floor. And with a 10 arcsec match, I could imagine some genuine mismatches that are driving the oddities in the point sources on the left-hand side of the PSF flux plot. Is it possible to re-check the number of detections and the plots with a criterion like "Gaussian flux S/N>10" and a tighter positional match?
Based on the zero point, it seems that the wonky branches in the PSF flux plot are for objects brighter than 10^4 counts or around 22nd magnitude (when simulated without an AGN), and the sign of the effect is that when subtracting the AGN-only image, the branches correspond to extended sources that are too bright and point sources that are too faint by similar amounts? (hence the approximate mirror images)
Do you know what fraction of the objects change classification from point vs. extended source?
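The two checks requested here (an S/N > 10 cut and the fraction of objects that change classification) could be sketched roughly as follows. The dict keys `flux`, `flux_err`, and `extended` are hypothetical stand-ins for the real catalog columns, and S/N is assumed to be flux divided by its quoted error.

```python
def snr(flux, flux_err):
    """Signal-to-noise, assumed here to be flux over its quoted error."""
    return flux / flux_err


def classification_change_fraction(matched_pairs, min_snr=10.0):
    """Fraction of matched pairs whose point/extended label differs.

    `matched_pairs` is a list of (agn_sub_source, run211i_source) dicts with
    hypothetical keys 'flux', 'flux_err', and 'extended'. Only pairs where
    both members exceed `min_snr` are counted. Returns None if no pair
    survives the cut.
    """
    kept = [
        (a, b) for a, b in matched_pairs
        if snr(a['flux'], a['flux_err']) > min_snr
        and snr(b['flux'], b['flux_err']) > min_snr
    ]
    if not kept:
        return None
    changed = sum(a['extended'] != b['extended'] for a, b in kept)
    return changed / len(kept)


# Hypothetical matched pairs, for illustration only.
pairs = [
    ({'flux': 500.0, 'flux_err': 20.0, 'extended': True},
     {'flux': 510.0, 'flux_err': 20.0, 'extended': True}),
    ({'flux': 900.0, 'flux_err': 30.0, 'extended': False},
     {'flux': 880.0, 'flux_err': 30.0, 'extended': True}),
    ({'flux': 400.0, 'flux_err': 25.0, 'extended': False},
     {'flux': 395.0, 'flux_err': 25.0, 'extended': False}),
    ({'flux': 50.0, 'flux_err': 25.0, 'extended': True},   # S/N = 2, dropped
     {'flux': 60.0, 'flux_err': 25.0, 'extended': False}),
]

fraction_changed = classification_change_fraction(pairs)
```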
Thanks a lot for this last comment! I somehow got it into my head that I was making a 10 mas match and not the 10 arcsec match I actually did (even though I typed 10 arcsec above). Making the 10 mas match as I originally intended, things look more reasonable (though still a bit disappointing):
And here are the point source vs extended numbers:
dataset | point source | extended | total |
---|---|---|---|
agn_sub | 14235 | 46050 | 60285 |
Run2.1.1i | 14333 | 45952 | 60285 |
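To illustrate why the match radius matters so much, here is a minimal brute-force sketch of a nearest-neighbor positional match. It is not the matching code used for these tables: the function names and the coordinates are hypothetical, and the separation uses a small-angle approximation that is only adequate for closely spaced sources.

```python
import math


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Small-angle separation in arcsec for coordinates given in degrees."""
    dra = (ra1 - ra2) * math.cos(math.radians(0.5 * (dec1 + dec2)))
    ddec = dec1 - dec2
    return math.hypot(dra, ddec) * 3600.0


def match(cat_a, cat_b, radius_arcsec):
    """Pair each cat_a source with its nearest cat_b source within the radius.

    Catalogs are lists of (ra_deg, dec_deg). Returns a list of (i, j) index
    pairs. Brute force, O(N*M), so only suitable for small illustrations.
    """
    pairs = []
    for i, (ra_a, dec_a) in enumerate(cat_a):
        best_j, best_sep = None, radius_arcsec
        for j, (ra_b, dec_b) in enumerate(cat_b):
            sep = separation_arcsec(ra_a, dec_a, ra_b, dec_b)
            if sep <= best_sep:
                best_j, best_sep = j, sep
        if best_j is not None:
            pairs.append((i, best_j))
    return pairs


# Hypothetical coordinates: the second cat_a source has no true counterpart,
# but lies about 3 arcsec from the first cat_b source.
cat_a = [(53.0, -28.0), (53.001, -28.0)]
cat_b = [(53.0, -28.0), (53.0, -28.002)]

loose = match(cat_a, cat_b, radius_arcsec=10.0)   # 10 arcsec
tight = match(cat_a, cat_b, radius_arcsec=0.01)   # 10 mas
```

With the 10 arcsec radius the unrelated neighbor is accepted as a match, while the 10 mas radius keeps only the coincident pair.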
I'll follow up with the S/N cut and some of your other suggestions tomorrow.
Thinking about the long tails:
Since the pull plots are based on processCcd.py output, and thus the calexps, I wonder whether anything among the brighter objects is approaching the 100,000 count level. In that case DM is going to cut off and interpolate the pixels in the CCD, and the subtraction process definitely won't work, for several reasons.
Is the pixel interpolation mask flag propagated into the catalog information?
Ah... probably base_PixelFlags_flag_interpolatedCenter == False would get rid of all of these...
Thanks for the updated plots, @jchiang87 ! That looks more promising. And with these results, some of my previous suggestions (e.g., PSF model tests) are no longer relevant. I think the ones that I'm still most curious about include:
The selection criteria question: if you put in a S/N>10 cut, do you still get a (significant) difference in the numbers of detections in agn_sub versus Run 2.1.1i?
You know the fluxes of the injected AGN component, so I was wondering about correlating the pull against the injected AGN flux for a given galaxy, and/or against the ratio of injected AGN flux vs. true galaxy flux? That could be useful in understanding failure modes for the subtraction. I'm very curious what is causing the power-law tails in the pull distribution, for example.
The selection criteria question: if you put in a S/N>10 cut, do you still get a (significant) difference in the numbers of detections in agn_sub versus Run 2.1.1i?
Adding the S/N > 10 cut does reduce the disparity in numbers of point sources vs extended somewhat. Here is the corresponding table of detections:
dataset | point source | extended | total |
---|---|---|---|
agn_sub | 13040 | 38299 | 51339 |
Run2.1.1i | 13079 | 38260 | 51339 |
For the point sources, there is now a 0.3% (=2*39/(13040+13079)) disparity versus 0.7% without any S/N cut.
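The arithmetic behind these percentages can be checked with a short sketch: the disparity is the difference in point-source counts divided by their mean. The function name is hypothetical; the counts are taken from the two tables above.

```python
def point_source_disparity(n_a, n_b):
    """Count difference relative to the mean of the two counts."""
    return 2 * abs(n_a - n_b) / (n_a + n_b)


# Point-source counts (agn_sub, Run2.1.1i) from the tables above.
with_snr_cut = point_source_disparity(13040, 13079)     # S/N > 10, 10 mas match
without_snr_cut = point_source_disparity(14235, 14333)  # no S/N cut, 10 mas match

print(f"with S/N cut:    {100 * with_snr_cut:.1f}%")
print(f"without S/N cut: {100 * without_snr_cut:.1f}%")
```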
You know the fluxes of the injected AGN component, so I was wondering about correlating the pull against the injected AGN flux for a given galaxy, and/or against the ratio of injected AGN flux vs. true galaxy flux?
This will take a little more work since I need to match the source catalog entries from processCcd.py against the centroid file and instance catalog entries that have the AGN fluxes and coordinates. This will be forthcoming.
I might just be jumping to conclusions, but are those the numbers after matching? (which I'm inferring from the fact that the total is the same in both datasets... seems an unlikely coincidence that they would match so precisely?) If so, then I think it would be useful to see the numbers after selection criteria but before matching. Rationale:
agn_sub than Run 2.1.1i, the matching after imposing S/N>10 might hide that problem.
Either of those problems would be good to know about. So I'd like to see the numbers pre-matching and post-matching to try to test for (or rule out!) a more complete set of problems.
As a possible mitigation for handling the overly bright AGNs that were added to the centers of galaxies in the Run2.1i data, it's been proposed to simulate the AGNs only, using the realized fluxes in the centroid files, so that the point-like contributions from these objects can be subtracted from the existing Run2.1i images. Discussion on this proposal began in the #desc-dc2-fluxes channel and continued in the #desc-dc2-agn channel.
I'll use this issue to post results from this investigation.