LSSTDESC / DC2-production

Configuration, production, validation specifications and tools for the DC2 Data Set.
BSD 3-Clause "New" or "Revised" License

Run1.2p v4 #323

Closed plaszczy closed 4 years ago

plaszczy commented 5 years ago

Here are 3 features I observe on the newly IN2P3-processed 1.2p DPDD catalog (v4):

1/ all psFlux values are NaNs (although there are some psFlux_flag==True)
2/ same for blendedness (all NaNs)
3/ the toast is still there. Here is the density plot after asking for good==True & clean==True & extendedness>0.9: density_qual
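
A quick way to confirm observations 1/ and 2/ is to list every column that contains nothing but NaNs. The snippet below is a minimal, dependency-free sketch: the catalog is stood in for by a plain dict of lists, and the column name `psFlux_i` is an illustrative assumption rather than a confirmed name from the v4 files.

```python
import math


def nan_fraction(values):
    """Return the fraction of entries in `values` that are NaN."""
    values = list(values)
    if not values:
        return 0.0
    n_nan = sum(1 for v in values if isinstance(v, float) and math.isnan(v))
    return n_nan / len(values)


def all_nan_columns(table):
    """Return the names of non-empty columns whose values are all NaN.

    `table` is a dict mapping column name -> list of values
    (a stand-in for the real catalog, which would be read from Parquet).
    """
    return [name for name, col in table.items() if col and nan_fraction(col) == 1.0]


# Toy rows; the column names are assumptions made for illustration.
toy = {
    "psFlux_i": [float("nan"), float("nan"), float("nan")],
    "blendedness": [float("nan"), float("nan"), float("nan")],
    "mag_i_cmodel": [23.1, float("nan"), 24.7],
}
print(all_nan_columns(toy))
```

On the real catalog the same check would more naturally be run on a DataFrame, but the logic is the same: a column is suspicious when its NaN fraction is exactly 1.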

Photometry looks weird too. Here is the map of the average "u" mag. mag_u_cmodel

and for "i" mag_i_cmodel

now cut below i<24: mag_i_cmodel_below24

Something happens on the borders. food for thought.
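
For readers who want to reproduce this kind of map, here is a minimal sketch of averaging a magnitude in square (RA, Dec) cells, with an optional bright cut like the i<24 one above. The function name, the 0.1 degree cell size and the flat binning (no cos(Dec) correction) are assumptions made for illustration, not the method used for the plots in this thread.

```python
import math
from collections import defaultdict


def mean_mag_map(ra, dec, mag, cell_deg=0.1, mag_max=None):
    """Average `mag` in square (ra, dec) cells of side `cell_deg` degrees.

    Returns {(i_ra, i_dec): mean magnitude}. NaN magnitudes are skipped,
    and if `mag_max` is given only objects with mag < mag_max are kept.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r, d, m in zip(ra, dec, mag):
        if math.isnan(m):
            continue
        if mag_max is not None and m >= mag_max:
            continue
        key = (math.floor(r / cell_deg), math.floor(d / cell_deg))
        sums[key] += m
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Three toy objects: two share a cell, one of them fainter than 24.
ra = [55.01, 55.02, 55.25]
dec = [-29.71, -29.72, -29.71]
mag_i = [23.0, 25.0, 22.0]
print(mean_mag_map(ra, dec, mag_i))
print(mean_mag_map(ra, dec, mag_i, mag_max=24.0))
```

The toy example shows the effect discussed below in the thread: dropping the faint object changes the mean of its cell, so a map of mean magnitude traces depth as much as it traces photometry.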

johannct commented 5 years ago

We go deeper in the DDF (normal) and at the border of the footprint (not understood I believe). In the last plot, the border seems ok no? You have on average brighter sources where you spend less time during the survey. The way it is precisely geometrically defined is a bit baffling to me though.

plaszczy commented 5 years ago

why would deep change the (mean) mags?

plaszczy commented 5 years ago

in the last plot there is a bluish inversion. seems to indicate a population (noise) that lives above i>24

plaszczy commented 5 years ago

ok I see your point, if you get deeper you shift the mean (not sure it explains the blue deficit for i<24...). But indeed the main point is why is the frontier deep? (as a corollary should it be dropped?)

egawiser commented 5 years ago

Nice analysis and good questions! I suggest plotting the areal number density of detected objects versus magnitude in central, DDF, border and "blue inversion" regions. That should illustrate where the mean mag is shifting due to incompleteness (not detecting objects that would normally be detected), greater depth in DDF resulting in greater completeness at faint magnitudes, and/or the addition of a bunch of dim "objects" due to detecting noise.
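
The suggested comparison could be sketched as follows: histogram the magnitudes of the objects in one region and divide by that region's area, then repeat per region (central, DDF, border, "blue inversion"). This is a minimal sketch with assumed names and bin settings; the region selection and the area of each region have to be supplied by the user.

```python
import math


def number_density_per_mag(mags, area_deg2, mag_min=18.0, mag_max=28.0, bin_width=0.5):
    """Return (bin_edges, densities) with densities in objects / deg^2 / bin.

    NaN magnitudes and magnitudes outside [mag_min, mag_max) are ignored.
    """
    n_bins = int(round((mag_max - mag_min) / bin_width))
    counts = [0] * n_bins
    for m in mags:
        if math.isnan(m) or not (mag_min <= m < mag_max):
            continue
        # Guard against float round-off pushing the index to n_bins.
        index = min(int((m - mag_min) / bin_width), n_bins - 1)
        counts[index] += 1
    edges = [mag_min + i * bin_width for i in range(n_bins + 1)]
    return edges, [c / area_deg2 for c in counts]


# Toy region of 2 deg^2 with a handful of objects.
edges, density = number_density_per_mag(
    [18.1, 18.4, 18.6, 27.9, 30.0, float("nan")], area_deg2=2.0
)
print(density[0], density[1], density[-1])
```

Overplotting the curves for the different regions should then show whether a shifted mean magnitude comes from extra faint detections or from missing ones.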

wmwv commented 5 years ago

@plaszczy Thank you! Yes, there's something wrong with the psFlux and flux columns in general in the Run 1.2p v4 catalogs. I think this is an error in the generation of the Object Table -- likely a bookkeeping error related to the switch from _flux->_instFlux (which we thought we had handled correctly, but perhaps not). @yymao and I are investigating.

wmwv commented 5 years ago

With regard to the edge issues, @plaszczy can you overlay the tract boundaries following, e.g., the skymap tutorial at: https://github.com/LSSTDESC/DC2-analysis/blob/master/tutorials/dm_butler_skymap.ipynb

I hypothesize that the edge falls in the overlap region between tracts. Sources marked isPrimary in one tract should not show up as isPrimary objects in the other tract. So in a region of overlap, there will be a preference for things near the noise threshold to be retained only in this tract. That explains the red rim of the toast. I don't have a fully coherent explanation for the blue rim of i<24 mag, although I suspect it's related.

plaszczy commented 5 years ago

here is the SNR map snr_i_cmodel

I think it makes sense: on the boundaries the region for background subtraction (probably fixed) is smaller and one gets more noise for high-mag galaxies (which explains both the density and mag maps). Any way to improve on that in the stack (such as a sliding window for these patches)? What is still puzzling is the hot extreme region followed by the cool one for i<24.

Any idea why blendedness disappeared?

wmwv commented 5 years ago

@plaszczy Thanks to @yymao, I've fixed the error about the various flux values and updated the DPDD Parquet file(s) on Cori in

/global/projecta/projectdirs/lsst/global/in2p3/Run1.2p/object_catalog_v4/

For those following along. The full Run 1.2p is in

/global/projecta/projectdirs/lsst/global/in2p3/Run1.2p/object_catalog_v4/dpdd_object_run1.2.parquet

and there are also per-tract versions for lighter-weight use.

(the previous ones are in /global/projecta/projectdirs/lsst/global/in2p3/Run1.2p/object_catalog_v4/old)

wmwv commented 5 years ago

@plaszczy I've flagged the blendedness issue https://github.com/LSSTDESC/gcr-catalogs/issues/262 It's completely straightforward, but will take a bit of bookkeeping thinking to track some different versions.

plaszczy commented 5 years ago

here is (more or less) the tracts outline: density_tract

wmwv commented 5 years ago

@jchiang87 Can you remind us of the definitions of the DC2 Run 1.2p size?

plaszczy commented 5 years ago

btw, the number of patches changed. here is before/now:


```
+-----+--------+    +-----+--------+
|tract|#patches|    |tract|#patches|
+-----+--------+    +-----+--------+
| 4429|      42|    | 4429|      37|
| 4430|      56|    | 4430|      49|
| 4431|      56|    | 4431|      49|
| 4432|      56|    | 4432|      48|
| 4433|       7|    | 4433|       6|
| 4636|      16|    | 4636|      11|
| 4637|      64|    | 4637|      49|
| 4638|      64|    | 4638|      49|
| 4639|      64|    | 4639|      49|
| 4640|      48|    | 4640|      35|
| 4848|      48|    | 4848|      35|
| 4849|      64|    | 4849|      49|
| 4850|      64|    | 4850|      49|
| 4851|      64|    | 4851|      49|
| 4852|      16|    | 4852|      13|
| 5062|      17|    | 5062|      12|
| 5063|      56|    | 5063|      45|
| 5064|      56|    | 5064|      42|
| 5065|      56|    | 5065|      42|
| 5066|      35|    | 5066|      31|
+-----+--------+    +-----+--------+
```
After discussion with @johannct, this is due to the fact that the patch size changed: there are now 7x7=49 patches/tract. It would be nice to have the updated plot.

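
As a sanity check of the 7x7 explanation, the two columns of the table can be compared directly. The sketch below only uses the numbers posted above; the variable names are made up for the example.

```python
# Patches per tract, copied from the before/now table above.
before = {
    4429: 42, 4430: 56, 4431: 56, 4432: 56, 4433: 7,
    4636: 16, 4637: 64, 4638: 64, 4639: 64, 4640: 48,
    4848: 48, 4849: 64, 4850: 64, 4851: 64, 4852: 16,
    5062: 17, 5063: 56, 5064: 56, 5065: 56, 5066: 35,
}
now = {
    4429: 37, 4430: 49, 4431: 49, 4432: 48, 4433: 6,
    4636: 11, 4637: 49, 4638: 49, 4639: 49, 4640: 35,
    4848: 35, 4849: 49, 4850: 49, 4851: 49, 4852: 13,
    5062: 12, 5063: 45, 5064: 42, 5065: 42, 5066: 31,
}

patches_per_tract_now = 7 * 7

# No tract may hold more patches than the grid allows.
assert max(now.values()) <= patches_per_tract_now

# Tracts that are completely filled in the new layout.
full_now = sorted(t for t, n in now.items() if n == patches_per_tract_now)

print("max before:", max(before.values()), "max now:", max(now.values()))
print("total before:", sum(before.values()), "total now:", sum(now.values()))
print("fully covered tracts now:", full_now)
```

The new maximum of 49 matches a 7x7 grid, and every tract has fewer patches than before, which is what one would expect if each patch simply covers a larger area.
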
johannct commented 5 years ago

You mean https://lsstc.slack.com/files/U2NKBSYT1/FEYUPUPKL/download.png

johannct commented 5 years ago

> @jchiang87 Can you remind us of the definitions of the DC2 Run 1.2p size?

1.2 is 5°x5° WFD and 1.1°x1.1° DDF approximately

wmwv commented 5 years ago

I mean the exact definition of the boundaries as used in the codes that determine how to fill images with the catalogs. I would like to overlay those on the Object Table RA, Dec regions.

jchiang87 commented 5 years ago

Here's the google doc with the DC2 Run1.1/2 specs:

https://docs.google.com/document/d/1aQOPL9smeDlhtlwDrp39Zuu2q8DKivDaHLQX3_omwOI/edit

wmwv commented 5 years ago

| Location | RA (degrees) | Dec (degrees) |
|---|---|---|
| Center | 55.064 | -29.783 |
| North-East Corner | 57.87 | -27.25 |
| North-West Corner | 52.25 | -27.25 |
| South-West Corner | 52.11 | -32.25 |
| South-East Corner | 58.02 | -32.25 |

dc2_run1x_region = [[57.87, -27.25], [52.25, -27.25], [52.11, -32.25], [58.02, -32.25]]

wmwv commented 5 years ago

I confirm that the "toast" is from the boundary of the defined DC2 Run1.x region. The attached figure shows the boundary of the DC2 Run 1.x region in red (as defined in the DC2 preparation document kindly posted by @jchiang87) overlaid on a density plot of objects in RA, Dec.

dc2_run1 2p_ra_dec
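
For anyone wanting to select objects inside this boundary, here is a minimal sketch of a point-in-polygon test using the corner list above. It assumes the edges are straight lines in the flat (RA, Dec) plane, which is an approximation on the sphere, and the function name `in_region` is hypothetical, not something that exists in the catalog tools.

```python
dc2_run1x_region = [[57.87, -27.25], [52.25, -27.25], [52.11, -32.25], [58.02, -32.25]]


def in_region(ra, dec, polygon=dc2_run1x_region):
    """Ray-casting point-in-polygon test in flat (RA, Dec) coordinates.

    Casts a ray from (ra, dec) toward increasing RA and counts how many
    polygon edges it crosses; an odd count means the point is inside.
    """
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # Only edges that straddle the point's declination can be crossed.
        if (y1 > dec) != (y2 > dec):
            x_cross = x1 + (dec - y1) * (x2 - x1) / (y2 - y1)
            if ra < x_cross:
                inside = not inside
    return inside


print(in_region(55.064, -29.783))  # region center from the table above
print(in_region(51.0, -30.0))      # west of the region
print(in_region(55.0, -26.0))      # north of the region
```

Applied per object, a test like this could populate a "within-DC2-region" flag, leaving the existing quality flags untouched.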

plaszczy commented 5 years ago

@wmwv it would then avoid confusion removing these borders from the final object catalog

plaszczy commented 5 years ago

On the UDDF: why does it look so large and circular? Removing the specs: snr_uddf

why isn't the density much higher there?

egawiser commented 5 years ago

@plaszczy good point - I was wondering about that too. I think that what's happening is that the upper corner is the location of a simulated DDF in the OpSim run. Which means that there are many more nominal visits. I'm guessing that we dithered them and then simulated images for all sensors that are within the Run 1.2 region (instead of just the Run 1.2 DDF region). The latter might seem preferable and would produce something closer to the square as having the extra depth; that will be the region with sprinkled time domain objects added to the catalogs anyhow. However, if we only did that, the region just outside the square DDF would be shallower than the rest of the main (WFD) Run 1.2 region. What we ideally would do would be to subselect a set of the visits in that region to match the WFD depth and simulate images for all sensors from those in the main region and then for the rest of the visits we'd simulate images for only sensors in the DDF region.
The good news if that hypothesis is correct is that we ran extra sensors and could remove those from the DM processing and get a closer-to-square region with extra depth. For Run 2.1i this is a fractionally smaller waste of CPU time since there's a larger WFD region, but we should certainly check that there isn't a more serious issue - does the DDF region achieve the depth we would expect?

By the way, I have a suggestion for another way to plot depth - in addition to plotting median mag, we could plot median photometric uncertainty. That should hopefully reveal that regions like the borders are not truly deeper.

plaszczy commented 5 years ago

interesting indeed (which maybe reveals something on the upper left, see also avg(magu)). But I still can't see why the uDDF is not showing (much) more in the density plot (1st on this thread). Shouldn't going deeper resolve more galaxies? psfluxerr_u

plaszczy commented 5 years ago

this one is nice too (no it is not on your screen!) psfluxerr_i

wmwv commented 5 years ago

> why does it [the DDF] look so large and circular?

Yes, @egawiser 's description is correct. The FoV is a grid pattern with the corners missing. Rotating this around gives a circular region with a variable effective depth. This will likely be true in the real DDF data as well. We should not remove sensors in the processing; we should expect this behavior in the real data. If we for some reason want a specific region at a given density, we should add such filtering in our analysis pipelines.

wmwv commented 5 years ago

> @wmwv it would then avoid confusion removing these borders from the final object catalog

We could consider adding an additional flag to define the "within-DC2-region" sample. What should we call this? Or should we repurpose the good flag for this and create a new flag for all good detections, even if outside the region? I'm a little stumped for naming. Some group brainstorming would probably help:

@plaszczy Can you create a new Issue, either under DC2-production or gcr-catalogs, to have a discussion about the naming of these? I'll follow up and add something to the catalog.

plaszczy commented 5 years ago

> Yes, @egawiser 's description is correct. The FoV is a grid pattern with the corners missing. Rotating this around gives a circular region with a variable effective depth.

Looks like rotating the square (around the center) would give a smaller region.

wmwv commented 5 years ago

The DDF region is smaller than the field of view. The DDF region is 1.2 square deg. The FoV is 10 square deg.

wmwv commented 5 years ago

The full dithering (rotational and offset) plans for the DDF are an interesting and ongoing topic of discussion.

plaszczy commented 5 years ago

OK 10deg^2 over the 30 for the patch matches the plot. 2 points remaining

wmwv commented 5 years ago

> • why isn't the density higher in the uDDF?

I suspect it's because we run out of galaxies. Run 1.x only includes galaxies up to z~1.

> • what happens on the upper left part of psFluxErr_u (and accidents on psFluxErr_i)?

The upper-left corner in u band shows that we had fewer successfully processed u-band visits in that region. I would suspect that several visits failed and so we don't have smooth coverage.

> (and accidents on psFluxErr_i)?

I think these are bright stars. For faint things, psFluxErr is dominated by the sky background, but for bright things, psFluxErr is the Poisson noise of the brightness and so traces the brightness of objects. So if you take the average, you get hugely influenced by the bright stars. @egawiser had suggested plotting the median. Do you see the same features in plots of the median error?
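
The mean-versus-median argument can be illustrated with a toy sky cell. The numbers below are invented purely for illustration (arbitrary flux-error units), not taken from the catalog.

```python
from statistics import mean, median

# Toy flux errors in one sky cell: many faint, sky-dominated sources ...
faint = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1]
# ... plus two bright stars whose errors follow their own Poisson noise.
bright_stars = [40.0, 55.0]
cell = faint + bright_stars

print("mean  :", mean(cell))    # about 10.3, pulled up by the two stars
print("median:", median(cell))  # about 1.05, close to the sky-dominated level
```

Two bright objects out of ten are enough to raise the mean by an order of magnitude while leaving the median almost unchanged, which is why a map of the median error is the more robust depth indicator.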

plaszczy commented 5 years ago

> I suspect it's because we run out of galaxies. Run 1.x only includes galaxies up to z~1.

Does anyone know exactly which protoDC2 version was used?

> (and accidents on psFluxErr_i)?
>
> I think these are bright stars. For faint things, psFluxErr is dominated by the sky background, but for bright things, psFluxErr is the Poisson noise of the brightness and so traces the brightness of objects. So if you take the average, you get hugely influenced by the bright stars. @egawiser had suggested plotting the median. Do you see the same features in plots of the median error?

here is the plot for extendedness==0: psfluxerr_i_extend0

and for extendedness==1: psfluxerr_i_extend1

so that looks correct.

wmwv commented 5 years ago

> I suspect it's because we run out of galaxies. Run 1.x only includes galaxies up to z~1.
>
> Does anyone know exactly which protoDC2 version was used?

protoDC2 v3.0 https://confluence.slac.stanford.edu/display/LSSTDESC/DC2+Data+Product+Overview

katrinheitmann commented 5 years ago

Here is the table that describes which extragalactic catalog was used for which image simulation run:

https://confluence.slac.stanford.edu/display/LSSTDESC/ProtoDC2+and+CosmoDC2+Information

plaszczy commented 5 years ago

so almost everything is understood. I'll keep the thread open for @johannct to check the "u" band visits.

johannct commented 5 years ago

hmmm this seems reminiscent of https://github.com/LSSTDESC/ImageProcessingPipelines/issues/65 which was supposed to be cured.

johannct commented 5 years ago

Quick look at the coaddDriver logs, and I do not see the "removing visit DataId...because scaled size scatter is too large" warning for the u-band, while I see some of it in other filters. This is in agreement with the fact that we run the u-band without PSF size matching. So LSSTDESC/ImageProcessingPipelines#65 does not seem to apply here. Any suggestion?

fjaviersanchez commented 5 years ago

@jchiang87 is there an easy way to run your number of visits check as a function of position on these u-band data? Maybe there are still some missing visits for some unknown reason on that corner?

johannct commented 5 years ago

The shallow mag_u region corresponds to tracts 5066, 4851, and to a lesser extent 5065. The corner is the first one, and its u-band coaddDriver log can be found (SLAC account needed) at http://srs.slac.stanford.edu/Pipeline-II/exp/LSST-DESC/log.jsp?pi=52464108

In terms of coadded exposures, I do not see anything obviously wrong. The final count per patch for 5066 (grepping for "assembleCoadd: Coadding") is:

2,3 : 1 exposure
3,0 : 10; 3,1 : 8; 3,2 : 8; 3,3 : 10; 3,4 : 6; 3,5 : 3
4,0 : 18; 4,1 : 16; 4,2 : 20; 4,3 : 20; 4,4 : 16; 4,5 : 6
5,0 : 16; 5,1 : 20; 5,2 : 18; 5,3 : 20; 5,4 : 17; 5,5 : 10
6,0 : 18; 6,1 : 20; 6,2 : 23; 6,3 : 17; 6,4 : 15; 6,5 : 5

This looks reasonable given the layout of the patches of this tract compared to the DC2 WFD boundary, but it might be low compared to the diagonally opposite corner (tract 4429), where several patches have more than 30 exposures.

johannct commented 5 years ago

If I believe my visit to patch/tract DB, tract 5066 includes 342 calexps, while 4429 includes 770 calexps and is only slightly larger than 5066 in the WFD footprint (15 against 20 patches encompassed). Even more significantly, tract 4851 only includes 1230 calexps, while tract 4432 includes 1392 calexps and is significantly smaller (30 versus more than 42). Furthermore, 4432 is diagonally opposite to the DDF field, while 4851 is closer... Is it possible that the opsim for 1.2 does not visit the upper left corner in u as much as the rest?
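
To make the comparison explicit, the calexp counts can be normalised by the number of patches each tract has inside the WFD footprint. The assignment of patch counts to tracts below follows one reading of the comment above (5066: 15, 4429: 20, 4432: 30, 4851: "more than 42", taken here as 42), so the 4851 value is an upper bound and the whole block is a rough sketch.

```python
# tract -> (calexps in tract, patches of the tract inside the WFD footprint)
# Patch counts are an interpretation of the comment above; 42 stands in
# for "more than 42", so the per-patch value for 4851 is an upper bound.
tracts = {
    5066: (342, 15),
    4429: (770, 20),
    4851: (1230, 42),
    4432: (1392, 30),
}

per_patch = {tract: n_calexp / n_patch for tract, (n_calexp, n_patch) in tracts.items()}

for tract, value in per_patch.items():
    print(f"tract {tract}: {value:.1f} calexps per patch")

# Upper-left tracts relative to their comparison tracts.
print(f"5066 / 4429: {per_patch[5066] / per_patch[4429]:.2f}")
print(f"4851 / 4432: {per_patch[4851] / per_patch[4432]:.2f}")
```

Under these assumptions the two upper-left tracts come out at roughly 60% of the calexps per patch of their comparison tracts, which is consistent with the suspicion that this corner received fewer u-band visits.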

fjaviersanchez commented 5 years ago

Thanks a lot @johannct. Any thoughts @egawiser ?

katrinheitmann commented 4 years ago

Since we have moved on to Run 2.2 from Run 1.2p, I am closing this issue now. If you feel there will be more work on Run 1.2p in this direction, feel free to reopen. Thanks!