LSSTDESC / DC2-production

Configuration, production, validation specifications and tools for the DC2 Data Set.
BSD 3-Clause "New" or "Revised" License

Check Run3.0i instance catalog generation and imSim runtimes #393

Closed jchiang87 closed 4 years ago

jchiang87 commented 4 years ago

Using the code in Bryce's PR of SLSprinkler and in this PR of sims_GCRCatSimInterface, I generated instance catalog files for i-band visit 709692. These files include objects for the 4 components that we plan to simulate for Run3.0i on top of the Run2.2i checkpoint data:

Here's a figure showing the locations of the objects in those instance catalog files: [figure: Run3 0i_instcat_objects_v709692]. The DDF region is indicated by the dashed yellow boundary.

To check the instance catalogs and imSim outputs, I ran image simulations for each of these four components separately, without using the Run2.2i checkpoints and including stars so that the astrometry would solve when running processCcd.py. I computed the true fluxes for all of the objects in each of the 4 simulations and compared those true fluxes to the measured fluxes produced by processCcd.py. Here's a plot of meas_flux/true_flux vs true_flux for those 4 components and for the stars: [figure: Run3 0i_instcat_flux_checks_v709692-i]

Notes:

jchiang87 commented 4 years ago

To investigate the imSim rendering times, I used a different i-band visit, v709680, and ran in the interactive queue on a Cori-Haswell node. Here's a plot of the wall time to render each object versus the number of photons per object: [figure: Run3 0i_object_rendering_times]. Even though there are many more pointSources than FitsImages, the bulk of the wall time is spent rendering the FITS stamps. Here's a table of the wall times in seconds spent on each object type for the different sensors:

sensor    pointSource (s)  FitsImage (s)
R20_S22   15.4             234.9
R21_S01   9.9              0.0
R21_S02   20.5             341.6
R21_S10   1.8              76.6
R21_S11   18.7             915.2
R21_S12   17.7             763.8
R21_S20   21.3             888.0
R21_S21   20.4             788.2
R21_S22   19.9             904.4
R22_S00   7.0              356.1
R22_S10   20.9             885.0
R22_S11   9.6              47.6
R22_S20   16.8             610.8
R22_S21   17.6             636.1
R22_S22   9.8              0.0
R30_S02   8.8              307.7
R31_S00   20.3             962.1
R31_S01   21.7             1000.2
R31_S02   19.9             861.2
R31_S10   15.4             562.8
R31_S11   31.4             890.0
R31_S12   18.2             767.5
R31_S20   1.7              74.0
R31_S21   20.4             793.4
R31_S22   23.4             1059.3
R32_S00   24.0             892.3
R32_S01   22.6             769.9
R32_S02   9.2              436.4
R32_S10   21.8             864.6
R32_S11   21.2             885.2
R32_S12   6.1              265.4
R32_S20   32.3             600.6
R32_S21   1.9              85.5
R41_S01   6.2              345.0
R41_S02   10.5             484.9
averages  16.1             581.6
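
To put a number on "the bulk of the wall time", here is a minimal sketch that computes the FitsImage share of the rendering time from the averages row of the table. The function and variable names are illustrative and not part of imSim, and the calculation only covers the two object types listed in the table.

```python
# Average wall times (seconds) per sensor, copied from the "averages" row
# of the table above. Names here are illustrative, not imSim identifiers.
avg_point_source_s = 16.1
avg_fits_image_s = 581.6


def fits_image_fraction(point_source_s, fits_image_s):
    """Return the fraction of the combined rendering time spent on FitsImage objects."""
    total = point_source_s + fits_image_s
    return fits_image_s / total if total > 0 else 0.0


fraction = fits_image_fraction(avg_point_source_s, avg_fits_image_s)
print(f"FitsImage share of rendering time: {fraction:.1%}")
```

With the averages above, the FITS stamps account for roughly 97% of the time spent rendering these two object types.
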

jchiang87 commented 4 years ago

The FITS images of the strongly lensed galaxies are 1000x1000 pixels and have a pixel scale of 0.01 arcsec. Given the CCD pixel scale of 0.2 arcsec and seeing > 0.7 arcsec, plus the fact that GalSim will interpolate the FITS images, that level of resolution for the FITS stamps is probably unnecessary. For very large numbers of pixels, the rendering times will scale roughly with the number of pixels, so we can get substantial speedups by rebinning, at least until other parts of the calculation start to dominate.
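
For illustration, one common way to rebin a stamp is to sum NxN blocks of pixels, which conserves the total flux. The sketch below uses NumPy on a random stand-in array; the thread does not say which tool or method (sum vs. average) was actually used for the rebinning, so `rebin_sum` and the variable names are assumptions.

```python
import numpy as np


def rebin_sum(stamp, factor):
    """Sum factor x factor blocks of pixels, conserving the total flux."""
    ny, nx = stamp.shape
    if ny % factor or nx % factor:
        raise ValueError("stamp dimensions must be divisible by the rebinning factor")
    # Split each axis into (blocks, pixels-per-block) and sum over the
    # within-block axes.
    return stamp.reshape(ny // factor, factor, nx // factor, factor).sum(axis=(1, 3))


# Stand-in for a 1000x1000 lensed-galaxy stamp at 0.01 arcsec/pixel
# (random values, not real data).
rng = np.random.default_rng(seed=0)
stamp = rng.random((1000, 1000))
pixel_scale = 0.01  # arcsec/pixel

factor = 4
rebinned = rebin_sum(stamp, factor)
new_pixel_scale = pixel_scale * factor  # arcsec/pixel after rebinning

print(rebinned.shape, new_pixel_scale)
```

A 4x4 rebinning reduces the pixel count by a factor of 16. Any pixel-scale metadata stored with the stamp would need to be updated to match the new scale.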

The step in the rendering times for the FitsImage objects arises from a threshold in the code that triggers a checkpoint after any FitsImage object with a realized flux > 1e4 photons. This is to avoid losing progress on those images for the longer runtimes on KNL. Since we have so many bright strongly lensed galaxies, that 1e4 photon threshold is the limiting factor on how much we can gain from rebinning. Since the rendering times are <10s wall time on Haswell, we can increase that threshold to a much larger value in order to ascertain where the gains from rebinning start to plateau.

Here's a plot of the rendering times per object vs photon flux with that checkpointing threshold set to 1e6 photons: [figure: Run3 0i_object_rendering_times_1x1_thresh_1e6]

And here is a table of mean rendering times per CCD for the FitsImage components as a function of different levels of rebinning:

rebinning  avg. time/CCD (s)
1x1        420
2x2        114
4x4        49
5x5        49
8x8        41
10x10      38

4x4 rebinning yields a speedup of ~8.4 and still has a pixel scale of 0.04 arcsec, which seems like a reasonable compromise between resolution and speed.
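
The trade-off can be tabulated directly from the numbers above. This is a small sketch using the rounded table values; the dictionary and function names are illustrative. With these rounded entries the 4x4 speedup comes out at about 8.6, close to the ~8.4 quoted, and the thread does not state where the small difference comes from.

```python
# Mean FitsImage rendering time per CCD (seconds), keyed by rebinning factor,
# copied from the table above (rounded values).
avg_time_per_ccd_s = {1: 420, 2: 114, 4: 49, 5: 49, 8: 41, 10: 38}
native_pixel_scale = 0.01  # arcsec/pixel for the unbinned FITS stamps


def speedup(factor, times=avg_time_per_ccd_s):
    """Speedup of NxN rebinning relative to the unbinned (1x1) case."""
    return times[1] / times[factor]


for factor in sorted(avg_time_per_ccd_s):
    scale = native_pixel_scale * factor
    print(f"{factor}x{factor}: pixel scale {scale:.2f} arcsec, "
          f"speedup {speedup(factor):.1f}")
```

Going from 4x4 to 10x10 only reduces the mean time from 49 s to 38 s, which shows the plateau once other parts of the calculation dominate.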

For the record, here are the rendering times per object for 4x4 rebinning: [figure: Run3 0i_object_rendering_times_4x4_thresh_1e6]

jchiang87 commented 4 years ago

Here are plots of the Run3.0i instance catalog object sky positions and of the ratio of measured flux to true flux vs true flux for visit 709680, using the most recent code in the issue/6/dc2_specific_sprinkler branch of the SLSprinkler package: [figure: v709680-i_instcat] [figure: Run3 0i_instcat_flux_checks_v709680-i]. The truth and instance catalog data that went into these plots should essentially be the production versions.

egawiser commented 4 years ago

Is it expected that the measured flux will overestimate the true flux for true fluxes below 2000 nJy? That corresponds to magnitude 23.2, and unless this is a z-band or y-band visit I would expect a detected catalog to be complete (and nearly unbiased) to deeper than that for a single visit.
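
For reference, the flux-to-magnitude conversion used here follows from the AB system, where 3631 Jy corresponds to magnitude 0, so a 1 nJy source has an AB magnitude of about 31.4. A minimal sketch (the function names are illustrative):

```python
import math

# AB magnitude of a 1 nJy source: 2.5 * log10(3631e9 nJy) ~= 31.4
AB_ZEROPOINT_NJY = 31.4


def njy_to_ab_mag(flux_njy):
    """Convert a flux in nanojanskys to an AB magnitude."""
    if flux_njy <= 0:
        raise ValueError("flux must be positive")
    return -2.5 * math.log10(flux_njy) + AB_ZEROPOINT_NJY


def ab_mag_to_njy(mag):
    """Convert an AB magnitude to a flux in nanojanskys."""
    return 10 ** ((AB_ZEROPOINT_NJY - mag) / 2.5)


print(f"{njy_to_ab_mag(2000.0):.2f}")  # 2000 nJy, the flux quoted above
```

For 2000 nJy this gives about 23.15, matching the magnitude of ~23.2 quoted above.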

jchiang87 commented 4 years ago

The source catalogs produced by the Stack at the visit level only include measured fluxes with SNR > 5, which for this visit is indeed ~23.2. So that diagonal lower bound at the faint end is just that cutoff. Here's a plot of the same data for the stars, but with meas_flux/true_flux vs log10(meas_flux/nJy):

[figure: v709680-i_flux_ratio_vs_meas_flux]

egawiser commented 4 years ago

Nice - that makes sense. But is this an i-band visit, as I inferred from the plot legend? And if so, why is the 5 sigma depth so shallow? It could be clouds or terrible seeing; I guess the test would be whether this is a typical depth for i-band visits. And the question might be outside the bounds of what you're interested in worrying about in this issue.

jchiang87 commented 4 years ago

For this visit, the sky level is on the high side: ~3300 photons/pixel. The seeing is ~0.9 arcsec.
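
As a rough illustration of how sky level and seeing set the depth, here is a back-of-the-envelope sketch of the sky-limited 5-sigma source counts. It rests on several assumptions not stated in the thread: a Gaussian PSF, Poisson sky noise only (no read noise or source shot noise), and the quoted sky level taken as counts per 0.2 arcsec pixel. The function names are illustrative, and converting the result to a magnitude would need the visit's photometric zero point, which is not given here.

```python
import math


def gaussian_neff_pixels(fwhm_arcsec, pixel_scale_arcsec):
    """Effective number of pixels under a Gaussian PSF: n_eff = 4 * pi * sigma^2."""
    fwhm_pix = fwhm_arcsec / pixel_scale_arcsec
    sigma_pix = fwhm_pix / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return 4.0 * math.pi * sigma_pix ** 2


def sky_limited_counts(snr, sky_per_pixel, n_eff):
    """Source counts needed to reach the given SNR when sky noise dominates."""
    return snr * math.sqrt(sky_per_pixel * n_eff)


n_eff = gaussian_neff_pixels(fwhm_arcsec=0.9, pixel_scale_arcsec=0.2)
counts_5sigma = sky_limited_counts(snr=5.0, sky_per_pixel=3300.0, n_eff=n_eff)
print(f"n_eff = {n_eff:.1f} pixels, 5-sigma source counts = {counts_5sigma:.0f} photons")
```

In this approximation the required counts scale linearly with the seeing FWHM and as the square root of the sky level, so both a bright sky and poor seeing make the 5-sigma limit shallower.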

For this plot, I was mainly interested in verifying that the envelopes of flux ratios for the individual sprinkled components (lensed AGN, lensed SNe, ...) were roughly consistent with the distribution for the stars, which I'm just using as a reference.