LSSTDESC / imSim

GalSim based Rubin Observatory image simulation package
https://lsstdesc.org/imSim
BSD 3-Clause "New" or "Revised" License
36 stars 15 forks source link

instance catalog read twice #464

Closed esheldon closed 2 months ago

esheldon commented 4 months ago

I'm seeing that instance catalogs are read twice.

When galaxies are present its taking about 20minutes to read the catalog, so doubling this is significant.

Below I added a timer to show the read time. You can see that it happens twice

Reading config file gauss-nodcr-01.yaml
Overriding default configuration file with /gpfs/mnt/gpfs02/astro/desdata/esheldon/mambaforge/envs/imsim/share/eups/Linux64/dustmaps_cachedata/gbb0a0c949e+81bc2a20b4/config/.dustmapsrc
Reading visit info from instance catalog 01032304/instcat.txt
Done reading meta information from instance catalog
Reading instance catalog 01032304/instcat.txt
49884838it [22:37, 36749.65it/s]
Total objects in file = 49884816
Found 199874 objects potentially on image
Reading instance catalog 01032304/instcat.txt
49884838it [22:25, 36947.85it/s]
...snip...
rmjarvis commented 4 months ago

I think the intended use of instance catalogs was to have only the objects that will appear on the image (or not too far off of the image) in each file. So reading that twice isn't so much overhead. I don't remember off hand the design constraints that led us to have it read twice, but it was probably just that the way we have been using the instance catalogs in the past, it wasn't much of a tall pole. I'm sure it's possible to have it only read once, but it's not a trivial change, I don't think.

esheldon commented 4 months ago

I'm using the instcats that were used in real production I believe. These cover the full focal plane

esheldon commented 4 months ago

Could we put an @lru_cache on the function?

rmjarvis commented 4 months ago

We don't use instcats anymore in production. We use SkyCatalogs.

esheldon commented 4 months ago

Yes, I believe these were used in an older run.

I had to use instcats for this because what I was trying to do was not possible with sky catalogs (according to @jchiang87 )

jchiang87 commented 4 months ago

I think the initial read is simply to get the number of objects for galsim to set seeds etc. for multiprocessing. We got around this problem with skyCatalogs by adding an approx_nobjects option to the input.sky_catalog yaml section: https://github.com/LSSTDESC/imSim/blob/main/imsim/skycat.py#L67 We could add the same option for the instance catalog code to avoid reading in the entire file.

rmjarvis commented 4 months ago

Ah, that makes sense. Yeah, we could add that for instcat too. That would be pretty straightforward.

cwwalter commented 2 months ago

This was fixed with #465 and #466.