GalSim-developers / GalSim

The modular galaxy image simulation toolkit. Documentation:
http://galsim-developers.github.io/GalSim/

Support filtering FITS in GalSim Catalogs #1207

Closed sidneymau closed 1 year ago

sidneymau commented 1 year ago

It would be useful to filter catalogs through the native FITS interface so that only the matching subset of the table is read into memory. This matters most when working with large catalogs.

A proposed change to Catalog would be to modify the readFits method as follows:

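def readFits(self):
    import fitsio
    with fitsio.FITS(self.file_name) as fits:
        if self.query is not None:
            # Evaluate the row filter in the FITS layer; w holds the matching row indices.
            w = fits[1].where(self.query)
            # Read only the matching rows into memory.
            self._data = fits[1][w].copy()
        else:
            # No query: read the full table. The HDU object is not an array,
            # so use read() to get the data.
            self._data = fits[1].read()
    self.names = self._data.dtype.names
    self.nobjects = len(self._data)
    self._ncols = len(self.names)
    self.isfits = True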

Note that I'm using fitsio here because I wasn't sure whether the astropy FITS module supports this kind of row filtering.

In this example, I've added query as an optional parameter that can be specified in a config file like so:

catalog:
  file_name: galactic_seds.fits
  query: "mag_g_lsst - mag_i_lsst > 2.9 && mag_g_lsst - mag_i_lsst < 3.0"
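
For comparison, a reader that loads the whole table first would have to apply the same cut in memory. The sketch below does this with a NumPy boolean mask on a small synthetic array. The helper name select_color_range and the magnitude values are made up for illustration; only the column names and cut limits come from the config above. It selects the rows the query string describes, but without the memory saving that motivates the proposal.

```python
import numpy as np


def select_color_range(data, lo, hi, blue="mag_g_lsst", red="mag_i_lsst"):
    """Return the rows of a structured array with lo < data[blue] - data[red] < hi.

    Hypothetical helper: the full table is already in memory when this runs.
    """
    color = data[blue] - data[red]
    mask = (color > lo) & (color < hi)
    return data[mask]


# Synthetic stand-in for a catalog that has been read in full (values are invented).
catalog = np.array(
    [(20.0, 17.5), (21.5, 18.55), (22.95, 20.0), (23.0, 19.0)],
    dtype=[("mag_g_lsst", "f8"), ("mag_i_lsst", "f8")],
)

subset = select_color_range(catalog, 2.9, 3.0)
print(len(subset), "of", len(catalog), "rows pass the cut")
```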

beckermr commented 1 year ago

A related possibility would be to extend the reader to Parquet and tie it to Parquet's support for predicate pushdown via file metadata. Not something you need to do, @sidneymau, but it is a related idea that will become more relevant in the LSST era. A unified interface for pushing down predicates like these would be nice.
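
One way to picture such a unified interface is a small, backend-neutral predicate description that gets translated per file format. This is only a sketch under assumptions: the Predicate class and both translator functions are hypothetical names, not part of GalSim, and the thresholds are illustrative. The FITS side emits a row-filter string of the kind passed to where() above; the Parquet side emits the (column, op, value) tuples that pyarrow.parquet.read_table accepts through its filters argument.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union

Number = Union[int, float]

# Comparison operators understood by both backends in this sketch.
_ALLOWED_OPS = {"<", "<=", ">", ">=", "==", "!="}


@dataclass(frozen=True)
class Predicate:
    """Hypothetical backend-neutral cut on a single numeric column."""

    column: str
    op: str
    value: Number

    def __post_init__(self):
        if self.op not in _ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {self.op!r}")


def to_fits_row_filter(predicates: Sequence[Predicate]) -> str:
    """AND the predicates together as a FITS row-filter expression string."""
    return " && ".join(f"{p.column} {p.op} {p.value!r}" for p in predicates)


def to_parquet_filters(predicates: Sequence[Predicate]) -> List[Tuple[str, str, Number]]:
    """Express the same AND of predicates as a list of filter tuples."""
    return [(p.column, p.op, p.value) for p in predicates]


# Illustrative thresholds, not taken from the issue.
cuts = [Predicate("mag_g_lsst", "<", 24.0), Predicate("mag_i_lsst", ">", 18.0)]
fits_query = to_fits_row_filter(cuts)
parquet_filters = to_parquet_filters(cuts)
```

Tuple filters address one column at a time, so a cut on a derived quantity such as the g - i color in the query above would likely need an expression-based filter on the Parquet side instead.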

rmjarvis commented 1 year ago

galsim_extra is a better home for this. Closing this issue here.