LSSTDESC / skyCatalogs

Create sky catalogs and provide access via API
BSD 3-Clause "New" or "Revised" License

Keep most config information (input to skyCatalogs API) partitioned into separate files by source #76

Open JoanneBogart opened 8 months ago

JoanneBogart commented 8 months ago

Catalogs for different source types are created independently. The config information the API needs for a source type should ideally be created by the same program that creates the data, or at least at the same time, but currently all config information is in a single file. I would like to keep the data, and possibly also the config information, for each source type in a subdirectory of a top-level directory, which would hold the top-level config. Such a config would look something like this:

catalog_dir: top_dir
catalog_name: top_config
(more top-level keys)
object_types:
   star: !include star/dc2_star.yaml
   snana: !include snana/dc2_sn.yaml
   galaxy: !include galaxy/cosmodc2_galaxy.yaml
   bulge: !include galaxy/cosmodc2_bulge.yaml
   disk: !include galaxy/cosmodc2_disk.yaml
   knots: !include galaxy/cosmodc2_knots.yaml

YAML does not natively support !include, but there are extensions that do. I've tested pyyaml_include and it seems to be adequate.
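For reference, the !include idea can be sketched without any extension by registering a custom constructor on a PyYAML loader. This is not how pyyaml_include is implemented or called. IncludeLoader, construct_include and load_config are hypothetical names, and the files are a cut-down version of the example config above, written to a temporary directory so the snippet is self-contained.

```python
import os
import tempfile

import yaml  # PyYAML


class IncludeLoader(yaml.SafeLoader):
    """SafeLoader that remembers the directory of the file being loaded."""

    def __init__(self, stream):
        # Relative !include paths are resolved against this directory.
        self.base_dir = os.path.dirname(getattr(stream, "name", "")) or os.getcwd()
        super().__init__(stream)


def construct_include(loader, node):
    """Parse the file named by an !include tag and return its contents."""
    path = os.path.join(loader.base_dir, loader.construct_scalar(node))
    with open(path) as f:
        return yaml.load(f, IncludeLoader)


IncludeLoader.add_constructor("!include", construct_include)


def load_config(path):
    """Load a top-level config, expanding any !include tags it contains."""
    with open(path) as f:
        return yaml.load(f, IncludeLoader)


# Build a tiny directory tree shaped like the proposal (illustrative contents only).
with tempfile.TemporaryDirectory() as top_dir:
    os.mkdir(os.path.join(top_dir, "star"))
    with open(os.path.join(top_dir, "star", "dc2_star.yaml"), "w") as f:
        f.write("subtype: cosmodc2_star\ndata_file_type: parquet\n")
    with open(os.path.join(top_dir, "top_config.yaml"), "w") as f:
        f.write(
            "catalog_name: top_config\n"
            "object_types:\n"
            "  star: !include star/dc2_star.yaml\n"
        )
    config = load_config(os.path.join(top_dir, "top_config.yaml"))

print(config["object_types"]["star"])
```

Because each include is resolved relative to the file that contains it, a per-source subdirectory can be moved or symlinked as a unit without editing the included file.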

dc2_star.yaml, one of the included files, could have these contents:

subtype: cosmodc2_star
star_truth: /global/cfs/cdirs/lsst/groups/SSim/DC2/dc2_stellar_healpixel.db
MW_extinction: F19
area_partition:
   nside: 32
   ordering: ring
   type: healpix
data_file_type: parquet
file_template: star/pointsource_(?P<healpix>\d+).parquet
flux_file_template: star/pointsource_flux_(?P<healpix>\d+).parquet
internal_extinction: None
sed_file_root_env_var: SIMS_SED_LIBRARY_DIR
sed_model: file_nm
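The file_template value is written as a regular expression with a named group, healpix. As a minimal sketch of how such a template could be used (an assumption about usage, not a description of the skyCatalogs API), the hypothetical helpers below extract the pixel number from a relative file path and build a path for a given pixel.

```python
import re

# Template copied from the example dc2_star.yaml contents.
# Note: the "." before "parquet" is unescaped there, so it matches any character.
FILE_TEMPLATE = r"star/pointsource_(?P<healpix>\d+).parquet"


def healpix_from_path(path, template=FILE_TEMPLATE):
    """Return the HEALPix pixel encoded in path, or None if path does not match."""
    match = re.fullmatch(template, path)
    return int(match.group("healpix")) if match else None


def path_for_healpix(healpix, template=FILE_TEMPLATE):
    """Build a file path by replacing the named group with a pixel number."""
    return re.sub(r"\(\?P<healpix>.*?\)", str(healpix), template)


print(healpix_from_path("star/pointsource_9556.parquet"))
print(healpix_from_path("star/pointsource_flux_9556.parquet"))
print(path_for_healpix(9556))
```

The second call returns None because the flux files follow flux_file_template rather than file_template, so the two kinds of file can be told apart by which template matches.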

JoanneBogart commented 8 months ago

The chief advantage of this scheme would be independence of source types, which in general are not created at the same time or by the same means. Subdirectories containing the data could be symlinks if convenient.

The subtype keyword would allow the API user to, e.g., refer to source type galaxy without being concerned whether the galaxies in question are cosmodc2 galaxies or diffsky galaxies.

JoanneBogart commented 4 days ago

Upon further thought and a start at implementation, I propose some changes to the scheme outlined above:

catalog_dir:  top_dir              # this is often just .
catalog_name: skyCatalog
(more top-level keys)
object_types:
   star:  !include star.yaml
   snana: !include snana.yaml
   diffsky_galaxy: !include diffsky_galaxy.yaml