LSSTDESC / DC2-production

Configuration, production, validation specifications and tools for the DC2 Data Set.

Convert DC2 object catalogs into parquet format #342

Closed: yymao closed this issue 4 years ago

yymao commented 5 years ago

Newer DC2 DPDD catalogs (source, forced source, dia source) are now all in parquet format with a consistent schema across files, and their readers are all subclasses of DC2DMCatalog.

The original DC2 DPDD object catalogs, on the other hand, are still in HDF5 format, so for now we need to maintain two readers that do nearly the same thing. The reader for the DC2 DPDD object catalogs also has to fill in missing columns.
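To illustrate what "filling in missing columns" could look like, here is a minimal sketch using pandas. It is not the actual reader code: the function name `fill_missing_columns` and the column names in the usage example are hypothetical, and it assumes a missing column should simply be filled with NaN.

```python
import numpy as np
import pandas as pd


def fill_missing_columns(df, expected_columns, fill_value=np.nan):
    """Return a new DataFrame with exactly `expected_columns`, in that order.

    Columns absent from `df` are created and filled with `fill_value`.
    Columns in `df` that are not listed in `expected_columns` are dropped.
    (Hypothetical helper, for illustration only.)
    """
    return df.reindex(columns=list(expected_columns), fill_value=fill_value)


# Usage with made-up column names: "mag_r" is missing from this file.
df = pd.DataFrame({"objectId": [1, 2], "mag_g": [21.5, 22.0]})
filled = fill_missing_columns(df, ["objectId", "mag_g", "mag_r"])
```

If every file already had the same columns, this step would not be needed at read time.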

We should convert the DC2 DPDD object catalogs into the same parquet format as the newer DC2 DPDD catalogs.
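A rough sketch of what a per-file conversion might look like is below. It makes several assumptions that are not stated in this issue: that each HDF5 file was written by pandas with one table per key, that PyTables and pyarrow are installed, and that the target column list is known in advance. The names `parquet_path_for` and `convert_file` are hypothetical.

```python
from pathlib import Path

import pandas as pd


def parquet_path_for(hdf5_path, out_dir):
    """Map 'some/dir/name.hdf5' to 'out_dir/name.parquet'."""
    return Path(out_dir) / (Path(hdf5_path).stem + ".parquet")


def convert_file(hdf5_path, out_dir, columns):
    """Convert one HDF5 file to a single parquet file (hypothetical sketch).

    Assumes a pandas-written HDF5 file with one table per key.
    Reading needs PyTables; writing parquet needs pyarrow.
    """
    # Read every table stored in the file.
    with pd.HDFStore(str(hdf5_path), mode="r") as store:
        frames = [store[key] for key in store.keys()]

    # Stack the tables, then force the same columns and order for every file.
    # Columns missing from this file are added and filled with NaN.
    table = pd.concat(frames, ignore_index=True).reindex(columns=list(columns))

    out_path = parquet_path_for(hdf5_path, out_dir)
    table.to_parquet(out_path, index=False)
    return out_path
```

Forcing the column list at conversion time means the missing columns are filled once, when the file is written, rather than by the reader on every load.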

yymao commented 5 years ago

Note: This change does not affect catalog users. The user-facing API will remain the same. Catalog loading may become faster, which would improve the user experience.

yymao commented 4 years ago

@wmwv already has a script for this, but see also https://github.com/LSSTDESC/gcr-catalogs/issues/351