We are now getting initial nested-column support in the HATS/LSDB ecosystem. We already have a couple of catalogs (ZTF alerts, SDSS DR7 spectra) with list-columns that represent nested data, which we could pack into a single nested column after reading the data.
Today we can nest these list-columns with code like this:
from lsdb import read_hats

raw_catalog = read_hats('https://data.lsdb.io/hats/alerce/')
catalog_with_lc = raw_catalog.nest_lists(
    base_columns=[col for col in raw_catalog.columns if not col.startswith('lc_')],
    name='lc',
)
catalog_with_nondet = catalog_with_lc.nest_lists(
    base_columns=[col for col in catalog_with_lc.columns if not col.startswith('nondet_')],
    name='nondet',
)
catalog = catalog_with_nondet.nest_lists(
    base_columns=[col for col in catalog_with_nondet.columns if not col.startswith('ref_')],
    name='ref',
)
This works, but it is not a great user experience. How would a user know which columns can be packed? Here the name prefixes signal it, but that is ugly and does not scale. And how does a user save a catalog back in its original format when calling to_hats?
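For illustration, the three prefix-based calls above can be folded into a loop. This is only a sketch: `nest_by_prefix` is a hypothetical helper, not part of lsdb, and it assumes `nest_lists` accepts `base_columns` and `name` exactly as in the snippet above. It still relies on the naming convention, so it does not remove the underlying problem.

```python
def nest_by_prefix(catalog, prefixes):
    """Pack each group of `<prefix>_*` list-columns into one nested column.

    Hypothetical helper: `catalog` is any object that exposes `.columns`
    and a `.nest_lists(base_columns=..., name=...)` method.
    """
    for prefix in prefixes:
        marker = f"{prefix}_"
        # Everything that does not carry the prefix stays a base column.
        base_columns = [col for col in catalog.columns if not col.startswith(marker)]
        catalog = catalog.nest_lists(base_columns=base_columns, name=prefix)
    return catalog
```

With the catalog from the snippet above, this would be called as `catalog = nest_by_prefix(raw_catalog, ['lc', 'nondet', 'ref'])`. The caller still has to know the prefixes in advance.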
We can solve these issues with better nested-column support across the ecosystem:
[ ] hats: Parse catalog metadata that specifies which list-columns correspond to which nested columns, e.g. mag and mjd form lc, while flux and wave form sed.
[ ] hats-import: Generate and save nested column metadata
[ ] lsdb: read_hats uses nested column metadata to pack list-columns into NestedDtype columns. It still allows selecting individual "nested" columns, e.g. if "mag" and "magerr" are selected and "mjd" is not, the first two form an "lc" nested column.
[ ] lsdb: to_hats splits nested columns into list-columns and creates the appropriate metadata
[ ] nested-pandas: https://github.com/lincc-frameworks/nested-pandas/issues/163
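To make the proposed metadata concrete, here is a minimal sketch of how a reader could use it. The metadata layout (a plain mapping from nested column name to its list-columns) and both function names are assumptions for illustration only; the actual format is not defined yet. The example shows the partial-selection case from the list above, plus the inverse step that a writer would need.

```python
# Hypothetical metadata: nested column name -> list-columns that belong to it.
NESTED_COLUMNS = {
    "lc": ["mjd", "mag", "magerr"],
    "sed": ["wave", "flux"],
}


def group_selected_columns(selected, nested_columns):
    """Split selected column names into base columns and nested groups.

    Only the selected list-columns are grouped, so a partial selection
    still yields a (smaller) nested column.
    """
    owner = {col: name for name, cols in nested_columns.items() for col in cols}
    base, groups = [], {}
    for col in selected:
        if col in owner:
            groups.setdefault(owner[col], []).append(col)
        else:
            base.append(col)
    return base, groups


def flatten_groups(base, groups):
    """Inverse step for writing: list every column again in flat form."""
    return base + [col for cols in groups.values() for col in cols]


base, groups = group_selected_columns(["ra", "dec", "mag", "magerr"], NESTED_COLUMNS)
print(base)    # ['ra', 'dec']
print(groups)  # {'lc': ['mag', 'magerr']}
```

Keeping the mapping keyed by nested column name means that read_hats and to_hats could share the same metadata: one direction groups list-columns, the other flattens them again.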