astronomy-commons / hipscat

Hierarchical Progressive Survey Catalog
https://hipscat.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Inconsistent metadata schema in association catalogs #314

Open camposandro opened 2 months ago

camposandro commented 2 months ago

Bug report

The generate_data.ipynb notebook generates an association catalog with the following schema:

pa.schema(
    [
        pa.field("Norder", pa.int64()),
        pa.field("Npix", pa.int64()),
        pa.field("join_Norder", pa.int64()),
        pa.field("join_Npix", pa.int64()),
    ]
)

The column data types all appear to be inferred as int64, whereas the *_Norder columns should be of type uint8 and the *_Npix columns of type uint64. Specifying the schema explicitly when creating the record batches from arrays should solve this issue: https://github.com/astronomy-commons/hipscat/blob/047600e667191af000337b446f7c24fd37a6b0eb/src/hipscat/catalog/association_catalog/partition_join_info.py#L93-L101

Before submitting

Please check the following: