astronomy-commons / hipscat-import

HiPSCat import - generate HiPSCat-partitioned catalogs
https://hipscat-import.readthedocs.io
BSD 3-Clause "New" or "Revised" License

SOAP leaf file column names and catalog_info #214

smcguire-cmu closed this issue 6 months ago.

smcguire-cmu commented 7 months ago

I was creating some unit test data with SOAP, and I'm not sure where the names of the id columns in the written leaf files come from.

My SOAP arguments were:

from hipscat_import.soap.arguments import SoapArguments

args = SoapArguments(
    object_catalog_dir="data/small_sky",
    object_id_column="id",
    source_catalog_dir="data/small_sky_order1_source",
    source_object_id_column="obj_id",
    source_id_column="id",
    write_leaf_files=True,
    output_path="data/small_sky_to_o1source",
    output_artifact_name="small_sky_to_o1source",
    overwrite=True,
)

The resulting leaf files use id as the column name for the object id, but index as the column name for the source's obj_id. I'm not sure where index comes from.
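For context, pandas names a column literally "index" when reset_index() is called on a DataFrame whose index has no name, so an unnamed index somewhere in the pipeline is one plausible (unconfirmed) source. To make the mismatch concrete, here is a minimal stdlib-only sketch that compares the names one would expect from the arguments with the names reported above. It assumes the leaf files are meant to reuse the argument column names, which the issue does not confirm; the role labels and both helper functions are hypothetical, not part of hipscat-import.

```python
# Hypothetical sketch: compare expected leaf-file id column names (taken from
# the SOAP arguments) with the names actually written. Assumes the intended
# convention is "reuse the argument's column name"; this is not confirmed.


def expected_leaf_columns(object_id_column: str, source_object_id_column: str) -> dict[str, str]:
    """Map each role in the leaf file to the column name we would expect."""
    return {
        "object id": object_id_column,
        "source object id": source_object_id_column,
    }


def unexpected_names(expected: dict[str, str], observed: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {role: (expected, observed)} for every role whose name differs."""
    return {
        role: (name, observed.get(role, "<missing>"))
        for role, name in expected.items()
        if observed.get(role) != name
    }


expected = expected_leaf_columns(object_id_column="id", source_object_id_column="obj_id")
# Column names reported in the issue for the generated leaf files.
observed = {"object id": "id", "source object id": "index"}
print(unexpected_names(expected, observed))
# prints {'source object id': ('obj_id', 'index')}
```

Only the source object id role is flagged; the object id column already matches its argument.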

Also, the generated catalog_info does not set the primary_column_association and join_column_association parameters, which specify the column names used in the association leaf files. These parameters were added to hipscat fairly recently, and I suspect SOAP was not updated to set them.
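A sketch of what setting these two fields could look like, assuming they should record the leaf-file names of the object id and source object id columns (here simply reused from the SOAP arguments). It uses plain dicts rather than hipscat's own catalog info classes, and association_fields / association_problems are hypothetical helpers. The check flags both the unset fields and the index/obj_id mismatch described in the issue.

```python
# Hypothetical sketch: the two association fields for the SOAP run above, plus
# a check that the names they declare actually appear in the leaf files.
# Assumes the fields should reuse the SOAP argument column names; these are
# plain dicts, not hipscat's API.
import json

ASSOCIATION_KEYS = ("primary_column_association", "join_column_association")


def association_fields(object_id_column: str, source_object_id_column: str) -> dict[str, str]:
    """Build the two catalog_info fields from the SOAP column arguments."""
    return {
        "primary_column_association": object_id_column,
        "join_column_association": source_object_id_column,
    }


def association_problems(info: dict, leaf_columns: list[str]) -> list[str]:
    """List association fields that are unset or name a column absent from the leaf files."""
    problems = []
    for key in ASSOCIATION_KEYS:
        name = info.get(key)
        if name is None:
            problems.append(f"{key} is not set")
        elif name not in leaf_columns:
            problems.append(f"{key}={name!r} not found in leaf files")
    return problems


# Leaf-file columns as described in the issue.
leaf_columns = ["id", "index"]

# Current behavior: neither field is written to catalog_info.
print(association_problems({}, leaf_columns))
# prints ['primary_column_association is not set', 'join_column_association is not set']

# With the fields set from the arguments, the "index" column is still flagged.
info = association_fields(object_id_column="id", source_object_id_column="obj_id")
print(json.dumps(info, indent=2))
print(association_problems(info, leaf_columns))
# prints ["join_column_association='obj_id' not found in leaf files"]
```

Setting the fields alone would not be enough here: the check only passes once the leaf files and catalog_info agree on the column names.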