astronomy-commons / hipscat-import

HiPSCat import - generate HiPSCat-partitioned catalogs
https://hipscat-import.readthedocs.io
BSD 3-Clause "New" or "Revised" License

SOAP catalog info column name parameters don't match leaf files #259

Open smcguire-cmu opened 5 months ago

Bug report

When generating a SOAP association table for the LSDB small sky unit test data, the output leaf files were written with each row matching an object_id from the object table to a source_id in the source table. However, the catalog info join_column, which specifies the column to use in the source table when performing the join, was set to object_id instead of source_id.

The SoapArguments given were:

args = SoapArguments(
    object_catalog_dir="small_sky",
    object_id_column="id",
    source_catalog_dir="small_sky_order1_source",
    source_object_id_column="object_id",
    source_id_column="source_id",
    output_path=".",
    output_artifact_name="small_sky_to_o1source",
    write_leaf_files=True,
    overwrite=True,
)

which generated the catalog info:

{
    "catalog_name": "small_sky_to_o1source",
    "catalog_type": "association",
    "total_rows": 17161,
    "primary_catalog": "small_sky",
    "primary_column": "id",
    "primary_column_association": "object_id",
    "join_catalog": "small_sky_order1_source",
    "join_column": "object_id",
    "join_column_association": "source_id",
    "contains_leaf_files": true
}
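
The mismatch can be checked mechanically. The following is a minimal sketch, assuming the expected behavior described in this report (join_column should equal the source_id_column passed to SoapArguments). The helper find_join_column_mismatch is hypothetical and not part of hipscat-import; it only inspects the catalog info shown above and does not read any leaf files.

```python
import json

# Catalog info exactly as reported above.
catalog_info = json.loads("""
{
    "catalog_name": "small_sky_to_o1source",
    "catalog_type": "association",
    "total_rows": 17161,
    "primary_catalog": "small_sky",
    "primary_column": "id",
    "primary_column_association": "object_id",
    "join_catalog": "small_sky_order1_source",
    "join_column": "object_id",
    "join_column_association": "source_id",
    "contains_leaf_files": true
}
""")


def find_join_column_mismatch(info, source_id_column):
    """Hypothetical helper (not a hipscat-import API).

    Returns a description of the mismatch if the catalog info's join_column
    differs from the source id column given to SoapArguments, else None.
    Assumption: per this report, the two are expected to be equal.
    """
    if info["join_column"] != source_id_column:
        return (
            f"join_column is {info['join_column']!r}, "
            f"expected {source_id_column!r}"
        )
    return None


# source_id_column="source_id" is the value from the arguments above.
message = find_join_column_mismatch(catalog_info, "source_id")
print(message)
```

With the catalog info from this report, the helper flags join_column; with join_column set to source_id it returns None.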