astronomy-commons / hipscat-import

HiPSCat import - generate HiPSCat-partitioned catalogs
https://hipscat-import.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Importing data without objectID or object index #274

Open nevencaplar opened 4 months ago

nevencaplar commented 4 months ago

Bug report

When CSV data is converted to Parquet files via pandas and those files are then imported with hipscat-import, the code fails at this line:

https://github.com/astronomy-commons/hipscat-import/blob/62a30a0768e5035c02df19d26037c369d007618f/src/hipscat_import/catalog/map_reduce.py#L152C1-L153C1

This seems to be because the pandas index is duplicated: each file carries the same default index values that pandas assigned to the data, so joining data from different files fails. The workaround was to write the Parquet files with index=False. I still have to go back and give more information about precisely how this was done.
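A minimal sketch of what I think is going on, using only pandas and made-up data (the column names `ra`/`dec`, the two in-memory frames, and the helper `write_without_index` are illustrative assumptions, not the actual files or the hipscat-import code path). Each frame gets its own default index starting at 0, so combining them repeats the labels unless the index is dropped:

```python
import pandas as pd

# Two hypothetical chunks, standing in for two CSV files read with pandas.
# Each one receives its own default RangeIndex: 0, 1.
chunk_a = pd.DataFrame({"ra": [10.0, 20.0], "dec": [-5.0, 5.0]})
chunk_b = pd.DataFrame({"ra": [30.0, 40.0], "dec": [15.0, 25.0]})

# Keeping the per-file index when combining yields repeated labels.
combined = pd.concat([chunk_a, chunk_b])
has_duplicates = combined.index.has_duplicates

# Discarding the per-file index gives unique labels again.
combined_clean = pd.concat([chunk_a, chunk_b], ignore_index=True)


def write_without_index(frame: pd.DataFrame, path: str) -> None:
    """Write a frame to Parquet without storing the pandas index.

    Illustrative helper (not part of hipscat-import); requires a Parquet
    engine such as pyarrow to be installed.
    """
    frame.to_parquet(path, index=False)
```

Writing each file through something like `write_without_index` corresponds to the index=False workaround described above.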

However, the import should not fail when the data contains duplicate index values that were created by pandas.

Before submitting Please check the following: