astronomy-commons / hipscat-import

HiPSCat import - generate HiPSCat-partitioned catalogs
https://hipscat-import.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Handle previous histogram binaries if import arguments change #261

Closed camposandro closed 4 months ago

camposandro commented 5 months ago

Bug report

While importing a catalog, a user might stop the pipeline before it finishes but after the histogram binaries have been generated and written to disk. If they then change the import arguments and re-run the pipeline, it may fail.

For example, changing the target HEALPix order produces the following error:

File ~/.conda/envs/python_3.10/lib/python3.10/site-packages/hipscat/pixel_math/partition_stats.py:88, in generate_alignment(histogram, highest_order, lowest_order, threshold)
     60 """Generate alignment from high order pixels to those of equal or lower order
     61 
     62 We may initially find healpix pixels at order 10, but after aggregating up to the pixel
   (...)
     85         exceed threshold.
     86 """
     87 if len(histogram) != hp.order2npix(highest_order):
---> 88     raise ValueError("histogram is not the right size")
     89 if lowest_order > highest_order:
     90     raise ValueError("lowest_order should be less than highest_order")

ValueError: histogram is not the right size

Here the importer tried to resume the pipeline from the same temporary directory and found the previously generated histogram binaries, which it read to infer the histogram size. Because that size did not match the one expected for the new HEALPix order, the ValueError above was raised. For more information, please have a look at this workflow.
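To make the size check concrete: a HEALPix map at order k has 12 * 4**k pixels, which is the value `hp.order2npix` returns in the traceback above. The following is a minimal sketch of that comparison using plain Python lists in place of the stored binaries. The helper names `order2npix` and `histogram_matches_order` are hypothetical stand-ins defined here for illustration; they are not part of hipscat.

```python
def order2npix(order: int) -> int:
    """Number of HEALPix pixels at a given order (12 * 4**order)."""
    return 12 * 4**order


def histogram_matches_order(histogram, highest_order: int) -> bool:
    """Return True if the histogram has one bin per pixel at highest_order."""
    return len(histogram) == order2npix(highest_order)


# A histogram left over from a run at order 4 has 3072 bins.
stale_histogram = [0] * order2npix(4)

print(histogram_matches_order(stale_histogram, 4))  # True
print(histogram_matches_order(stale_histogram, 5))  # False: order 5 needs 12288 bins
```

A histogram written at one order therefore never passes the length check at a different order, which is why resuming with a new target order fails.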

We should either clean up these files when they are incompatible with the current import configuration or give the user a clear message on how to proceed manually.
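One possible shape for such a guard, sketched under assumptions: the importer records the arguments it ran with next to the intermediate files and compares them on resume. Everything here is hypothetical and only illustrates the idea: the function `prepare_tmp_dir`, the marker file name `import_args.json`, the `clean_stale` flag, the `histogram.bin` file name, and the argument key used in the example are not taken from hipscat-import.

```python
import json
import shutil
import tempfile
from pathlib import Path

MARKER_NAME = "import_args.json"  # hypothetical marker file name


def prepare_tmp_dir(tmp_dir: Path, import_args: dict, clean_stale: bool = False) -> None:
    """Refuse to resume (or wipe the directory) if the arguments changed."""
    tmp_dir.mkdir(parents=True, exist_ok=True)
    marker = tmp_dir / MARKER_NAME
    if marker.exists():
        previous = json.loads(marker.read_text())
        if previous != import_args:
            if not clean_stale:
                raise ValueError(
                    f"Intermediate files in {tmp_dir} were created with different "
                    "import arguments. Remove the directory or pass clean_stale=True."
                )
            # Drop everything from the incompatible run and start fresh.
            shutil.rmtree(tmp_dir)
            tmp_dir.mkdir(parents=True)
    # Record the arguments used for this run (assumes JSON-serializable values).
    marker.write_text(json.dumps(import_args, sort_keys=True))


with tempfile.TemporaryDirectory() as scratch:
    tmp = Path(scratch) / "intermediate"
    prepare_tmp_dir(tmp, {"highest_healpix_order": 10})
    (tmp / "histogram.bin").write_bytes(b"stale")  # pretend binary from the first run
    prepare_tmp_dir(tmp, {"highest_healpix_order": 8}, clean_stale=True)
    print((tmp / "histogram.bin").exists())  # False
```

Raising by default and cleaning only on request keeps the destructive step explicit, so a user who changed an argument by mistake does not silently lose completed work.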

Before submitting

Please check the following: