astronomy-commons / hipscat-import

HiPSCat import - generate HiPSCat-partitioned catalogs
https://hipscat-import.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Explicitly write to disk instead of `client.scatter` #302

Closed delucchi-cmu closed 4 months ago

delucchi-cmu commented 4 months ago

Bug report

Troy has been running into failures in the `client.scatter` call caused by slow worker initialization: when the workers are not yet visible to the main runner, `client.scatter` fails. This is a known issue with dask distributed (https://github.com/dask/distributed/issues/2941). Instead of scattering, we can explicitly write the large side inputs out to disk (or pickle them) and send along file references that are decoded on the workers.

This is more of a problem with SLURM distributed clusters than typical local cluster configurations.
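
A minimal sketch of the idea, not the actual hipscat-import implementation: the side input is pickled to a file once, and each task receives only the file path and loads the object itself. The helper names (`write_side_input`, `load_side_input`, `count_known_ids`) and the file name `side_input.pkl` are hypothetical, and the sketch assumes the directory is on a filesystem that every worker can read (as is common on SLURM clusters).

```python
import pickle
import tempfile
from pathlib import Path


def write_side_input(obj, directory):
    """Pickle `obj` into `directory` and return the file path as a string.

    Hypothetical helper: the file name is arbitrary.
    """
    path = Path(directory) / "side_input.pkl"
    with open(path, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return str(path)


def load_side_input(path):
    """Read back an object written by `write_side_input`."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def count_known_ids(partition, side_input_path):
    """Hypothetical worker task: it receives a path, not the object.

    The side input is decoded here, on whichever process runs the task,
    so nothing large has to be sent ahead of time.
    """
    known_ids = load_side_input(side_input_path)
    return sum(1 for value in partition if value in known_ids)


# Local demonstration without a cluster.
with tempfile.TemporaryDirectory() as tmp_dir:
    side_input_path = write_side_input({1, 2, 3, 5, 8}, tmp_dir)
    result = count_known_ids([1, 4, 5, 9], side_input_path)
```

With a dask `Client`, the same task could presumably be submitted as `client.submit(count_known_ids, partition, side_input_path)`, so only a short string travels with each task and no `client.scatter` call is needed before the workers have registered.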
