astronomy-commons / lsdb

Large Survey DataBase
https://lsdb.io
BSD 3-Clause "New" or "Revised" License
19 stars 5 forks source link

Explore dask expressions support using `ddf.partitions` and `map_partitions` instead of delayed #300

Open smcguire-cmu opened 6 months ago

smcguire-cmu commented 6 months ago

Instead of using dask delayed to align and map over the partitions of our catalogs, we could try to use the ddf.partitions accessor to align the partitions as necessary and map_partitions over them. There are still questions over how we deal with empty partitions and divisions, but may be an approach to look into.

nevencaplar commented 1 week ago

@smcguire-cmu Can you remind me, is this still active?

smcguire-cmu commented 1 week ago

So I looked into this a while back. I switched out our join and crossmatch implementations to use map_partitions instead of delayed and saw some improvement in lazy operation time, but not orders of magnitude improvements.

Advantages:

Disadvantages: