cubed-dev / cubed

Bounded-memory serverless distributed N-dimensional array processing
https://cubed-dev.github.io/cubed/
Apache License 2.0

Switch `concatenate2` to `concatenate3` #529

Open dcherian opened 3 months ago

dcherian commented 3 months ago

I noticed you're using `concatenate2` from dask. I've found this to be quite wasteful: it's cool in its recursive ability, but it sucks in that it allocates a new intermediate array at every level of the recursion.

Cubed should be using `concatenate3`, which allocates the final output array up front and then assigns each block to its region of that output, similar to `np.block`.
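To make the difference concrete, here is a rough sketch of the two strategies in plain NumPy. This is not dask's or Cubed's actual code: `concat_recursive` and `concat_preallocated` are hypothetical names, and the preallocating version assumes a 2-D grid of 2-D blocks with a shared dtype to keep it short.

```python
import numpy as np


def concat_recursive(arrays, axes):
    """Join a nested list of blocks one axis at a time.

    Every level of nesting calls np.concatenate, so every level
    allocates a fresh intermediate array that is then copied again.
    """
    if not axes:
        return arrays
    if len(axes) > 1:
        # Join the inner lists first (intermediate allocations happen here).
        arrays = [concat_recursive(a, axes[1:]) for a in arrays]
    return np.concatenate(arrays, axis=axes[0])


def concat_preallocated(blocks):
    """Join a 2-D grid of 2-D blocks with a single output allocation.

    Assumption: blocks in a row share a height, blocks in a column
    share a width, and all blocks have the dtype of the first one.
    """
    row_heights = [row[0].shape[0] for row in blocks]
    col_widths = [b.shape[1] for b in blocks[0]]
    out = np.empty((sum(row_heights), sum(col_widths)), dtype=blocks[0][0].dtype)

    r = 0
    for row, h in zip(blocks, row_heights):
        c = 0
        for block, w in zip(row, col_widths):
            # Copy each block exactly once, straight into its final region.
            out[r:r + h, c:c + w] = block
            c += w
        r += h
    return out


blocks = [
    [np.ones((2, 3)), np.zeros((2, 1))],
    [np.full((1, 3), 2.0), np.full((1, 1), 3.0)],
]

a = concat_recursive(blocks, axes=[0, 1])
b = concat_preallocated(blocks)
```

Both calls produce the same `(3, 4)` array, but the recursive version builds one intermediate array per row before the final join, while the preallocating version copies each block once.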

With dask at least, I've noticed that `_concatenate2` accounts for half of the time it takes to execute a reduction, haha.

PS: it feels like the long-term impact of dask is to give a small number of us a common language for these internal utilities. I see your lol_product!

TomNicholas commented 3 months ago

> it feels like the long-term impact of dask is to give a small number of us a common language for these internal utilities

💯 💯 💯