Cubed currently always implements the shuffle operation as an all-to-all rechunking using the algorithm from rechunker. This creates an intermediate persistent Zarr store, and requires every chunk to be written out before any chunk can be read back. Can we do better?
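To make the cost concrete, here is a toy 1-D sketch of the write-everything-then-read-everything pattern, using a plain dict in place of the intermediate Zarr store. This is an illustration of the rechunker-style algorithm, not Cubed's actual implementation (the helper name is made up for this example):

```python
import numpy as np

def rechunk_via_store(arr, old, new):
    """Toy two-stage shuffle (1-D, regular chunk sizes).
    Stage 1 writes every source chunk to an intermediate key-value
    'store'; stage 2 reads the overlapping pieces back to assemble
    each target chunk. Hypothetical sketch, not Cubed's API."""
    # Stage 1: write all source chunks, keyed by their start offset.
    store = {s: arr[s:s + old].copy() for s in range(0, len(arr), old)}
    # Stage 2: every target chunk may need pieces of several source
    # chunks -- this is what makes the operation all-to-all.
    out = []
    for t in range(0, len(arr), new):
        pieces = []
        for s in sorted(store):
            lo, hi = max(t, s), min(t + new, s + len(store[s]))
            if lo < hi:
                pieces.append(store[s][lo - s:hi - s])
        out.append(np.concatenate(pieces))
    return out
```

The key point is that stage 2 cannot start for any target chunk until stage 1 has finished for every source chunk that overlaps it, which in the general case means a full barrier through persistent storage.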
We could consider using a different storage backend for the intermediate data, such as an alternative Zarr store implementation, Google TensorStore (see #187), or perhaps Redis. All of these are still fundamentally the same shuffle operation, though.
Another idea is to narrow down the number of situations in which we actually need a full rechunk. There are some trivial cases (see https://github.com/tomwhite/cubed/pull/256), but there may be others. @dcherian had an idea for somehow representing rechunk operations as blockwise operations, which I would like to hear more about!
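One family of trivial cases might be detectable with a cheap check: if every target chunk falls entirely within a single source chunk (a pure chunk split), each output chunk depends on exactly one input chunk, so the rechunk could be expressed blockwise with no shuffle at all. A hypothetical 1-D predicate for this (regular chunk sizes assumed; not Cubed's actual logic):

```python
def needs_full_rechunk(n, old, new):
    """1-D, regular chunks: return False when every target chunk of
    size `new` lies inside one source chunk of size `old`, in which
    case the rechunk is a pure split and could run blockwise.
    Illustrative helper only."""
    for t in range(0, n, new):
        last = min(t + new, n) - 1
        # which source chunk holds the first/last element of this
        # target chunk? if they differ, the chunk straddles a boundary
        if t // old != last // old:
            return True
    return False
```

For example, going from chunks of 4 down to chunks of 2 is a pure split, whereas going from 2 up to 4 merges neighbouring chunks and genuinely needs data from more than one input chunk. Generalising this to N dimensions and irregular chunking is where it presumably gets interesting.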
Pedro Lopez pointed us towards the Primula paper, which he says implements an efficient serverless shuffle (for a large sorting workload). I'm not sure I understand it well enough yet, but my impression is that it is fundamentally the same save-everything-to-intermediate-blob-storage idea we're already using, plus some additional minor optimizations.
EDIT: Correct link to Primula paper