cubed-dev / cubed

Bounded-memory serverless distributed N-dimensional array processing
https://cubed-dev.github.io/cubed/
Apache License 2.0

Optimization tracking issue #339

Open tomwhite opened 10 months ago

tomwhite commented 10 months ago

This is an umbrella issue for tracking the work on optimizations in Cubed.

Creation optimizations

Making creation operations more efficient, typically by not materializing unnecessary data.
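As a rough illustration of the idea (not Cubed's actual implementation), a constant-filled array can be described by its length, chunk size, and fill value, with each block produced only when a task asks for it. The class `VirtualFullArray` and its methods are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VirtualFullArray:
    """Hypothetical 1-D constant array that stores metadata only."""

    length: int
    chunk: int
    fill_value: float

    @property
    def num_blocks(self) -> int:
        # Ceiling division: the last block may be shorter than `chunk`.
        return -(-self.length // self.chunk)

    def block(self, i: int) -> List[float]:
        """Materialize block `i` on demand; nothing is stored up front."""
        if not 0 <= i < self.num_blocks:
            raise IndexError(i)
        start = i * self.chunk
        stop = min(start + self.chunk, self.length)
        return [self.fill_value] * (stop - start)


ones = VirtualFullArray(length=10, chunk=4, fill_value=1.0)
block_lengths = [len(ones.block(i)) for i in range(ones.num_blocks)]
```

Only the three metadata fields exist until `block` is called, so creating the array costs no storage regardless of its length.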

Fusion optimizations

Currently we fuse map blocks operations with one input, but there are more types of fusion we could implement.
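A minimal sketch of what single-input fusion means, assuming blocks are plain Python lists and using hypothetical helpers `map_blocks` and `fuse` (these are not Cubed's API): two consecutive per-block functions are composed into one, so the intermediate array is never written out.

```python
from typing import Callable, List

Block = List[float]
BlockFunc = Callable[[Block], Block]


def map_blocks(func: BlockFunc, blocks: List[Block]) -> List[Block]:
    """Apply `func` to every block, producing a new list of blocks."""
    return [func(b) for b in blocks]


def fuse(first: BlockFunc, second: BlockFunc) -> BlockFunc:
    """Compose two single-input block functions into one."""

    def fused(block: Block) -> Block:
        return second(first(block))

    return fused


def add_one(block: Block) -> Block:
    return [x + 1 for x in block]


def double(block: Block) -> Block:
    return [x * 2 for x in block]


blocks = [[1.0, 2.0], [3.0, 4.0]]

# Unfused: two passes, with an intermediate array between them.
intermediate = map_blocks(add_one, blocks)
unfused = map_blocks(double, intermediate)

# Fused: one pass over the blocks, no intermediate array.
fused_result = map_blocks(fuse(add_one, double), blocks)
```

Both paths give the same values; the fused path simply halves the number of per-block tasks and skips the intermediate write.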

Reduction optimizations

Reduction operations like sum and mean could be improved by minimising the amount of data transferred.
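One way to picture this, as a sketch under assumptions rather than a description of Cubed's reduction code: each block is reduced to a small `(sum, count)` partial, and only those partials are combined in rounds. The function names and the `fan_in` parameter are hypothetical.

```python
from typing import List, Tuple

Partial = Tuple[float, int]  # (sum, count) for a group of elements


def block_partial(block: List[float]) -> Partial:
    """Reduce one block to a small partial result."""
    return (sum(block), len(block))


def combine(partials: List[Partial]) -> Partial:
    """Merge several partials into one."""
    return (sum(p[0] for p in partials), sum(p[1] for p in partials))


def tree_mean(blocks: List[List[float]], fan_in: int = 2) -> float:
    """Mean computed by combining at most `fan_in` partials per round."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    partials = [block_partial(b) for b in blocks]
    if not partials:
        raise ValueError("no blocks to reduce")
    while len(partials) > 1:
        partials = [
            combine(partials[i : i + fan_in])
            for i in range(0, len(partials), fan_in)
        ]
    total, count = partials[0]
    if count == 0:
        raise ValueError("mean of zero elements is undefined")
    return total / count


blocks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
result = tree_mean(blocks)
```

After the first step only two numbers per block move between rounds, which is the sense in which the amount of data transferred is minimised. A larger `fan_in` means fewer rounds but more partials held by each combining task.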

High-level query optimizations

Re-writing array expressions to an optimized form (before applying the optimizations above).

Benchmarking and runtime

Testing the effect of the optimizations above.

Documentation

TomNicholas commented 9 months ago

Do any of the caveats on the Scaling docs page need updating after all these improvements? e.g. it currently says

In theory multiple blockwise operations can be fused together, enhancing the performance further. However this has not yet been implemented in Cubed.

tomwhite commented 9 months ago

Yes, thanks for raising this. I've opened #381 to track documentation changes.