Optimization tracking issue

tomwhite commented 10 months ago

This is an umbrella issue for tracking the work on optimizations in Cubed.

Creation optimizations

Making creation operations more efficient, typically by not materializing unnecessary data.

[x] #336
[x] #343
[x] #359
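
As a rough illustration of what "not materializing unnecessary data" can mean, here is a minimal pure-Python sketch. It is not Cubed's implementation: the `LazyFull` class and its `numblocks` and `block` members are hypothetical names made up for this example. The idea is that a constant array only needs to remember its length, chunk size, and fill value, and can generate any block on demand instead of writing every block to storage up front.

```python
class LazyFull:
    """Hypothetical 1-D constant array that creates blocks on demand."""

    def __init__(self, length, chunk, fill_value):
        if length < 0 or chunk <= 0:
            raise ValueError("length must be >= 0 and chunk must be > 0")
        self.length = length
        self.chunk = chunk
        self.fill_value = fill_value

    @property
    def numblocks(self):
        # Ceiling division: the last block may be shorter than `chunk`.
        return -(-self.length // self.chunk)

    def block(self, i):
        """Build block i when it is requested; nothing is stored beforehand."""
        if not 0 <= i < self.numblocks:
            raise IndexError(f"block index {i} out of range")
        start = i * self.chunk
        stop = min(start + self.chunk, self.length)
        return [self.fill_value] * (stop - start)


arr = LazyFull(length=10, chunk=4, fill_value=0.0)
first_block = arr.block(0)  # four elements
last_block = arr.block(2)   # two elements, because 10 is not a multiple of 4
```

Only the three constructor arguments are kept in memory, so creating the array costs the same regardless of its length.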

Fusion optimizations

Currently we fuse map blocks operations with one input, but there are more types of fusion we could implement.

[x] #337
- This should be done early on, so changes to fuse or other DAG manipulation tasks don't need to be done twice
[ ] #288
[x] #342
- For example, allow users to switch on more aggressive optimizations
- [x] #376
[x] #136
- This change will fuse operations with multiple inputs
[x] #366
[x] #69
[ ] Sibling fusion
- This change will fuse operations that share the same inputs
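
To make the idea of fusion concrete, here is a minimal pure-Python sketch. It is not Cubed's code: `map_blocks`, `fuse`, and `fuse_multiple` below are hypothetical helpers written only for this example, and blocks are plain lists. It shows that two consecutive blockwise operations can be replaced by one operation that applies both functions to each block, so the intermediate array is never produced. It also shows the same idea for an operation with multiple inputs.

```python
def map_blocks(func, *inputs):
    """Apply func blockwise across one or more identically chunked inputs."""
    return [func(*blocks) for blocks in zip(*inputs)]


def fuse(first, second):
    """Single-input fusion: one block function equivalent to first, then second."""
    def fused(block):
        return second(first(block))
    return fused


def fuse_multiple(func, *preds):
    """Multi-input fusion: fold one predecessor per input into func."""
    def fused(*blocks):
        return func(*(pred(block) for pred, block in zip(preds, blocks)))
    return fused


def add_one(block):
    return [x + 1 for x in block]


def double(block):
    return [x * 2 for x in block]


def add(left, right):
    return [x + y for x, y in zip(left, right)]


a = [[1, 2], [3, 4]]
b = [[10, 20], [30, 40]]

# Unfused: two passes over the data, with an intermediate array in between.
unfused = map_blocks(double, map_blocks(add_one, a))
# Fused: one pass, no intermediate array.
fused = map_blocks(fuse(add_one, double), a)

# The same comparison for an operation with two inputs.
unfused_multi = map_blocks(add, map_blocks(add_one, a), map_blocks(double, b))
fused_multi = map_blocks(fuse_multiple(add, add_one, double), a, b)
```

In both cases the fused and unfused versions give the same result. The difference is how many intermediate arrays are written and read back.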

Reduction optimizations

Reduction operations like sum and mean could be improved by minimising the amount of data transferred.

[x] #350
[x] #365
[x] #331
[x] #284
[ ] #418
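
As a hedged sketch of why the shape of a reduction matters, the pure-Python example below sums a list of blocks in rounds. It is not Cubed's implementation: `tree_sum` and its `split_every` parameter are hypothetical names used only here. Each round combines groups of partial results, and every round stands for one set of intermediate values that would have to be written and read back, so combining more partials per round means fewer rounds and less data transferred.

```python
def tree_sum(blocks, split_every=2):
    """Sum blocks in rounds, combining up to `split_every` partials per task.

    Returns (total, rounds), where `rounds` counts the combine rounds
    that follow the initial per-block sums.
    """
    if split_every < 2:
        raise ValueError("split_every must be at least 2")
    if not blocks:
        raise ValueError("at least one block is required")

    # Round 0: reduce each block to a single partial result.
    partials = [sum(block) for block in blocks]

    rounds = 0
    while len(partials) > 1:
        partials = [
            sum(partials[i:i + split_every])
            for i in range(0, len(partials), split_every)
        ]
        rounds += 1
    return partials[0], rounds


# Eight blocks of two elements each, holding the values 0..15.
blocks = [[i, i + 1] for i in range(0, 16, 2)]

total_pairs, rounds_pairs = tree_sum(blocks, split_every=2)
total_wide, rounds_wide = tree_sum(blocks, split_every=4)
```

The trade-off is that a larger `split_every` makes each combine task hold more partial results in memory at once, so the fan-in has to stay within the per-task memory budget.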

High-level query optimizations

Re-writing array expressions to an optimized form (before applying the optimizations above).

[ ] #333

Benchmarking and runtime

Testing the effect of the optimizations above.

[x] #356
[ ] #357

Documentation

[x] #381

TomNicholas commented 9 months ago

Do any of the caveats on the Scaling docs page need updating after all these improvements? e.g. it currently says

In theory multiple blockwise operations can be fused together, enhancing the performance further. However this has not yet been implemented in Cubed.

tomwhite commented 9 months ago

Yes, thanks for raising this. I've opened #381 to track documentation changes.
