dsharlet / array

C++ multidimensional arrays in the spirit of the STL
Apache License 2.0
198 stars 15 forks source link

Add z order traversal helpers #97

Closed dsharlet closed 10 months ago

dsharlet commented 10 months ago

This PR adds a helper to do Z order traversals of multi-dimensional iterator ranges, including the results of split/split<>. This PR uses the new helpers to get a 20-25% speedup of matrix multiply. The inner loop of the matrix multiply using the z order traversal looks great in the profiler (memory stalls are much smaller).

This PR adds a BLAS benchmark (use BLAS=1 make ... to enable it). At first, I thought this z ordered multiply was beating OpenBLAS by ~10%. However, it seems that OpenBLAS was parallelizing by default, and that made it slower. If I force OpenBLAS to use one thread, it gets faster, and it's now beating my multiply by ~5% :(

dsharlet commented 10 months ago

Just superficial comments. I need more cycles to remember how z order traversal works in ND.

I think the easiest way to understand it is that it's a depth first traversal of an N-dimensional generalization of a quadtree/octree. A lot of things (e.g. the wikipedia article) talk about it as a bit manipulation of the indices, I think that approach is a lot harder to implement, especially if you want to efficiently prune the parts of the tree that are out of bounds.