Add z order traversal helpers

dsharlet / array

C++ multidimensional arrays in the spirit of the STL

Apache License 2.0

This PR adds helpers for traversing multi-dimensional iterator ranges in Z order, including ranges produced by split/split<>. Using the new helpers in the matrix multiply gives a 20-25% speedup, and the inner loop of the Z order version looks great in the profiler: memory stalls are much smaller.
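For readers unfamiliar with the traversal order, here is a minimal, self-contained sketch of a 2D Z order (Morton order) walk. It is not the helper added by this PR: `compact_even_bits`, `for_each_z_order`, and `z_order_visits` are hypothetical names used only for illustration, and the sketch works on plain integer indices rather than on the library's iterator ranges. It de-interleaves the bits of a running counter to get a row and column, and skips positions that fall outside a non-power-of-two grid.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the array library): gather the
// even-numbered bits of z (bits 0, 2, 4, ...) into a compact integer.
inline std::uint32_t compact_even_bits(std::uint64_t z) {
  std::uint32_t result = 0;
  for (int bit = 0; bit < 32; ++bit) {
    result |= static_cast<std::uint32_t>((z >> (2 * bit)) & 1u) << bit;
  }
  return result;
}

// Hypothetical helper: call fn(i, j) for every (i, j) in
// [0, rows) x [0, cols), visiting positions in Z order.
template <typename Fn>
void for_each_z_order(std::uint32_t rows, std::uint32_t cols, Fn&& fn) {
  // Keep side * side below 2^64 so the loop bound cannot overflow.
  assert(rows <= (1u << 31) && cols <= (1u << 31));

  // Side of the smallest power-of-two square that covers the grid.
  std::uint64_t side = 1;
  while (side < rows || side < cols) side *= 2;

  for (std::uint64_t z = 0; z < side * side; ++z) {
    std::uint32_t j = compact_even_bits(z);       // column: even bits of z
    std::uint32_t i = compact_even_bits(z >> 1);  // row: odd bits of z
    // Positions outside the real grid are padding; skip them.
    if (i < rows && j < cols) fn(i, j);
  }
}

// Convenience wrapper that records the visit order.
inline std::vector<std::pair<std::uint32_t, std::uint32_t>>
z_order_visits(std::uint32_t rows, std::uint32_t cols) {
  std::vector<std::pair<std::uint32_t, std::uint32_t>> visits;
  for_each_z_order(rows, cols, [&](std::uint32_t i, std::uint32_t j) {
    visits.emplace_back(i, j);
  });
  return visits;
}
```

In a tiled matrix multiply, `(i, j)` would plausibly index tiles of the output rather than single elements, so that consecutive tiles tend to share rows or columns of the inputs that are still in cache. Scanning the padded square is wasteful for very elongated grids; a recursive quadrant split avoids that at the cost of more code.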

This PR also adds a BLAS benchmark (use BLAS=1 make ... to enable it). At first, I thought the Z order multiply was beating OpenBLAS by ~10%. However, it seems that OpenBLAS parallelizes by default, and that made it slower in this benchmark. If I force OpenBLAS to use one thread, it gets faster, and it now beats my multiply by ~5% :(

Add z order traversal helpers #97