Closed chessai closed 5 years ago
I've created a repository to track performance of scanning arrays forward and backward. It turns out that scanning forward wins in the microbenchmark I came up with. Additionally, the generated assembly has the same sections, each the same the size, as the generated assembly of the implementation that scans backward. I am led to believe that modern Intel processors privilege forward scans of arrays by just a little bit. The two top performers:
benchmarked HighToLowGt
time 45.63 μs (45.24 μs .. 46.02 μs)
0.999 R² (0.998 R² .. 1.000 R²)
mean 45.56 μs (45.39 μs .. 45.80 μs)
std dev 727.2 ns (539.4 ns .. 975.6 ns)
benchmarked LowToHigh
time 42.96 μs (42.10 μs .. 44.20 μs) 0.998 R² (0.996 R² .. 1.000 R²)
mean 42.72 μs (42.57 μs .. 43.03 μs)
std dev 734.6 ns (459.8 ns .. 1.150 μs)
These numbers come from GHC head (8.9).
I suspected this might be the case. My intuition about things like this is that modern processors do a ton of optimisations that are aware of how programmers usually write code, although if any of these optimisations made didnt exist i think the natural idea where comparing to a literal repeatedly would do better.
@andrewthad can you review? many changes, useful functions, and beginnings of a test suite.
ping
all concerns addressed
Squashed and merged
okay, adding more tests
don't merge this. though i would appreciate review as changes are made.