byteverse / contiguous

Typeclass for array types
Other
19 stars 9 forks source link

Add some functions #21

Closed chessai closed 5 years ago

chessai commented 5 years ago

don't merge this. though i would appreciate review as changes are made.

andrewthad commented 5 years ago

I've created a repository to track performance of scanning arrays forward and backward. It turns out that scanning forward wins in the microbenchmark I came up with. Additionally, the generated assembly has the same sections, each the same the size, as the generated assembly of the implementation that scans backward. I am led to believe that modern Intel processors privilege forward scans of arrays by just a little bit. The two top performers:

benchmarked HighToLowGt                                                                                                                                                                       
time                 45.63 μs   (45.24 μs .. 46.02 μs)                                                                                                                                        
                     0.999 R²   (0.998 R² .. 1.000 R²)                                                                                                                                        
mean                 45.56 μs   (45.39 μs .. 45.80 μs)          
std dev              727.2 ns   (539.4 ns .. 975.6 ns)                         

benchmarked LowToHigh                                             
time                 42.96 μs   (42.10 μs .. 44.20 μs)                                                                                                                                                             0.998 R²   (0.996 R² .. 1.000 R²)
mean                 42.72 μs   (42.57 μs .. 43.03 μs)                                                                                                                   
std dev              734.6 ns   (459.8 ns .. 1.150 μs)                                                                                                                                                                                              

These numbers come from GHC head (8.9).

chessai commented 5 years ago

I suspected this might be the case. My intuition about things like this is that modern processors do a ton of optimisations that are aware of how programmers usually write code, although if any of these optimisations made didnt exist i think the natural idea where comparing to a literal repeatedly would do better.

chessai commented 5 years ago

@andrewthad can you review? many changes, useful functions, and beginnings of a test suite.

chessai commented 5 years ago

ping

chessai commented 5 years ago

all concerns addressed

andrewthad commented 5 years ago

Squashed and merged

chessai commented 5 years ago

okay, adding more tests