Correct way to benchmark faster

jackmott commented 6 years ago

I am benchmarking my own SIMD lib against faster and want to be sure I'm doing it correctly. I'm using Criterion and replicating the "lots of 3s" example, as shown in this gist:

https://gist.github.com/jackmott/a0b8ca811d2cf2ecb97a35f0aee0a5c6

I'm using the default compilation settings, which should target SSE2 instructions for faster, and I'm using the SSE2 settings in my library. Does this look like a fair comparison? Am I missing anything?

Also, how is ceil implemented for SSE2? I think it is slower than it needs to be, but I can't figure out where it happens in the faster source.
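One way to sanity-check a comparison like this is to keep a plain scalar reference next to both SIMD versions and confirm that all three produce the same output before timing anything. The sketch below is std-only and is not taken from the gist. The formula is an assumption based on the "lots of 3s" example as I recall it from the faster README (9 * sqrt(ceil(rsqrt(sqrt(|v|)))) - 4 - 2 over 128 copies of -123.456), and the helper names `lots_of_3s_scalar`, `average_time`, and `bench_scalar_reference` are hypothetical. The timing helper is a crude stand-in for what Criterion does far more carefully.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Scalar reference for the "lots of 3s" workload (formula is an assumption,
/// see the note above). rsqrt is computed exactly here as 1/sqrt(x), whereas
/// SIMD rsqrt instructions are approximations.
pub fn lots_of_3s_scalar(input: &[f32]) -> Vec<f32> {
    input
        .iter()
        .map(|&v| 9.0 * (1.0 / v.abs().sqrt().sqrt()).ceil().sqrt() - 4.0 - 2.0)
        .collect()
}

/// Crude average wall-clock time per call; no warm-up or outlier handling.
pub fn average_time<F: FnMut()>(iters: u32, mut f: F) -> Duration {
    let iters = iters.max(1); // avoid dividing by zero
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed() / iters
}

/// Times the scalar reference. black_box keeps the optimizer from
/// constant-folding the input or discarding the result.
pub fn bench_scalar_reference(iters: u32) -> Duration {
    let input = [-123.456f32; 128];
    average_time(iters, || {
        black_box(lots_of_3s_scalar(black_box(&input)));
    })
}
```

If the SIMD versions do not also yield 3.0 in every lane for this input, the two benchmarks are not measuring the same work.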

AdamNiederer commented 6 years ago

That looks about right. I think packed rounding isn't available on SSE2, though (the ROUNDPS instruction arrived with SSE4.1), so faster will probably be pretty slow on that benchmark. You're also using the part of the API that doesn't make any alignment/length assumptions, so that may introduce some additional overhead as well.
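For context on why ceil costs more without a rounding instruction, here is a minimal sketch of one common way to emulate it with SSE2-only intrinsics from `std::arch`: truncate toward zero through an integer conversion, then add 1 to every lane where truncation rounded down. This is not how faster implements it (the thread does not say), and the function name `ceil4_sse2` is hypothetical. The sketch assumes inputs that are not NaN and whose magnitude fits in an i32; a production version would need to handle the rest.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Ceil of four f32 lanes using only SSE2 instructions.
/// Assumes finite inputs whose magnitude fits in an i32.
#[cfg(target_arch = "x86_64")]
pub fn ceil4_sse2(input: [f32; 4]) -> [f32; 4] {
    // SSE2 is part of the x86_64 baseline, so these intrinsics are always usable here.
    unsafe {
        let x = _mm_loadu_ps(input.as_ptr());
        // Truncate toward zero: f32 -> i32 -> f32.
        let truncated = _mm_cvtepi32_ps(_mm_cvttps_epi32(x));
        // All-ones mask in lanes where truncation rounded down (truncated < x).
        let rounded_down = _mm_cmplt_ps(truncated, x);
        // Add 1.0 only in those lanes.
        let bump = _mm_and_ps(rounded_down, _mm_set1_ps(1.0));
        let result = _mm_add_ps(truncated, bump);
        let mut out = [0.0f32; 4];
        _mm_storeu_ps(out.as_mut_ptr(), result);
        out
    }
}

/// Scalar fallback so the sketch also builds on non-x86_64 targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn ceil4_sse2(input: [f32; 4]) -> [f32; 4] {
    [input[0].ceil(), input[1].ceil(), input[2].ceil(), input[3].ceil()]
}
```

That is two conversions, a compare, a mask, and an add in place of the single rounding instruction available from SSE4.1 onward.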

jackmott commented 6 years ago

Once I crank it up to AVX2 it's exactly on par with mine. It's just the ceil instruction slowing it down with SSE2, so great work! It's neat to see all that iterator magic compile away to nothing.
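When switching between SSE2 and AVX2 builds like this, it can help to have the benchmark binary report which instruction set it was actually compiled for, so both libraries are known to be on the same footing. A minimal sketch using the standard `cfg!(target_feature = ...)` check follows; the function name `compiled_simd_level` is hypothetical. The features are typically enabled through rustc flags such as `-C target-feature=+avx2` or `-C target-cpu=native` passed via `RUSTFLAGS`.

```rust
/// Reports the highest of a few x86 SIMD levels enabled at compile time.
/// This reflects the build flags, not what the CPU running the binary supports.
pub fn compiled_simd_level() -> &'static str {
    if cfg!(target_feature = "avx2") {
        "avx2"
    } else if cfg!(target_feature = "sse4.1") {
        "sse4.1"
    } else if cfg!(target_feature = "sse2") {
        "sse2"
    } else {
        "other"
    }
}
```

Printing this once at the start of a benchmark run makes it obvious when a run was built with default settings by mistake.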

jackmott commented 6 years ago

Closing because it looks like I have it sorted, thanks!
