add branchless finalizers to sse2 memchr

BurntSushi / memchr

Optimized string search routines for Rust.

The Unlicense


add branchless finalizers to sse2 memchr #112

Closed: squarewave closed this 1 year ago

squarewave commented 2 years ago

This should be extended to other contexts if others can reproduce the gains I observed locally. See the comment changes in the diff for an explanation of what's going on here. Basically, we can avoid some looping if we eat one extra branch up front on whether the haystack length is at least our loop size. The same optimization should apply to the AVX2 case and to memchr2 and friends.
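The diff's own comments aren't reproduced here, so what follows is only a minimal sketch of the idea as described above, not the PR's code or the crate's API. It assumes a 16-byte SSE2 vector, a main loop unrolled 4× (64 bytes), and a portable scalar stand-in, `match_mask`, for the `_mm_cmpeq_epi8` + `_mm_movemask_epi8` pair so it runs on any target. All names (`memchr_sketch`, `finalize`, `candidate`, `match_mask`) are hypothetical. The up-front branch sends short haystacks straight to a finalizer that does four fixed, possibly overlapping loads (offsets clamped to `len - 16`) instead of looping over the leftover vectors.

```rust
/// Width of one SSE2 vector in bytes.
const VECTOR_SIZE: usize = 16;
/// Bytes consumed per iteration of the unrolled main loop (4 vectors).
const LOOP_SIZE: usize = 4 * VECTOR_SIZE;

/// Portable stand-in (hypothetical) for
/// `_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, splat(needle)))`:
/// bit `i` is set when `chunk[i] == needle`.
fn match_mask(chunk: &[u8], needle: u8) -> u32 {
    chunk[..VECTOR_SIZE]
        .iter()
        .enumerate()
        .fold(0u32, |mask, (i, &b)| mask | (((b == needle) as u32) << i))
}

/// Index of the first match in the 16 bytes at `offset`, or `usize::MAX` if none.
/// The "no match" case is folded in arithmetically rather than with an `if`.
fn candidate(haystack: &[u8], offset: usize, needle: u8) -> usize {
    let mask = match_mask(&haystack[offset..offset + VECTOR_SIZE], needle);
    // trailing_zeros() is 32 when mask == 0, so this sum stays small.
    let index = offset + mask.trailing_zeros() as usize;
    // All ones when mask == 0, zero otherwise; OR-ing turns "no match" into usize::MAX.
    let none = ((mask == 0) as usize).wrapping_neg();
    index | none
}

/// Finalizer for VECTOR_SIZE <= haystack.len() <= LOOP_SIZE: four fixed loads,
/// no loop. Returns a slice-relative index, or usize::MAX if there is no match.
fn finalize(needle: u8, haystack: &[u8]) -> usize {
    let last = haystack.len() - VECTOR_SIZE;
    // Clamping each offset to `last` keeps every load in bounds; for short
    // inputs the windows overlap, but together they always cover the slice.
    candidate(haystack, 0, needle)
        .min(candidate(haystack, VECTOR_SIZE.min(last), needle))
        .min(candidate(haystack, (2 * VECTOR_SIZE).min(last), needle))
        .min(candidate(haystack, last, needle))
}

pub fn memchr_sketch(needle: u8, haystack: &[u8]) -> Option<usize> {
    let len = haystack.len();
    if len < VECTOR_SIZE {
        // Too short for a single vector load: plain scalar scan.
        return haystack.iter().position(|&b| b == needle);
    }
    let mut start = 0;
    // The one extra up-front branch: only enter the unrolled loop when at
    // least one full LOOP_SIZE block is available.
    if len >= LOOP_SIZE {
        while start + LOOP_SIZE <= len {
            // Four non-overlapping loads per iteration. A real SSE2 loop would
            // OR the comparisons and only compute positions after a hit.
            let found = (0..4)
                .map(|k| candidate(haystack, start + k * VECTOR_SIZE, needle))
                .min()
                .unwrap();
            if found != usize::MAX {
                return Some(found);
            }
            start += LOOP_SIZE;
        }
    }
    // Fewer than LOOP_SIZE bytes remain. If fewer than VECTOR_SIZE remain, back
    // up to a full vector; the re-scanned bytes are already known not to match.
    let tail_start = start.min(len - VECTOR_SIZE);
    let found = finalize(needle, &haystack[tail_start..]);
    if found == usize::MAX {
        None
    } else {
        Some(tail_start + found)
    }
}
```

Taking the minimum over all window candidates gives the first match because every candidate is a real match position and the earliest match lies in at least one window. On x86_64, `match_mask` would presumably become `_mm_loadu_si128`, `_mm_cmpeq_epi8` and `_mm_movemask_epi8` from `core::arch::x86_64`; whether the compiler keeps the `min`/`wrapping_neg` selection free of branches is something to confirm in the generated assembly and the benchmark suite.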

BurntSushi commented 2 years ago

This is very clever. I think I buy it.

I think the thing to do now is to fix the build errors, apply it to the rest of the routines and run the benchmark suite.

squarewave commented 2 years ago

(Sorry for the radio silence on this. I'm intending to fix this patch and implement it for the rest of the routines, just haven't had the spare time yet.)

BurntSushi commented 1 year ago

Closing due to inactivity.

If someone wants to pick this back up, I think I'd be open to it. I'd like to see these finalizers split out into their own functions, since they're pretty beefy. And ideally, I'd want to make sure our existing test coverage is good enough to push on these.