BurntSushi / memchr

Optimized string search routines for Rust.
The Unlicense

Support for lt/gt conditions #157

Open purplesyringa opened 1 month ago

purplesyringa commented 1 month ago

serde_json has a hot loop in string parsing that searches for a ", a \, or a control character (ASCII 0x00-0x1f). I've seen a speedup from the use of memchr for " and \, and I expect a similar speedup from a SIMD-based control character search routine.
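To make the request concrete, here is a minimal std-only sketch of that kind of search. It is not serde_json's or memchr's code: the function name `find_string_special` and the helper `zero_lanes` are made up for illustration, and it uses a portable SWAR (SIMD-within-a-register) technique on 8-byte words as a stand-in for real vector instructions. It returns the index of the first byte that is a `"`, a `\`, or below 0x20.

```rust
const ONES: u64 = 0x0101_0101_0101_0101; // 0x01 in every byte lane
const HIGH: u64 = 0x8080_8080_8080_8080; // 0x80 in every byte lane
const SPACES: u64 = ONES * 0x20;         // 0x20 in every byte lane
const QUOTES: u64 = ONES * 0x22;         // b'"' in every byte lane
const BACKSLASHES: u64 = ONES * 0x5c;    // b'\\' in every byte lane

/// Sets the high bit of each lane whose byte is zero. Lanes above the first
/// zero lane may be flagged spuriously, but the lowest flagged lane is exact.
fn zero_lanes(v: u64) -> u64 {
    v.wrapping_sub(ONES) & !v & HIGH
}

/// Hypothetical helper: index of the first `"`, `\`, or byte < 0x20.
fn find_string_special(haystack: &[u8]) -> Option<usize> {
    let mut offset = 0;
    let mut chunks = haystack.chunks_exact(8);
    for chunk in &mut chunks {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        // Little-endian load: the first byte in memory is the lowest lane.
        let word = u64::from_le_bytes(buf);

        // Less-than test: subtracting 0x20 borrows into the high bit exactly
        // when the byte is < 0x20; `!word` rules out bytes >= 0x80.
        let control = word.wrapping_sub(SPACES) & !word & HIGH;
        // Equality tests: XOR turns a matching byte into zero.
        let quote = zero_lanes(word ^ QUOTES);
        let backslash = zero_lanes(word ^ BACKSLASHES);

        let mask = control | quote | backslash;
        if mask != 0 {
            // The lowest set bit belongs to the first matching byte.
            return Some(offset + (mask.trailing_zeros() / 8) as usize);
        }
        offset += 8;
    }
    // Scalar fallback for the final 0..=7 bytes.
    chunks
        .remainder()
        .iter()
        .position(|&b| b < 0x20 || b == b'"' || b == b'\\')
        .map(|i| offset + i)
}
```

The less-than mask and the two equality masks have the same shape and are combined with a single OR, which is the sense in which an inequality condition slots into the same loop as the equality conditions.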

The strategies memchr utilizes for equality comparison are quite generic, and I think they can be extended to signed/unsigned less-than/greater-than comparison as-is. I admit this would complicate the crate, but that'd save people from having to reinvent the wheel over and over.
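One detail behind the signed/unsigned distinction: some vector instruction sets only provide a signed byte comparison (SSE2's `pcmpgtb` is one such case), so an unsigned less-than is commonly derived from it by flipping the sign bit of both operands first. A scalar sketch of that identity, with a made-up function name, assuming nothing about how memchr would actually implement it:

```rust
/// Hypothetical illustration: unsigned `a < b` computed with only a signed
/// comparison. XOR with 0x80 maps 0..=255 onto -128..=127 in the same order.
fn unsigned_lt_via_signed(a: u8, b: u8) -> bool {
    ((a ^ 0x80) as i8) < ((b ^ 0x80) as i8)
}
```

The same bias would be applied lane-wise in a vectorized version; greater-than follows by swapping the operands.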

Would you be willing to explore the possibility of bringing it to this crate or accepting PRs that do so?

BurntSushi commented 1 month ago

I'm not so sure to be honest. Mostly for two reasons:

  1. I don't think I'll have the bandwidth to properly review such changes.
  2. I worry about the hits to compile time and binary size.

For (2), I think memchr is already teetering on the edge of what some might consider a "small" crate. It sticks pretty strictly to case sensitive substring search, but there is a lot of space to explore if its scope were increased to include inequality comparisons.

When I've needed inequality comparisons, I myself just wrote what I needed in lieu of adding it to this crate.

With that said, I'm not 100% against it, because at least in terms of implementation, I do believe it would be very similar to what memchr already does, and memchr already has a fair bit of infrastructure in place to make supporting something like this across platforms easier. So if there exists a small & flexible API that doesn't add too much code to the crate, and can be done in a way that is opt-in, I might be open to it.