Closed Mart-Bogdan closed 1 year ago
Yes, the benchmarks are internal. Publishing real benchmarks The Right Way is a ton of work. I'm about to do it for regex engines and it has taken me literal months of work. I have no plans of doing that for this crate.
Basically when it is beneficial to use this lib, at what number of sub-strings?
Someone else can't answer this for you. You should define a benchmark that models your work load. Then you should run them and make a decision based on that.
I am interested to know how it compares to rust's String::contains, String::replace, if we are using aho-corasick with single needle (I think would be slower), two needles, N needles, etc.
In a default configuration, this crate will use memmem
from the memchr
crate if you only hand it a single pattern. That routine will blow the pants off of std
's substring search. So that means that, even for a single pattern, you can use this crate just fine and it should be faster than what you can do with std
.
This crate is used by the regex
crate for literal optimizations. It tries to do the best it can with what you give it.
If you give it a "small" number of patterns and you're running on x86_64
, then it will automatically try to use a vectorized multi-substring matcher that doesn't use the Aho-Corasick algorithm at all. It can be an order of magnitude faster than Aho-Corasick.
But I have to stress: the only way to answer your question is to define a benchmark that is meaningful to your use case.
In a default configuration, this crate will use
memmem
from thememchr
crate if you only hand it a single pattern. That routine will blow the pants off ofstd
's substring search. So that means that, even for a single pattern, you can use this crate just fine and it should be faster than what you can do withstd
.
Oh, wow. I thought STD version is highly SIMD optimized.
Am I understand correctly -- your library, in case if single pattern is povided is suing different implementation, than if we have many items, so instead of automation generation it is using memmem
?
If you give it a "small" number of patterns and you're running on x86_64, then it will automatically try to use a vectorized multi-substring matcher that doesn't use the Aho-Corasick algorithm at all.
Oh I see. That's really cool. Thank you.
I'll try doing some benchmarks myself.
Oh, wow. I thought STD version is highly SIMD optimized.
Yeah, it's not. We would probably like it to be, but substring search is in core
and at least right now, we don't have a great way of using ISA extensions in core
. I'm not sure anyone has any great ideas on how to make it work either. It's pretty unfortunate.
Am I understand correctly -- your library, in case if single pattern is povided is suing different implementation, than if we have many items, so instead of automation generation it is using
memmem
?
For a single pattern, yes. You should be able to confirm it by looking at a profile. If it isn't happening then it's probably a bug and I'd happily look into it if you can provide a reproduction.
And note that as I said above, even if you provide multiple patterns, an Aho-Corasick automaton still might not be used at all. It just depends. A lot of it is heuristics at this point.
(Note that an actual automaton will always be generated IIRC. It's just not always used for searches.)
I'll try doing some benchmarks myself.
Aye. Good. Let me know how it goes!
By the way in STD contains
has SIMD optimisation, but only on level of sse2
I thought that ones to search index of occurrence has as well, but turns out that no, it doesn't. https://github.com/rust-lang/rust/blob/84dd17b56a931a631a23dfd5ef2018fd3ef49108/library/core/src/str/pattern.rs#L1062
Ah yeah, forgive the senility. That ends up working because SSE2 isn't an ISA extension, it's part of x86_64
.
Hmmmm, that's right. On each 64bit intel systems SSE2 would be present.
Strange that it checks both conditions: x86_64 AND SSE2 feature.
Hmmmm, that's right. On each 64bit intel systems SSE2 would be present.
Strange that it checks both conditions: x86_64 AND SSE2 feature.
It might be because, technically, the OS can choose not to support SSE2 by not supporting calling conventions that use xmm
registers. But that's just a guess. I don't know for sure.
Benchmarks are hard :D
I've made some draft benchmarks, but results are confusing.
But preliminary results are:
str::contains
is faster on small haystack
on x86
str::matches
is slower, good to use aho-corasick even for single needle
str::replace
is also slower even for single needle
.
I think I'll try creating more organized results, more likely closer to end of a week.
P.S. criterion is weird, as it shows results with diferent scale, mixing ns and micro-s in same output. That caused me to belive that creating new instance of AchoCrasick inside loop is 2 times faster, but it was actually 500 times slower
Why not share your code and inputs so that others can run it?
And yes, there is often a latency versus throughput trade off. See: https://docs.rs/memchr/latest/memchr/#why-use-this-crate
An expectation that one implementation is always faster in every case is unrealistic. This is why I said you should vernal your specific use case. Then you can make a judgement call about whether the differences in, say, latency merit a more complicated implementation on your end. Or whether just using aho-corasick is good enough.
Why not share your code and inputs so that others can run it?
it's really dirty now :D
It would be nice to mention benchmark results in reamed.
I've ran all benches, but looking on them didn't give me any understanding how fast this library is.
I look at results, like:
And it blows my mind🤯.
I've looked at becnhmarks, looks like it all about measuring this library, perhaps to compare with previous build.
All this benches are internal?
I am interested to know how it compares to rust's String::contains, String::replace, if we are using aho-corasick with single needle (I think would be slower), two needles, N needles, etc.
Perhaps it's two different issues:
Basically when it is beneficial to use this lib, at what number of sub-strings?