BurntSushi / aho-corasick

A fast implementation of Aho-Corasick in Rust.
The Unlicense

question: comparison with a regex union. #98

Closed: ritchie46 closed this 1 year ago

ritchie46 commented 1 year ago

At which point would this crate be preferred over writing a regex union with the regex crate? Would that be at a certain number of inputs, or would this algorithm always be preferable?

BurntSushi commented 1 year ago

IMO, if you can use aho-corasick and you're otherwise not already using the regex crate, then you probably should use aho-corasick. Reasons:

  1. A lot less code to build and rely upon. aho-corasick is just matching literals. regex has a lot of code for handling the much more general case.
  2. aho-corasick is going to build its searcher much more quickly than regex. I hope to fix most of this in the not so distant future, but the regex crate is always going to have some kind of additional overhead. Today, it's quite a bit more than it needs to be.

> Would that be at a certain number of inputs, or would this algorithm always be preferable?

If you just have a regex like foo|bar|...|quux, then the regex crate will likely just use this crate.

But, you should always benchmark your specific use case. If you do have a case where regex is faster than aho-corasick, that would be very interesting and I should like to hear about it.

ritchie46 commented 1 year ago

> If you do have a case where regex is faster than aho-corasick, that would be very interesting and I should like to hear about it.

Promised. Will do! Thanks for the explanation.