For this, LowerRegex first creates a NFA from the Regex, then converts this to an minimized DFA.
Finally, this DFA is emitted as Thorin code again (see dfa2matcher).
This allows us to match non-deterministic regex such as (a|b)+a correctly as well, which was a pretty strong downside of the old impl.
Note, we still don't support capture groups and the like, and also, we lose support for not although that might be reimplementable..
Also, we lost the showcase of using the annexes as implementation.
For this, LowerRegex first creates a NFA from the Regex, then converts this to an minimized DFA. Finally, this DFA is emitted as Thorin code again (see dfa2matcher).
This allows us to match non-deterministic regex such as
(a|b)+a
correctly as well, which was a pretty strong downside of the old impl. Note, we still don't support capture groups and the like, and also, we lose support fornot
although that might be reimplementable.. Also, we lost the showcase of using the annexes as implementation.The regex->min dfa infra follows the algorithms from https://doi.org/10.1007/978-3-642-17540-4 and hopcroft from https://en.wikipedia.org/wiki/DFA_minimization.
A super quick&dirty benchmark seems favorable :) (note how the old impl was slower than manual and the new one is faster.. :))