Panzerschrek / RegPanzer

library and compiler for regular expressions
BSD 3-Clause "New" or "Revised" License
5 stars 1 forks source link

Try to implement NFA-matching #2

Open Panzerschrek opened 8 months ago

Panzerschrek commented 8 months ago

See https://swtch.com/~rsc/regexp/regexp1.html

This approach works great for long sequences without necessity to rescan input strings multiple times. But it it doesn't seem to work for regular expressions with backreferences.

It's also possible to build a DFA from initial regex NFA and execute it, but it's unclear, how it should work for some cases with greedy search, like [a-z]*q, where the longest possible sequence should be found. Also it may be difficult to handle sequences with specified min/max number of iterations.

Panzerschrek commented 8 months ago

https://www.pcre.org/original/doc/html/pcrematching.html#SEC4 A description of the same algorithm in the PCRE library. It is also limited - no backreferences, no submatches, no conditionals, etc.

But it is still may be useful to implement this algorithm for regular expressions without usage of these features.