VeriFIT / mata

A fast and simple automata library
MIT License

Construct NFAs for regex matching inside text #464

Open Adda0 opened 1 day ago

Adda0 commented 1 day ago

As of #459, we skip ^, $, \b, etc. in regexes, since they are irrelevant for our precise regex-matching NFAs (which accept exactly the language of the specified regex and nothing more).

However, the regex a{2}b can be interpreted in two ways: it can match aab and only aab, or it can also match aab inside fffaabfff. The first interpretation gives an automaton matching a{2}b precisely; the second gives .*a{2}b.*, which is what ordinary regex matchers do. We should add a flag (defaulting to the first approach) that lets the user choose which matching approach they want, that is, what kind of NFA they get from the regex. With such a flag, ^ and $ start to play a role: in the first approach they are irrelevant, but in the second they must be accounted for.

Originally posted by @Adda0 in https://github.com/VeriFIT/mata/issues/459#issuecomment-2482274416
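
To make the two interpretations concrete, here is a minimal sketch using Python's standard `re` module rather than mata itself. The helper names `matches_whole_input` and `matches_inside_text` are hypothetical and are not part of any mata API. It assumes `re.fullmatch` is a fair stand-in for the precise NFA and `re.search` for the "match inside text" NFA.

```python
import re

PATTERN = r"a{2}b"


def matches_whole_input(pattern: str, text: str) -> bool:
    """First interpretation: the entire input must match the regex."""
    return re.fullmatch(pattern, text) is not None


def matches_inside_text(pattern: str, text: str) -> bool:
    """Second interpretation: some substring of the input matches the regex."""
    return re.search(pattern, text) is not None


# Precise matching accepts aab and rejects anything longer.
precise_exact = matches_whole_input(PATTERN, "aab")
precise_embedded = matches_whole_input(PATTERN, "fffaabfff")

# Matching inside text also accepts aab surrounded by other characters.
inside_embedded = matches_inside_text(PATTERN, "fffaabfff")

# On newline-free input, searching for a{2}b behaves like
# precisely matching .*a{2}b.*
wrapped_embedded = matches_whole_input(r".*a{2}b.*", "fffaabfff")

# Anchors only matter in the second interpretation: ^ and $ pin the match
# to the start and end of the input, which rules out fffaabfff again.
anchored_embedded = matches_inside_text(r"^a{2}b$", "fffaabfff")
anchored_exact = matches_inside_text(r"^a{2}b$", "aab")

print(precise_exact, precise_embedded, inside_embedded)
print(wrapped_embedded, anchored_embedded, anchored_exact)
```

In this sketch, the anchored search on `fffaabfff` fails, which is the case where the second approach must account for ^ and $ instead of skipping them.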

The distinction between EndOfLine and EndOfText could be related to whether multi-line mode is enabled; I think it is disabled by default.

Originally posted by @jurajsic in https://github.com/VeriFIT/mata/pull/459#issuecomment-2482340805
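
As an illustration of how multi-line mode changes end anchors, here is a small sketch again using Python's `re` module, not mata. Mapping `$` under `re.MULTILINE` to "end of line" and `\Z` to "end of text" is an assumption about how the EndOfLine and EndOfText names above correspond to Python's syntax.

```python
import re

TEXT = "aab\nccc"

# Multi-line mode is off by default: $ does not match before the newline
# in the middle of the input, so aab on the first line is not found.
default_mode = re.search(r"a{2}b$", TEXT)

# With multi-line mode on, $ also matches at the end of each line.
multiline_mode = re.search(r"a{2}b$", TEXT, flags=re.MULTILINE)

# \Z matches only at the very end of the text, regardless of the mode.
end_of_text = re.search(r"a{2}b\Z", TEXT, flags=re.MULTILINE)

print(default_mode is None)
print(multiline_mode is not None)
print(end_of_text is None)
```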