BeRo1985 / flre

FLRE - Fast Light Regular Expressions - A fast light regular expression library
GNU Lesser General Public License v2.1
94 stars 23 forks source link

Probably a bug with ?! quantifier #50

Closed Keksov closed 5 years ago

Keksov commented 5 years ago

Consider the following input string: 'A0 B1 A2 B2' regexp - search for A or B not followed by Zero: ([AB](?!0))

re := TFLRE.Create( '([AB](?!0))', [ rfUTF8 ] ); // dump - (([A-B](?:(?!0))))
newString := re.UTF8ReplaceCallback( UTF8String( 'A0 B1 A2 B2' ), myCalBack  )

function myCalBack : string
begin
    Result := '#';
end;

As expected, myCalBack invoked with Input = 'B1 A2 B2', Start = 0, Length = 1, etc. and as correct result newString ='A0 #1 #2 #2'

Let's modify re a little bit: ([AB](?![0])) - add [] around 0 dump - (([A-B](?:(?![0])))) Should work as before, right? Nope... now myCalBack invoked with Input = 'A0 B1 A2 B2', Start = 0, Length = 1 - means match = first A, as a result newString = '#0 #1 #2 #2'

Looks like ?! doesn't accept ranges [], but does work with literals only. Or did I miss some special flag for Create?

BeRo1985 commented 5 years ago

See description in the README.txt:

And as a side note, the experimental support for lookahead assertions and back references may be removed later again, if it is recognized later that these capabilities are not effective or not issue-free without backtracking, because FLRE is primarly a backtracking-free regular expression engine.

In other words:

(?!) doesn't support sub nested lookahead regular expressions at FLRE, since it would be only compatible with backtracking-based regular expression execution.

See https://swtch.com/~rsc/regexp/regexp1.html and https://swtch.com/~rsc/regexp/regexp2.html and https://swtch.com/~rsc/regexp/regexp3.html for more informations on this context.

Keksov commented 5 years ago

OK, I see... Thank you! Issue closed... alas.

Keksov commented 5 years ago

I think it's supported by BRRE, right?

BeRo1985 commented 5 years ago

Yes, but you should definitely avoid BRRE because it has very critical design concept-based (mostly DFA-technical-side) bugs which can lead to very obvious false positives and very obvious false negatives. In another words: BRRE is buggy as hell, at least when DFA is enabled.

This is also the reason why FLRE exists at all, because BRRE concept-design-technically (at trying intermixing fully different regular expression matching algorithm concepts) became a total hopeless case and total buggy. The core concept idea is FLRE: It must also working completely without backtracking in any case (small exception: lookaheads etc. with threaded NFA), for to avoid so such code-design pitfalls and issues again.

BeRo1985 commented 5 years ago

Or in another words:

FLRE allows morely the Keep-It-Stupid-Simple (KISS) idea, while BRRE had followed morely the everything-implemented-what-is-possible-idea, which in the end was doomed to failure with the various bugs.