andgineer / TRegExpr

Regular expressions (regex) for Pascal.
https://regex.sorokin.engineer/en/latest/
MIT License

Incorrect sub-calls #329

Closed. User4martin closed this issue 1 year ago.

User4martin commented 1 year ago

Regex: (1(2(3(?1)?))A)_(?3)
Text: 123A_3123

It should match 123A_3: https://regex101.com/r/1CvVT2/1
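Python's standard re module has no subroutine calls such as (?1), so as a cross-check here is a minimal sketch of a backtracking matcher hand-specialised to this one pattern. The function names (lit, group1, group2, group3, whole, search) are hypothetical. Each generator yields end positions in the order a backtracking engine would try them, with the greedy ? trying the (?1) call before skipping it.

```python
# Hand-specialised backtracking matcher for (1(2(3(?1)?))A)_(?3) only.
# Each function is a generator that yields possible end positions,
# in the order a backtracking engine would try them.


def lit(s, i, ch):
    """Match the literal `ch` at position i."""
    if s.startswith(ch, i):
        yield i + len(ch)


def group1(s, i):
    """(1(2(3(?1)?))A): "1", then group 2, then the mandatory "A"."""
    for j in lit(s, i, "1"):
        for k in group2(s, j):
            yield from lit(s, k, "A")


def group2(s, i):
    """(2(3(?1)?)): "2", then group 3."""
    for j in lit(s, i, "2"):
        yield from group3(s, j)


def group3(s, i):
    """(3(?1)?): "3", then an optional recursive call to group 1.

    The ? is greedy, so the call is tried before skipping it. Every level
    of recursion consumes characters, so the recursion always terminates.
    """
    for j in lit(s, i, "3"):
        yield from group1(s, j)  # (?1) taken once
        yield j                  # (?1) skipped


def whole(s, i):
    """The full pattern: group 1, "_", then (?3), a call to group 3."""
    for j in group1(s, i):
        for k in lit(s, j, "_"):
            yield from group3(s, k)


def search(s):
    """Leftmost match, first alternative found (an unanchored search)."""
    for start in range(len(s) + 1):
        for end in whole(s, start):
            return s[start:end]
    return None


print(search("123A_3123"))   # prints 123A_3
print(search("123A_3123A"))  # prints 123A_3123A
```

On the original text the recursive (?1) cannot find its A, so the ? falls back to zero repetitions and the match stops at 123A_3. With an A appended, the part after the _ matches 3123A.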

Rationale:

It wrongly returns 123A_3123: it does not require the mandatory A of group 1 inside the recursive (?1) call.

The call to (?1) contains another run of the 3rd capture group (inlined in group 1, not called by it). When the OP_CLOSE of this inlined 3rd group is reached, the engine believes it is the end of the called (?3), which it is not. At that point it falsely returns from the call, so the A is never required.
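The failure can be modelled as a trace of events in the attempt where (?3), after the _, tries its optional (?1). The sketch below is a simplified, hypothetical model of the behaviour described above, not TRegExpr's code: the event names and buggy_return_index are made up for illustration. It treats any OP_CLOSE of a group that has an active sub call as the return from that call.

```python
# Hypothetical, simplified trace (not TRegExpr opcodes) of the attempt in
# which (?3), after "_", tries its optional (?1) on the subject "123A_3123".
EVENTS = [
    ("CALL", 3),    # (?3) after "_"
    ("CHAR", "3"),
    ("CALL", 1),    # (?1) inside group 3's body
    ("CHAR", "1"),
    ("CHAR", "2"),
    ("CHAR", "3"),
    ("CLOSE", 3),   # inlined group 3 inside the called group 1
    ("CLOSE", 2),
    ("CHAR", "A"),  # mandatory "A" of group 1; missing from the subject
    ("CLOSE", 1),   # real return from (?1)
    ("CLOSE", 3),   # real return from (?3)
]


def buggy_return_index(events):
    """Model of the faulty check: return the index at which the outermost
    sub call is considered finished, or None.

    Any OP_CLOSE of a group that has an active sub call is taken as the
    return from that call, even when it belongs to an inlined copy of the
    group reached from inside a deeper call.
    """
    active = []  # groups with an active sub call, outermost first
    for i, (op, arg) in enumerate(events):
        if op == "CALL":
            active.append(arg)
        elif op == "CLOSE" and arg in active:
            # Ends the earliest active call to `arg` and drops every deeper call.
            del active[active.index(arg):]
            if not active:
                return i
    return None


end = buggy_return_index(EVENTS)
print(end)                                # prints 6
print(("CHAR", "A") in EVENTS[:end + 1])  # prints False: the "A" is never required
```

The modelled call ends at the inlined close of group 3, before the event for the mandatory A is reached, which is why 123A_3123 is accepted.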

Alexey-T commented 1 year ago

(1(2(3(?1)?))A)

Here it seems (?1) can never match, because (?1) is inside the 1st group and calls the 1st group. This seems to be a bad regex, which can only match with a zero repeat count for (?1).

Alexey-T commented 1 year ago

So maybe we can add a check in the compiler: a sub call must never call its surrounding group.

User4martin commented 1 year ago

Added the fix to #328.

User4martin commented 1 year ago

> (1(2(3(?1)?))A)
>
> Here it seems (?1) can never match, because (?1) is inside the 1st group and calls the 1st group. This seems to be a bad regex, which can only match with a zero repeat count for (?1).

Add an "A" to the end of the text, and it will match:

https://regex101.com/r/1dcWlU/1

User4martin commented 1 year ago

An outer caller can only return after every callee (sub call) has returned.

So we only need to check OP_CLOSE against the index of the innermost call. The index of any outer caller is saved temporarily until all inner calls have returned.
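The rule above can be sketched recursively: each sub call runs in its own frame and compares OP_CLOSE only against its own group, while the caller's group number waits in the caller's frame (the temporarily saved index). The event trace and the name run_call are hypothetical; this models the rule and is not TRegExpr's implementation.

```python
# Hypothetical, simplified trace (not TRegExpr opcodes) of the attempt in
# which (?3), after "_", tries its optional (?1) on the subject "123A_3123".
EVENTS = [
    ("CALL", 3),    # (?3) after "_"
    ("CHAR", "3"),
    ("CALL", 1),    # (?1) inside group 3's body
    ("CHAR", "1"),
    ("CHAR", "2"),
    ("CHAR", "3"),
    ("CLOSE", 3),   # inlined group 3 inside the called group 1
    ("CLOSE", 2),
    ("CHAR", "A"),  # mandatory "A" of group 1
    ("CLOSE", 1),   # real return from (?1)
    ("CLOSE", 3),   # real return from (?3)
]


def run_call(events, i, called_group):
    """Run the events of a sub call to `called_group`, starting at index i.

    Returns the index of the OP_CLOSE that ends this call, or None if the
    trace ends first. Only this innermost call's group is compared; the
    caller's group stays in the caller's frame until this call returns.
    """
    while i < len(events):
        op, arg = events[i]
        if op == "CALL":
            # A nested call must return before this one can.
            end = run_call(events, i + 1, arg)
            if end is None:
                return None
            i = end + 1
        elif op == "CLOSE" and arg == called_group:
            return i
        else:
            # Ordinary event, e.g. the close of an inlined group that is
            # not this call's group.
            i += 1
    return None


# EVENTS[0] is the (?3) call itself, so its body starts at index 1.
print(run_call(EVENTS, 1, 3))  # prints 10
```

The inlined close of group 3 at index 6 is skipped because the innermost call at that point is (?1), so the A at index 8 must match before (?3) can return.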

Alexey-T commented 1 year ago

> Add an "A" to the end of the text, and it will match

And how does that contradict my comment? It does not.

User4martin commented 1 year ago

> (1(2(3(?1)?))A)
>
> Here it seems (?1) can never match.

Add an "A" and (?1) does match.

> Because (?1) is inside the 1st group and calls the 1st group. This seems to be a bad regex, which can only match with a zero repeat count for (?1).

It is not a bad regex. It is a recursive call, just a bit more contrived than a group calling itself directly.

And since the (?1) is followed by ?, the call is optional, so it can exit the recursion.


EDIT: https://regex101.com/r/1dcWlU/1

After the _, the regex still matches 3123A.

Alexey-T commented 1 year ago

Ah, a recursive call. Now I get the idea of that regex.

Alexey-T commented 1 year ago

@User4martin It's solved now, let's close?