andgineer / TRegExpr

Regular expressions (regex), pascal.
https://regex.sorokin.engineer/en/latest/
MIT License

Nested back-ref does not work #376

Closed: User4martin closed this issue 10 months ago

User4martin commented 10 months ago

A back-ref \1 matches the same literal text as was matched by the most recent successful capture of the group.

That is, if the group is matched in a loop, then it will be the text matched in the most recent iteration.
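Python's standard re module follows the same rule, which gives a quick way to see it in action. In this sketch (the pattern and strings are illustrative, not from the issue), each loop iteration either captures a digit in group 1 or matches an x followed by \1, and \1 always refers to the digit captured most recently.

```python
import re

# Each iteration either captures a digit in group 1, or matches "x"
# followed by a back-ref to the digit captured most recently.
PATTERN = re.compile(r'(?:(\d)|x\1)+')


def accepts(text: str) -> bool:
    """True if the whole text matches PATTERN."""
    return PATTERN.fullmatch(text) is not None


# "x1" repeats the 1 captured in iteration 1; "x2" repeats the 2
# captured in iteration 3, the most recent capture at that point.
print(accepts('1x12x2'))  # True
# Here \1 still holds "1" when "x2" is reached, so the match fails.
print(accepts('1x2'))     # False
```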

But

  IsMatching('nested ref', '(?i)^(.(?:(\1)|(.)))*',  'aAaaAa',  [1,5,  3,3, 4,2, 2,1]);

Here the \1 is inside group 1 itself.

https://regex101.com/r/sstaLI/1

In TRegExpr, it does not work.
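To read the expected array in the test above, assume (the issue does not say so) that IsMatching takes 1-based (position, length) pairs for groups 0 to 3 in order. Decoding them against the subject gives the text each group should end up with:

```python
SUBJECT = 'aAaaAa'
EXPECTED = [1, 5, 3, 3, 4, 2, 2, 1]  # copied from the IsMatching call


def decode(subject, pairs):
    """Turn 1-based (position, length) pairs into the captured substrings."""
    return [subject[pos - 1:pos - 1 + length]
            for pos, length in zip(pairs[0::2], pairs[1::2])]


for group, text in enumerate(decode(SUBJECT, EXPECTED)):
    print(group, repr(text))
# 0 'aAaaA'
# 1 'aaA'
# 2 'aA'
# 3 'A'
```

Read that way, iteration 1 matches "aA": \1 is still unset, so the (.) branch runs and group 3 captures "A". Iteration 2 matches "a" and then \1, which still refers to the "aA" from iteration 1 even though group 1 has been re-entered, so group 2 captures "aA" and group 1 becomes "aaA". A third iteration fails (after . takes the final "a", neither \1 = "aaA" nor (.) can match), so the loop stops, the overall match is "aAaaA", and group 3 keeps the "A" from iteration 1.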

Alexey-T commented 10 months ago

Do all (or most) engines support \1 inside group 1? It sounds scary; maybe it's a rare feature.

User4martin commented 10 months ago

If they support \1 at all, then they support it nested too.

ECMA handles it differently (whether nested or not). Given

  (?: (?:\1|\d) (\w) )*

In the first iteration of the loop, capture 1 (\w) has not yet been matched.

The same holds true in the nested case, which from a user's view really is no different. It is only different for the implementation: the new capture must be written in one go, only when the closing ) is reached, so the previous capture stays valid until the next one is fully available.
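Both behaviours can be sketched with a toy backtracking matcher in Python. It is hypothetical and is neither TRegExpr's code nor any real engine's; the tuple-based AST and the names match, run_anchored and spans are made up for illustration. In the default mode a group's new span is committed only when its closing ) is reached, so \1 inside group 1 still sees the previous iteration's text, and a back-ref to an unset group fails. With ecma=True it approximates the ECMAScript rule instead: captures inside the loop body are cleared before each iteration, and a back-ref to an unset group matches the empty string. Back-ref comparisons are hard-wired to be case-insensitive to mimic (?i).

```python
from typing import Callable, Dict, Optional, Tuple

Caps = Dict[int, Tuple[int, int]]   # group -> (start, end), 0-based, end exclusive
Cont = Callable[[int, Caps], Optional[Caps]]

# Hypothetical tuple AST for (?i)^(.(?:(\1)|(.)))*  (the ^ is run_anchored)
NESTED = ("star",
          ("group", 1, ("seq", [("any",),
                                ("alt", [("group", 2, ("backref", 1)),
                                         ("group", 3, ("any",))])])))


def groups_in(node) -> set:
    """Numbers of all capturing groups inside node."""
    kind = node[0]
    if kind == "group":
        return {node[1]} | groups_in(node[2])
    if kind in ("seq", "alt"):
        return set().union(*(groups_in(child) for child in node[1]))
    if kind == "star":
        return groups_in(node[1])
    return set()


def match(node, s: str, i: int, caps: Caps, k: Cont, ecma: bool) -> Optional[Caps]:
    """Continuation-passing backtracking matcher: k receives (end, captures)."""
    kind = node[0]
    if kind == "any":
        return k(i + 1, caps) if i < len(s) else None
    if kind == "seq":
        def run(idx: int, pos: int, c: Caps) -> Optional[Caps]:
            if idx == len(node[1]):
                return k(pos, c)
            return match(node[1][idx], s, pos, c,
                         lambda p, c2: run(idx + 1, p, c2), ecma)
        return run(0, i, caps)
    if kind == "alt":
        for branch in node[1]:  # first branch that leads to a full match wins
            result = match(branch, s, i, caps, k, ecma)
            if result is not None:
                return result
        return None
    if kind == "group":
        n, body = node[1], node[2]
        # The old capture of group n stays visible while the body runs;
        # the new span is written in one go when ')' is reached.
        return match(body, s, i, caps,
                     lambda p, c: k(p, {**c, n: (i, p)}), ecma)
    if kind == "backref":
        if node[1] not in caps:
            # Unset group: ECMAScript matches empty, Perl-style engines fail.
            return k(i, caps) if ecma else None
        start, end = caps[node[1]]
        text = s[start:end]
        if s[i:i + len(text)].lower() == text.lower():  # (?i)
            return k(i + len(text), caps)
        return None
    if kind == "star":
        body, inner = node[1], groups_in(node[1])

        def loop(pos: int, c: Caps) -> Optional[Caps]:
            # ECMAScript clears the body's captures before each iteration;
            # if the iteration fails, the loop exits with c left intact.
            tried = {g: v for g, v in c.items() if g not in inner} if ecma else c
            # Greedy: try one more non-empty iteration, else stop here.
            result = match(body, s, pos, tried,
                           lambda p, c2: loop(p, c2) if p > pos else None, ecma)
            return result if result is not None else k(pos, c)
        return loop(i, caps)
    raise ValueError(f"unknown node {kind!r}")


def run_anchored(node, s: str, ecma: bool = False) -> Optional[Caps]:
    """Match node at position 0 (the ^ anchor) and record group 0."""
    return match(node, s, 0, {}, lambda p, c: {**c, 0: (0, p)}, ecma)


def spans(caps: Caps, groups: int = 4) -> list:
    """1-based (position, length) per group, or None if the group is unset."""
    return [(caps[g][0] + 1, caps[g][1] - caps[g][0]) if g in caps else None
            for g in range(groups)]


print(spans(run_anchored(NESTED, "aAaaAa")))
# [(1, 5), (3, 3), (4, 2), (2, 1)]
print(spans(run_anchored(NESTED, "aAaaAa", ecma=True)))
# [(1, 6), (6, 1), (7, 0), None]
```

The default mode reproduces the capture positions the test in this issue expects. In the ECMA mode \1 is always unset inside the loop, so it matches empty on every iteration and the whole subject is consumed one character at a time. Passing captures as fresh dict copies means a failed iteration's spans are simply dropped on backtracking; an engine that updates captures in place has to delay or undo those writes instead, which is the implementation detail described in the comment above.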