andgineer / TRegExpr

Regular expressions (regex), pascal.
https://regex.sorokin.engineer/en/latest/
MIT License
174 stars 63 forks source link

Optional Feature: backtracking is sub-calls #375

Open User4martin opened 10 months ago

User4martin commented 10 months ago
  IsMatching('Recursive', '^(?''A''(?:b[^b]+(?&A)?x+))', 'baaabaaaaaxx', [1,12,  1,12]);

https://regex101.com/r/g3pDSW/1

^(?'A'
  (?:
    b[^b]+
    (?&A)?  ## This sub-call will match ALL the "x", but when the outer match fails, it backtracks, and matches only ONE "x"
     x+
  )
)

TRegExpr fails to match, because it does not backtrack the sub-call.

User4martin commented 10 months ago

It appears that recursion (rather than sub-call) (?R) or (?0)` does not backtrack.

https://regex101.com/r/T2Il6J/1 (?:b[^bx]+((?R))?x+) on `baaabaaaxx'

If it did backtrack, then IMHO it should match the whole pattern, as it does if the + after the x+ is removed: https://regex101.com/r/uEG8kF/1

There is a mention of "recursion being atomic" here https://www.rexegg.com/regex-recursion.html

User4martin commented 10 months ago

Actually, changing the regex a bit ^(?'A'(?:b[^bx]+(?&A)?x+)) https://regex101.com/r/wwuP8H/1

User4martin commented 10 months ago

I would suggest to keep that as a reminder for an optional feature.

Alexey-T commented 10 months ago

Martin, I am out of code knowledge here. so do the change as you wish. please test it.

User4martin commented 10 months ago

If it is ok, for now just keep the issue open. Needs more research how different engines handle it, and if there is documentation. Then maybe a property controlled behaviour: current or as pcre7.3.

Solving this, would need changes that would also remove the 20 levels of recursion.

I don't plan any immediate work on it... But might later go for it.