PCRE2Project / pcre2

Inconsistent matching with repeated backreferences and match_unset_backref #335

Closed addisoncrump closed 1 year ago

addisoncrump commented 1 year ago

Discovered by #322.

The following regex demonstrates the issue:

  re> /(a)|\1+/match_unset_backref
data> ba
 0: a
 1: a
data> ba\=no_jit
 0: 

I believe the following is a related case:

  re> /(a)|\1+/match_unset_backref
data> bbbb
No match
data> bbbb\=no_jit
 0: 

Curiously, the discrepancy does not appear without the repetition:

  re> /(a)|\1/match_unset_backref
data> ba
 0: 
data> ba\=no_jit
 0:
data> a
 0: a
 1: a
data> a\=no_jit
 0: a
 1: a

Finally, it appears with fixed repetitions, but not range repetitions:

  re> /(a)|\1{128}/match_unset_backref
data> ba
 0: a
 1: a
data> ba\=no_jit
 0:
data>
  re> /(a)|\1{,128}/match_unset_backref
data> ba
 0: 
data> ba\=no_jit
 0: 
data> 

This suggests to me that the JIT mishandles repetitions of backreferences to unset groups, which match_unset_backref makes match the empty string.
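
As a sanity check on which side is right: the PCRE2 documentation describes match_unset_backref as making an unset backreference match the empty string, "more like ECMAScript". A minimal TypeScript sketch of the same patterns follows. It exercises the JavaScript regex engine, not PCRE2, so it only indicates what the expected result probably is. The `describe` helper is a hypothetical name, and `{0,128}` is used because JavaScript does not treat `{,128}` as a quantifier.

```typescript
// Summarise a match roughly the way pcre2test prints it:
// start offset, whole match (group 0), and capture group 1.
function describe(pattern: RegExp, subject: string): string {
  const m = pattern.exec(subject);
  if (m === null) {
    return "No match";
  }
  // A group that did not participate in the match is undefined in JavaScript.
  const group1 = m[1] === undefined ? "<unset>" : JSON.stringify(m[1]);
  return `index=${m.index} 0=${JSON.stringify(m[0])} 1=${group1}`;
}

// The patterns and subjects from the report above.
const cases: Array<[RegExp, string]> = [
  [/(a)|\1+/, "ba"],
  [/(a)|\1+/, "bbbb"],
  [/(a)|\1/, "ba"],
  [/(a)|\1/, "a"],
  [/(a)|\1{128}/, "ba"],
  [/(a)|\1{0,128}/, "ba"],
];

for (const [pattern, subject] of cases) {
  console.log(`${pattern.source} on "${subject}": ${describe(pattern, subject)}`);
}
```

In every case except the subject "a", JavaScript reports an empty match at offset 0 with group 1 unset, which agrees with the no_jit output rather than the JIT output.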

zherczeg commented 1 year ago

Fixed in 936fef2