firasdib / Regex101

This repository is currently only used for issue tracking for www.regex101.com

Python flavor doesn't reject nested backreferences, which Python itself does #1546

Open Davidebyzero opened 3 years ago

Davidebyzero commented 3 years ago

Bug Description

Python doesn't support nested backreferences. For example, /^(^x|\1xx)*$/ cannot be used to match strings of x whose length is a perfect square, because it references \1 inside capture group 1. Python rejects such a pattern with the error "cannot refer to an open group". (Note that this can be worked around by emulating nested backreferences with forward-declared backreferences, for example /^((?=(^x|\3xx))(\2))*$/.) This applies to both the re and regex modules.
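The Python side of this can be checked locally. Below is a minimal sketch using only the standard re module; the helper name compile_error is hypothetical and exists only for this illustration. It covers the nested-backreference pattern only, not the workaround or the regex module.

```python
import re


def compile_error(pattern):
    """Return the compile error message for `pattern`, or None if it compiles.

    Hypothetical helper for illustration; not part of re or regex101.
    """
    try:
        re.compile(pattern)
    except re.error as exc:
        # exc.msg holds the message without the trailing "at position N".
        return exc.msg
    return None


# \1 appears inside capture group 1, i.e. while that group is still open.
nested = r"^(^x|\1xx)*$"
print(compile_error(nested))

# An ordinary backreference to an already closed group compiles fine.
print(compile_error(r"^(x)\1$"))
```

The first call should report the "cannot refer to an open group" error quoted above, while the second prints None. Since the pattern is rejected at compile time, no test string can ever be matched against it, which is why every unit test in the linked regex101 example ought to fail under the Python flavor.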

Regex101 does not block nested backreferences from being used when the Python flavor is selected.

Reproduction steps

https://regex101.com/r/9ae0S3/1 - this should fail to work at all, with all unit tests failing, but instead all the tests pass.

Ouims commented 3 years ago

add the python label, because python is still being emulated by pcre1 (possibly pcre2 now?). there are various issues with the python label that are related to it being handled via pcre. some of them (all?) may be fixable, but i think a more general solution is being considered.

rootsmusic commented 9 months ago

@Ouims According to FAQ: "For Python, regex101 implements it on top of PCRE library, by suppressing features not available in Python."