andersweister closed this issue 2 months ago
Hi @andersweister,
Can you let me know what you're using that gives you errors about the - at the end/beginning of a character class?
I can't seem to make the vanilla stuff croak.
PCRE2 version 10.43 2024-02-16 (8-bit)
re> /[a-z-]+/
data> testing-this
0: testing-this
Python 3.8.10 (default, Nov 22 2023, 10:22:35)
Type "help", "copyright", "credits" or "license" for more information.
>>> import re; r = re.compile("[a-z-]+"); r.match("testing-this")
<re.Match object; span=(0, 12), match='testing-this'>
"testing-this".match(/[a-z-]+/)
0: "testing-this"
groups: undefined
index: 0
input: "testing-this"
Escape the hyphen using \- as it is usually used for character ranges inside a character set. https://stackoverflow.com/questions/34916716/regular-expression-to-match-alphanumeric-hyphen-underscore-and-space-string
Unfortunately, regex is not a universally agreed-upon standard. It's a tool that has evolved differently over time and across programming languages. People call the variants "flavors", which is quite an apt description.
So yes, I agree that it gets confusing. Most people look up the regex documentation for the particular tool they use... which hopefully saves some time and headache.
As far as the flavors supported on regex101.com, they act/behave according to their own documentation, where - can be used at the beginning or end of a character class without the need to escape it with a \ or another -. You can still use \- if you so desire, which then no longer has the requirement to be the first or last character in the character class.
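For anyone who wants to check this outside regex101, here is a minimal sketch using Python's re module, one of the flavors shown earlier in the thread. The variable names and the extra sample strings ("a-z", "b") are just illustrative choices, and it only demonstrates Python's behavior, not every flavor.

```python
import re

# A hyphen at the end or the start of the class is taken literally.
trailing = re.compile(r"[a-z-]+")
leading = re.compile(r"[-a-z]+")

# An escaped hyphen is literal anywhere, including between two characters.
escaped_middle = re.compile(r"[a\-z]+")

# An unescaped hyphen between two characters forms a range instead.
range_only = re.compile(r"[a-z]+")

sample = "testing-this"

trailing_match = trailing.fullmatch(sample)
leading_match = leading.fullmatch(sample)

# [a\-z] contains only 'a', '-' and 'z', so "a-z" matches and "b" does not.
escaped_hit = escaped_middle.fullmatch("a-z")
escaped_miss = escaped_middle.fullmatch("b")

# [a-z] has no literal hyphen, so the whole sample cannot match.
range_miss = range_only.fullmatch(sample)

print(trailing_match, leading_match, escaped_hit, escaped_miss, range_miss)
```

The first three results are match objects and the last two are None, which lines up with the description above: position decides whether an unescaped hyphen is literal, while an escaped one is literal regardless of position.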
If you would like to discuss this further, please feel free to comment here, or join us on IRC or Discord.
Bug Description
The "-" is used for range, so it must be escaped when matching the character itself.
These four characters require an escape sequence inside the bracket list: ^, -, ], \.
Reproduction steps
[a-zA-Z0-9-]+ notice the un-escaped "-" at the end, which is illegal in major implementations.
Test string: pqr-456
Unfortunately gives no warning and accepts the test string.
Expected Outcome
The following is correct: [a-zA-Z0-9\-]+ one back-slash before the last hyphen (doubled just for this markdown source).
Reference: https://www3.ntu.edu.sg/home/ehchua/programming/howto/Regexe.html
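The two patterns in this report can be compared directly. Below is a rough sketch in Python, assuming Python's re module as a stand-in for one of the supported flavors; the variable names are illustrative, and the warnings capture is only there to show whether the engine complains when compiling the unescaped form.

```python
import re
import warnings

test_string = "pqr-456"

# Record any warnings the regex compiler emits for either pattern.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    unescaped = re.compile(r"[a-zA-Z0-9-]+")   # hyphen last, not escaped
    escaped = re.compile(r"[a-zA-Z0-9\-]+")    # hyphen escaped

unescaped_match = unescaped.fullmatch(test_string)
escaped_match = escaped.fullmatch(test_string)

print("unescaped matches:", unescaped_match is not None)
print("escaped matches:", escaped_match is not None)
print("warnings raised:", len(caught))
```

In this flavor both forms compile without a warning and accept the test string, so the escaped and unescaped trailing hyphen are interchangeable here. Other engines should be checked against their own documentation.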
Browser
Chromium Version 123.0.6312.105
OS
Linux Ubuntu 20.04.6 LTS