Closed NWilson closed 2 weeks ago
In Perl, I get the following diagnostic for the regex /[z-\p{Lu}]/
:
False [] range "z-\p{Lu}" in regex; marked by <-- HERE in m/[z-\p{Lu} <-- HERE ]/
In general, @PhilipHazel, what is the correct place to put an erroroffset
returned by pcre2_compile()
? Should be it just before, or just after, the error? In Perl, the idea is that inserting "<-- HERE" into the string (at the error offset) makes sense, so that suggests the error offset should be pointing to the next (unread) character after the error.
But in quite a few places in PCRE2, there seems to be a convention that the erroroffset
actually points to the bad character, which suggests inserting "HERE ->" at the error offset, when displaying it to the user.
Is there a consistent answer?
I have some additional fixes as requested by Carlo, but I'll push them when I know where the correct position of the offset is.
Is there a consistent answer?
Not in the sense of my having any "policy" about this. I guess I just returned wherever the pointer happened to be.
Thank you @PhilipHazel. I have a branch where I've hacked up perltest.sh a little bit, to gather error offsets from Perl's errors and warnings, and use them in RunPerlTest to check all our errors in testoutput{2,5}
match the error offsets from Perl.
It's going to be somewhat ... epic in scope to pull out all those errors though. So I'm parking that investigation.
In the absence of any policy, I'll just adjust the errors offsets in this PR for [\d ... \p{...} ... [:posix:]]
in character classes to match Perl. The error offset will make sense if you insert "<-- HERE" at the offset position.
In future we can maybe make that adjustment for some other error cases, and try and produce the same offsets as Perl in as many cases as possible.
I don't remember anybody ever complained about error offsets, so whatever change is good for me. When the patch is ready, I will land it.
Thank you. I've pushed with an updated commit message for Carlo; no other changes.
I'm satisfied with this set of changes for now.
The code does a 1-char lookahead for a hyphen, but then doesn't advance the pointer to consume the hyphen when returning the error.
Perl's error message (with "use warnings") does advance to just after the hyphen, so PCRE2 should match.
Fixes #545