Given this program:

    ### BEGIN
    use strict;
    use warnings;

    my $res = ('foo' =~ qr(\Qfoo\E));
    print "result: ", ($res ? '' : "not "), "ok\n";

    my $ptn = '\Qblah\E';
    my $ptnObject = qr($ptn);   # line 8
    $res = ('blah' =~ $ptnObject);
    print "result: ", ($res ? '' : "not "), "ok\n";

    $ptnObject = eval('qr(' . $ptn . ')');
    $res = ('blah' =~ $ptnObject);
    print "result: ", ($res ? '' : "not "), "ok\n";
    ### END
The output is:

    result: ok
    result: not ok
    result: ok
and I get this error message (twice):
    Unrecognized escape \Q passed through in regex; marked by <-- HERE in m/\Q <-- HERE blah\E/ at C:\Users\ericp\trash\slashq.pl line 8.
    result: ok
    result: not ok
    Unrecognized escape \E passed through in regex; marked by <-- HERE in m/\Qblah\E <-- HERE / at C:\Users\ericp\trash\slashq.pl line 8.
In other words, I can't construct a regex containing '\Q...\E' indirectly. Does this mean I should always use the third form of constructing the regex (using eval) rather than the qr{$string} form?
On Tue, Oct 19, 2010 at 02:33:56PM -0700, Eric Promislow wrote:
> my $ptn = '\Qblah\E';
> my $ptnObject = qr($ptn);   # line 8
> $res = ('blah' =~ $ptnObject);
> print "result: ", ($res ? '' : "not "), "ok\n";
>
> In other words, I can't construct a regex containing '\Q...\E' indirectly. Does this mean I should always use the third form of constructing the regex (using eval) rather than the qr{$string} form?
I'm not quite sure what you're trying to do. Where is the value for $ptn coming from in your actual code?
Ronald
The RT System itself - Status changed from 'new' to 'open'
Here's the flow from the actual code:
    use strict;
    use warnings;

    use JSON 2.0 qw(from_json to_json); ## no critic (ProhibitStringyEval)

    package Evaluator;
    #...
    sub _compile {
        # Convert post-pattern options to pre-pattern options like
        #   (?i)(?x)...
        # and then use qr// to eval the whole thing.
        my $self = shift;
        my $optionsPrefix = $self->{options};
        $optionsPrefix =~ s/(.)/(?$1)/g;
        my $pattern = $optionsPrefix . $self->{pattern};
        return qr{$pattern};
    }

    sub init {
        my $self = shift;
        $self->{regex} = $self->_compile();
    }

    package main;

    sub main {
        my $requestString = shift || [...]

    unless (caller) {
        my $responsePacket = main();
        my $jsonResult = JSON::to_json($responsePacket, { ascii => 1 });
        print $jsonResult;
    }
The Perl process starts up and reads a JSON packet, sent from a GUI, that describes a regex to evaluate. There's one field for textInput, one for the regex, one for the operation ("match", "replace", etc.), and one for options ("[ismx]*"). So I can't construct a regex from "\Q...\E" directly: I have to put it in a variable and build the regex object via qr($variable).
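A minimal sketch of the `_compile` step, for illustration only: `compile_with_options` is a hypothetical stand-alone function, not the actual Komodo code, and it assumes the options field holds only single-letter flags such as `i`, `m`, `s`, `x`. It also shows why `\Q...\E` never takes effect in this flow: `$full` is interpolated once into `qr{}`, and its contents go to the regex compiler as-is.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Evaluator::_compile: turn option letters such as
# "ix" into inline modifiers "(?i)(?x)" and compile the result with qr//.
sub compile_with_options {
    my ($pattern, $options) = @_;
    (my $prefix = $options // '') =~ s/(.)/(?$1)/g;   # "ix" -> "(?i)(?x)"
    my $full = $prefix . $pattern;
    # $full is interpolated once; any \Q or \E inside it reaches the regex compiler untouched.
    return qr{$full};
}

my $re = compile_with_options('abc', 'i');
print "case-insensitive match\n" if 'ABC' =~ $re;
```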
Hope this makes sense. Thanks for the quick contact.
BTW I haven't found a bug in Perl in years.... But I'm pretty sure this qualified.
- Eric
On 10/20/2010 2:59 PM, Ronald J Kimball via RT wrote:
> I'm not quite sure what you're trying to do. Where is the value for $ptn coming from in your actual code?
On Tue, Oct 19, 2010 at 11:33 PM, Eric Promislow <perlbug-followup@perl.org> wrote:
> In other words, I can't construct a regex containing '\Q...\E' indirectly. Does this mean I should always use the third form of constructing the regex (using eval) rather than the qr{$string} form?
Changing $ptn from single quotes to double quotes seems to be the solution to your issue. Turns out this is even documented in perlop ("The following escape sequences are available in constructs that interpolate…"), though I must admit I'm a bit surprised by this behavior too.
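A small sketch of the single- versus double-quote difference (variable names are illustrative). It assumes the pattern is a literal in the source, which a GUI-driven tool does not have; double quotes also process other escapes, such as `\x30`, before `\Q` sees them, so the two forms are not interchangeable for arbitrary pattern text.

```perl
use strict;
use warnings;
no warnings 'regexp';   # silence "Unrecognized escape \Q passed through" for this demo

my $single = '\Qblah.\E';   # nine characters: \ Q b l a h . \ E (nothing processed)
my $double = "\Qblah.\E";   # \Q...\E applied while the string is built: 'blah\.'

my $re_single = qr/$single/;   # regex compiler sees \Q and \E and treats them as literal Q and E
my $re_double = qr/$double/;   # regex compiler sees blah\. and matches a literal "blah."

print "double: ", ('blah.' =~ $re_double ? "match" : "no match"), "\n";
print "single: ", ('blah.' =~ $re_single ? "match" : "no match"), "\n";
```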
Leon
On 10/20/2010 4:39 PM, fawaka@gmail.com via RT wrote:
> Changing $ptn from single quotes to double quotes seems to be the solution to your issue. Turns out this is even documented in perlop ("The following escape sequences are available in constructs that interpolate…"), though I must admit I'm a bit surprised by this behavior too.
I read that too, and from my understanding, qr{...} interpolates, like qq{...}, while q{...} doesn't. But I obviously misread the perldoc here.

But now I get it. \Q...\E have nothing to do with pattern-matching, and everything to do with string interpolation. I'm out of luck (or my customers are), because there are no string literals in this case, only input. And wrapping it in an eval("...") would be wrong, because it could break other people's work.
Thanks.
Not a bug.
- Eric
> So I can't construct a regex from "\Q...\E" directly -- I have to put it in a variable, and build the regex object via qr($variable).
>
> BTW I haven't found a bug in Perl in years.... But I'm pretty sure this qualified.
Not truly. It may seem that way, but it's all working as designed and documented. What you're asking for is an enhancement request, not a bug fix. And I don't think we can reasonably do that at this stage in the game.
Since its genesis, Perl has stood steadfastly in the camp of evaluating once and once only, unless you take special measures to have it do otherwise. This was an intentional break from the multiple levels of expansion the various shells apply, often seemingly willy-nilly and unpredictably. Perl was supposed to be predictable and WYSIWYG.
That's why there's a difference between $foo and $$foo and $$$foo, or s/// and s///e and s///ee. You have to look rather hard to find anywhere at all that Perl applies some extra level of automatic "interpolation". I know of surprisingly few of those, and this is not one of them. The case-translation escapes have always been documented to occur during the variable-substitution phase only, and never outside that. They have never been something the regex compiler has understood.
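To make the "evaluate once unless asked" rule concrete, here is a sketch with `s///`, `s///e`, and `s///ee` (the variable names are illustrative). Each extra `e` buys exactly one more round of evaluation; nothing is re-evaluated implicitly.

```perl
use strict;
use warnings;

my $name = 'Frodo';
my $text = 'Hi $name';   # single quotes: the text literally contains "$name"

# Plain s///: the replacement is interpolated once, yielding the captured text "$name".
(my $plain = $text) =~ s/(\$\w+)/$1/;

# s///e: the replacement is Perl code; evaluating $1 still yields the text "$name".
(my $once = $text) =~ s/(\$\w+)/$1/e;

# s///ee: the result of the first evaluation ("$name") is itself eval'ed as code.
(my $twice = $text) =~ s/(\$\w+)/$1/ee;

print "$plain\n$once\n$twice\n";
```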
From pp. 162-163 of _Programming Perl_\, 3rd edition:
Note that the case and metaquote translation escapes (C<\U> and
friends) must be processed during the variable interpolation pass
because the purpose of those metasymbols is to influence how variables
are interpolated. If you suppress variable interpolation with single
quotes, you don't get the translation escapes either. Neither
variables nor translation escapes (C<\U>, etc.) are expanded in any
single quoted string, nor in single-quoted C<m'...'> or C<qr'...'>
operators. Even when you do interpolation, these translation escapes
are ignored if they show up as the [...]
From pp. 192-193 of _Programming Perl_\, 3rd edition:
The only double-quote escapes that are processed as such are the six translation escapes: C<\U>, C<\u>, C<\L>, C<\l>, C<\Q>, and C<\E>. If you ever look into the inner workings of the Perl regular expression compiler, you'll find code for handling escapes like C<\t> for tab, C<\n> for newline, and so on. But you won't find code for those six translation escapes. (We only listed them in A<CHP-5-TABLE-3> because people expect to find them there.) If you somehow manage to sneak any of them into the pattern without going through double-quotish evaluation, they won't be recognized.

How could they find their way in? Well, you can defeat interpolation by using single quotes as your pattern delimiter. In C<m'...'>, C<qr'...'>, and C<s'...'...'>, the single quotes suppress variable interpolation and the processing of translation escapes, just as they would in a single-quoted string. Saying C<m'\ufrodo'> won't find a capitalized version of poor frodo. However, since the "normal" backslash characters aren't really processed on that level anyway, C<m'\t\d'> still matches a real tab followed by any digit.
Another way to defeat interpolation is through interpolation itself. If you say:
    $var = '\U';
    /${var}frodo/;
poor frodo remains uncapitalized. Perl won't redo the interpolation pass for you just because you interpolated something that looks like it might want to be reinterpolated. You can't expect that to work any more than you'd expect this double interpolation to work:
    $hobbit = 'Frodo';
    $var = '$hobbit';   # (single quotes)
    /$var/;             # means m'$hobbit', not m'Frodo'.
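A runnable version of the two examples above, assuming `use strict` (hence the `my` declarations the quoted snippets omit). The first half checks the interpolated string rather than a regex, so no unrecognized-escape warning fires.

```perl
use strict;
use warnings;

# Interpolating a variable that holds '\U' inserts the characters \ and U;
# the interpolation pass is not rerun, so nothing gets uppercased.
my $var = '\U';
my $str = "${var}frodo";
print "$str\n";          # \Ufrodo

# Interpolating '$hobbit' into a regex does not look up $hobbit either:
# the regex compiler sees $ (an end-of-line anchor) followed by "hobbit".
my $hobbit = 'Frodo';
my $var2   = '$hobbit';
my $hits   = grep { /$var2/ } ('Frodo', 'hobbit', '$hobbit');
print "matches: $hits\n";   # 0
```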
Here's another example that shows how most backslashes are interpreted
by the regex parser\, not by variable interpolation. Imagine you have a
simple little [...]
    #!/usr/bin/perl
    $pattern = shift;
    while (<>) {
        print if /$pattern/o;
    }
If you name that program [...]
% pgrep '\t\d' *.c
then you'll find that it prints out all lines of all your C source
files in which a digit follows a tab. You didn't have to do
anything special to get Perl to realize that C\<\t> was a tab.
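A self-contained variant of the pgrep idea: the original reads files with `<>`, while this sketch swaps in a hypothetical in-memory array of lines so it can run on its own. The pattern text is the same four characters `\t\d`, and the regex compiler is what turns them into "tab, then digit".

```perl
use strict;
use warnings;

# Pattern text as it would arrive from the command line: backslash, t, backslash, d.
my $pattern = '\t\d';

# Made-up sample lines standing in for C source.
my @lines = ("int x;\t42\n", "no tab here 7\n", "\tx\n");

# Interpolation only pastes the text in; the regex compiler handles \t and \d.
my @hits = grep { /$pattern/ } @lines;
print @hits;

# A single-quote delimiter suppresses interpolation, yet \t\d still works for the same reason.
my $also = grep { m'\t\d' } @lines;
```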
If Perl's patterns [...]

In Java, beginning with 1.5, its Pattern class understands \Q ... \E within strings. You may perhaps be thinking Perl works like Java in this regard, but as I hope the text quoted above explains, it really does not.

If your point is that the standard documentation that Perl ships with is unclear or even misleading in this regard, there may be some substance to that complaint. perlreref.pod is one of the culprits here: under ESCAPE SEQUENCES for regular expressions, it claims:
    \l  Lowercase next character
    \u  Titlecase next character
    \L  Lowercase until \E
    \U  Uppercase until \E
    \Q  Disable pattern metacharacters until \E
    \E  End modification

This is completely wrong. Those should not be there, because they are *not* there. The only backslash escapes with meaning to the regular expression compiler are those operative under
$string =~ $pattern
That you can use case-translation escapes in
$string = qq{$pattern \Q$literal\E $more_pattern};
or
$string =~ m{$pattern \Q$literal\E $more_pattern};
or
$regex = qr{$pattern \Q$literal\E $more_pattern};
is an artifact of the variable-expansion phase. Those escapes are not processed in:
$string =~ $pattern
because the regex compiler has no earthly idea what a \Q even means.
It doesn't mean anything to it. It's an unrecognized escape.
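A sketch showing that `\Q...\E` during interpolation amounts to calling the `quotemeta` builtin on the interpolated text (the literal `'a.b*c'` is arbitrary). Both regexes end up with the same pattern string, which is why the regex compiler itself never needs to understand `\Q`.

```perl
use strict;
use warnings;

my $literal = 'a.b*c';

# \Q...\E is applied during interpolation, before the regex compiler runs...
my $re1 = qr{^\Q$literal\E\z};

# ...so it is equivalent to escaping the text yourself with quotemeta.
my $quoted = quotemeta $literal;   # 'a\.b\*c'
my $re2    = qr{^$quoted\z};

print "$re1\n$re2\n";              # the two lines are identical
```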
That's why this prints "yes":
print "qx" =~ m'\A\Q.\z'i ? "yes\n" : "no\n";
If you turn on warnings\, you learn
Unrecognized escape \Q passed through in regex; marked by <-- HERE in m/^\Q <-- HERE .$/

And if you use C<< use re "debug"; >> or C<< -Mre=debug >>, you'll get:
    Compiling REx "\A\Q.\z"
    Final program:
       1: SBOL (2)
       2: EXACTF [...] (4)
       4: REG_ANY (5)
       5: EOS (6)
       6: END (0)
    anchored ""$ at 2 stclass EXACTF [...]
    anchored(SBOL) minlen 2
    Matching REx "\A\Q.\z" against "qx"
       0 <> [...]
    (4)
       1 [...]
    [...]
See what's actually happening?
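A sketch reproducing the `"qx"` example next to the ordinary-delimiter form, for contrast. `no warnings 'regexp'` is there only to silence the expected "Unrecognized escape" warning.

```perl
use strict;
use warnings;
no warnings 'regexp';   # m'...' below deliberately feeds \Q to the regex compiler

# Single-quote delimiters: no interpolation pass, so \Q reaches the regex compiler,
# which treats it as a literal Q (matched case-insensitively here because of /i).
my $single = "qx" =~ m'\A\Q.\z'i ? "yes" : "no";

# Ordinary delimiters: \Q.\E is processed during interpolation, producing \. (a literal dot).
my $double_qx  = "qx" =~ m/\A\Q.\E\z/ ? "yes" : "no";
my $double_dot = "."  =~ m/\A\Q.\E\z/ ? "yes" : "no";

print "$single $double_qx $double_dot\n";
```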
So this isn't a bug in Perl, but it may be a bug in the online documentation kit. I haven't scoured those pods for other treatments of this topic beyond the one I pointed out as being misrepresented in perlreref.pod. That one should be fixed, and anything similar hunted down and killed. Our apologies.

I hope you find the Camel text, at least, clear enough about all this that it makes sense to you now.
--tom
On Thu, Oct 21, 2010 at 1:10 AM, Eric Promislow <ericp@activestate.com> wrote:
> perl process starts up, and reads a JSON packet representing a regex to be eval'ed from a GUI. There's one field for textInput, one for regex, one for operation ("match", "replace", etc.), one for options ("[ismx]*"). So I can't construct a regex from "\Q...\E" directly -- I have to put it in a variable, and build the regex object via qr($variable).
>
> Hope this makes sense. Thanks for the quick contact.
It's still not clear to me why you want this in the first place, or otherwise why the quotemeta builtin can't do what you want to do.
Leon
On 10/20/2010 6:06 PM, Leon Timmermans wrote:
> It's still not clear to me why you want this in the first place, or otherwise why the quotemeta builtin can't do what you want to do.
It's for Perl mode for Komodo's Rx Toolkit:
http://bugs.activestate.com/show_bug.cgi?id=82715
I had always assumed that \Q...\E was for pattern-matching, but it isn't. I stand corrected, and hope the customer who made that request does as well.
- Eric
On Wed, Oct 20, 2010 at 7:39 PM, Leon Timmermans <fawaka@gmail.com> wrote:
> Changing $ptn from single quotes to double quotes seems to be the solution to your issue.
Not the same thing:

    perl -E"say qr/\Q\x30/"
    (?-xism:\\x30)

    perl -E"say qq/\Q\x30/"
    0
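Why the two commands above differ, sketched as a test (assuming the behavior shown in that output): in `qq//`, `\x30` becomes `"0"` before `\Q` is applied, whereas in `qr//` the `\x30` is left as text for the regex compiler, so `\Q` escapes its backslash.

```perl
use strict;
use warnings;

# In qq//, \x30 becomes "0" first; \Q then has nothing to escape.
my $str = qq/\Q\x30/;

# In qr//, \x30 is left as text for the regex compiler, so \Q escapes its backslash,
# producing a pattern that matches the four characters \ x 3 0.
my $re = qr/\Q\x30/;

print "$str\n$re\n";
```

On Perl 5.14 and later, `qr` objects stringify with `(?^:...)`, so the second line prints as `(?^:\\x30)` rather than the `(?-xism:\\x30)` shown above.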
On Thu, Oct 21, 2010 at 3:35 AM, Eric Promislow <ericp@activestate.com> wrote:
> It's for Perl mode for Komodo's Rx Toolkit:
> http://bugs.activestate.com/show_bug.cgi?id=82715
> I had always assumed that \Q...\E was for pattern-matching, but it isn't. I stand corrected, and hope the customer who made that request does as well.
You could emulate the parser with something like this:
    my %action = (
        L => sub { lc $_[0] },
        Q => sub { quotemeta $_[0] },
        U => sub { uc $_[0] },
    );
    s/ \\([LUQ]) (.*?) (?:\\E|\z) / $action{$1}->($2) /xesg;
It doesn't handle overlapping \L, \U, or \Q sequences, but those should be uncommon anyway.
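A runnable wrapper around Leon's substitution, for experimentation; `expand_case_escapes` is a hypothetical name and the sample pattern is made up. As Leon notes, nested or overlapping sequences are not handled.

```perl
use strict;
use warnings;

# Leon's table: each translation-escape letter maps to the builtin that implements it.
my %action = (
    L => sub { lc $_[0] },
    Q => sub { quotemeta $_[0] },
    U => sub { uc $_[0] },
);

# Hypothetical wrapper around Leon's substitution: expand \L, \U, \Q ... \E
# (or ... to end of string) in raw pattern text before handing it to qr//.
sub expand_case_escapes {
    my ($text) = @_;
    $text =~ s/ \\([LUQ]) (.*?) (?:\\E|\z) / $action{$1}->($2) /xesg;
    return $text;
}

my $raw      = '^\Qblah.txt\E$';    # pattern text as typed into the GUI
my $expanded = expand_case_escapes($raw);
my $re       = qr/$expanded/;

print "$expanded\n";                 # ^blah\.txt$
print "literal dot ok\n" if 'blah.txt' =~ $re && 'blahXtxt' !~ $re;
```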
Leon
On Thu, Oct 21, 2010 at 5:10 AM, Leon Timmermans <fawaka@gmail.com> wrote:
> s/ \\([LUQ]) (.*?) (?:\\E|\z) / $action{$1}->($2) /xesg;
Fails if the leading "\" in the string was itself escaped.
> Fails if the leading "\" in the string was itself escaped.
That's why we invented (?<!pattern).
On Fri, Oct 22, 2010 at 3:53 PM, Matt Sergeant <matt@sergeant.org> wrote:
>> Fails if the leading "\" in the string was itself escaped.
> That's why we invented (?<!pattern).
Not only did he not use (?<!pattern), it wouldn't help, since it cannot determine whether the "Q" was preceded by an even or an odd number of backslashes.
> Not only did he not use (?<!pattern), it wouldn't help, since it cannot determine whether the "Q" was preceded by an even or an odd number of backslashes.
This better? ;-)
s/ (?<!\\) (?>\\\\)* \K \\([LUQ]) (.*?) (?:\\E|\z) / $action{$1}->($2) /xesg;
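A side-by-side sketch of the original and the revised substitution on the case raised above, a backslash that is itself escaped. The sub names are hypothetical; `\K` needs Perl 5.10 or later.

```perl
use strict;
use warnings;

my %action = (
    L => sub { lc $_[0] },
    Q => sub { quotemeta $_[0] },
    U => sub { uc $_[0] },
);

# Leon's original substitution: matches any backslash followed by L, U, or Q.
sub expand_naive {
    my ($text) = @_;
    $text =~ s/ \\([LUQ]) (.*?) (?:\\E|\z) / $action{$1}->($2) /xesg;
    return $text;
}

# The revised substitution: start where no backslash precedes, skip an even
# number of backslashes (escaped pairs), then \K so only the \L, \U, or \Q
# sequence itself is replaced.
sub expand_parity_safe {
    my ($text) = @_;
    $text =~ s/ (?<!\\) (?>\\\\)* \K \\([LUQ]) (.*?) (?:\\E|\z) / $action{$1}->($2) /xesg;
    return $text;
}

# The characters a \ \ Q b: an escaped backslash followed by a literal "Qb",
# not a \Q sequence.
my $escaped = 'a\\\\Qb';
printf "naive: %s\nsafe:  %s\n", expand_naive($escaped), expand_parity_safe($escaped);
```

The naive version consumes the second backslash as the start of `\Q`, while the revised one leaves the text alone.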
@rgs - Status changed from 'open' to 'rejected'
Migrated from rt.perl.org#78456 (status was 'rejected')
Searchable as RT78456