Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

\Q and \E can't be used indirectly in a qr string #10741

Closed p5pRT closed 13 years ago

p5pRT commented 13 years ago

Migrated from rt.perl.org#78456 (status was 'rejected')

Searchable as RT78456$

p5pRT commented 13 years ago

From ericp@ActiveState.com

Created by ericp@activestate.com

Given this program:

```perl
use strict;
use warnings;

my $res = ('foo' =~ qr(\Qfoo\E));
print "result: ", ($res ? '' : "not "), "ok\n";

my $ptn = '\Qblah\E';
my $ptnObject = qr($ptn);    # line 8
$res = ('blah' =~ $ptnObject);
print "result: ", ($res ? '' : "not "), "ok\n";

$ptnObject = eval('qr(' . $ptn . ')');
$res = ('blah' =~ $ptnObject);
print "result: ", ($res ? '' : "not "), "ok\n";
```

The output is: ok, not ok, ok

and I get this error message (twice):

```
Unrecognized escape \Q passed through in regex; marked by <-- HERE in m/\Q <-- HERE blah\E/ at C:\Users\ericp\trash\slashq.pl line 8.
result: ok
result: not ok
Unrecognized escape \E passed through in regex; marked by <-- HERE in m/\Qblah\E <-- HERE / at C:\Users\ericp\trash\slashq.pl line 8.
```

In other words, I can't construct a regex containing '\Q...\E' indirectly. Does this mean I should always use the third form of constructing the regex (using eval) rather than the qr{$string} form?

Perl Info

```
Flags:
    category=core
    severity=medium

Site configuration information for perl 5.10.0:

Configured by SYSTEM at Wed Sep 3 13:16:08 2008.

Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
  Platform:
    osname=MSWin32, osvers=5.00, archname=MSWin32-x86-multi-thread
    uname=''
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cl', ccflags ='-nologo -GF -W3 -MD -Zi -DNDEBUG -O1 -DWIN32 -D_CONSOLE -DNO_STRICT -DHAVE_DES_FCRYPT -DUSE_SITECUSTOMIZE -DPRIVLIB_LAST_IN_INC -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -DPERL_MSVCRT_READFIX',
    optimize='-MD -Zi -DNDEBUG -O1',
    cppflags='-DWIN32'
    ccversion='12.00.8804', gccversion='', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=10
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='link', ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf -libpath:"C:\Perl\lib\CORE" -machine:x86'
    libpth=\lib
    libs= oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib msvcrt.lib
    perllibs= oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib msvcrt.lib
    libc=msvcrt.lib, so=dll, useshrplib=true, libperl=perl510.lib
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug -opt:ref,icf -libpath:"C:\Perl\lib\CORE" -machine:x86'

Locally applied patches:
    ACTIVEPERL_LOCAL_PATCHES_ENTRY
    33741 avoids segfaults invoking S_raise_signal() (on Linux)
    33763 Win32 process ids can have more than 16 bits
    32809 Load 'loadable object' with non-default file extension
    32728 64-bit fix for Time::Local

@INC for perl 5.10.0:
    C:/Perl/site/lib
    C:/Perl/lib
    .

Environment for perl 5.10.0:
    HOME=C:\Users\ericp
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=C:\Program Files\ActiveState Komodo Edit 6\;C:\Program Files\ActiveState Komodo IDE 6\;C:\Ruby186\bin;C:\Program Files\ActiveState Perl Dev Kit 9.0\bin\;C:\apps\oraclexe\app\oracle\product\10.2.0\server\bin;C:\Program Files\Autodesk\Maya2009\bin;C:\Program Files\ActiveState Komodo IDE 5.2\;C:\Python26\\Scripts;C:\Python26\;C:\Program Files\ActiveState Komodo Edit 5\;C:\apps\php-5.3.0-nts-vc9\;C:\apps\php-5.3.0-nts\;C:\apps\PHP-5.3\;C:\Python26;C:\Python26\scripts;C:\Program Files\ActiveState Komodo IDE 5.1;C:\Program Files\ActiveState Perl Dev Kit 8.0.1\bin;C:\Perl\site\bin;C:\Perl\bin;c:\apps\svn\bin;c:\bin;c:\Users\ericp\bin;c:\msys\bin;C:\apps\p4;C:\TclDevKit\bin;C:\Tcl\bin;c:\apps\git\bin;C:\Users\ericp\svn\apps\komodo\util\black;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Program Files\Microsoft SQL Server\90\Tools\binn;C:\Program Files\MySQL\MySQL Server 5.0\bin;C:\Program Files\QuickTime\QTSystem;C:\Program Files\Windows Resource Kits\Tools\;C:\Program Files\Common Files\Autodesk Shared;C:\Users\ericp\AppData\Roaming\Python\Scripts;C:\apps\CVSNT;c:\apps\PostgreSQL\8.4\bin
    PERL_BADLANG (unset)
    SHELL (unset)
```
p5pRT commented 13 years ago

From @tamias

On Tue, Oct 19, 2010 at 02:33:56PM -0700, Eric Promislow wrote:

```perl
my $ptn = '\Qblah\E';
my $ptnObject = qr($ptn);    # line 8
$res = ('blah' =~ $ptnObject);
print "result: ", ($res ? '' : "not "), "ok\n";
```

In other words, I can't construct a regex containing '\Q...\E' indirectly. Does this mean I should always use the third form of constructing the regex (using eval) rather than the qr{$string} form?

I'm not quite sure what you're trying to do. Where is the value for $ptn coming from in your actual code?

Ronald

p5pRT commented 13 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 13 years ago

From ericp@ActiveState.com

Here's the flow from the actual code:

```perl
use strict;
use warnings;

use JSON 2.0 qw(from_json to_json); ## no critic (ProhibitStringyEval)

package Evaluator;
#...
sub _compile {
    # Convert post-pattern options to pre-pattern options like
    # (?i)(?x)...
    # and then use qr// to eval the whole thing.
    my $self = shift;
    my $optionsPrefix = $self->{options};
    $optionsPrefix =~ s/(.)/(?$1)/g;
    my $pattern = $optionsPrefix . $self->{pattern};
    return qr{$pattern};
}

sub init {
    my $self = shift;
    $self->{regex} = $self->_compile();
}

package main;

sub main {
    my $requestString = shift || \;
    local $@;
    my $evaluator;
    eval {
        $evaluator = new Evaluator($requestString);
        $evaluator->init();
    };
    #...
}

unless (caller) {
    my $responsePacket = main();
    my $jsonResult = JSON::to_json($responsePacket, { ascii => 1 });
    print $jsonResult;
}
```
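To make the option-prefix step in `_compile` concrete, here is a stand-alone sketch of it. The sub name `compile_with_options` and the sample inputs are hypothetical; only the `s/(.)/(?$1)/g` rewrite and the final `qr{...}` are taken from the code above. Note that whatever arrives in the pattern field reaches the regex compiler directly, with no double-quote interpolation pass in between.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the option-prefix step in
# Evaluator::_compile: "ix" becomes "(?i)(?x)" and is prepended.
sub compile_with_options {
    my ($pattern, $options) = @_;
    (my $prefix = $options) =~ s/(.)/(?$1)/g;
    return qr{$prefix$pattern};
}

# With the x flag, whitespace inside the pattern is ignored;
# with the i flag, matching is case-insensitive.
my $with_x    = compile_with_options('hello  world', 'ix');
my $without_x = compile_with_options('hello  world', '');

print 'HelloWorld' =~ $with_x    ? "match\n" : "no match\n";   # match
print 'HelloWorld' =~ $without_x ? "match\n" : "no match\n";   # no match
```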

perl process starts up, and reads a JSON packet representing a regex to be eval'ed from a GUI. There's one field for textInput, one for regex, one for operation ("match", "replace", etc.), one for options ("[ismx]*"). So I can't construct a regex from "\Q...\E" directly -- I have to put it in a variable, and build the regex object via qr($variable).

Hope this makes sense. Thanks for the quick contact.

BTW I haven't found a bug in Perl in years.... But I'm pretty sure this qualified.

- Eric

On 10/20/2010 2:59 PM, Ronald J Kimball via RT wrote:

On Tue, Oct 19, 2010 at 02:33:56PM -0700, Eric Promislow wrote:

```perl
my $ptn = '\Qblah\E';
my $ptnObject = qr($ptn);    # line 8
$res = ('blah' =~ $ptnObject);
print "result: ", ($res ? '' : "not "), "ok\n";
```

In other words, I can't construct a regex containing '\Q...\E' indirectly. Does this mean I should always use the third form of constructing the regex (using eval) rather than the qr{$string} form?

I'm not quite sure what you're trying to do. Where is the value for $ptn coming from in your actual code?

Ronald

p5pRT commented 13 years ago

From @Leont

On Tue, Oct 19, 2010 at 11:33 PM, Eric Promislow <perlbug-followup@perl.org> wrote:

In other words, I can't construct a regex containing '\Q...\E' indirectly. Does this mean I should always use the third form of constructing the regex (using eval) rather than the qr{$string} form?

Changing $ptn from single quotes to double quotes seems to be the solution to your issue. Turns out this is even documented in perlop ("The following escape sequences are available in constructs that interpolate…"), though I must admit I'm a bit surprised by this behavior too.

Leon
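A minimal sketch of the double-quote suggestion, using a made-up pattern `a.b` instead of `blah` so that the effect of the quoting is visible. `\Q...\E` is applied while the double-quoted string is built, so `$ptn` already holds the escaped text before `qr()` ever sees it.

```perl
use strict;
use warnings;

# \Q...\E is applied while the double-quoted string is built, so $ptn
# already holds the escaped text a\.b before qr() sees it.
my $ptn = "\Qa.b\E";
my $ptnObject = qr($ptn);

print "ptn is: $ptn\n";                                      # ptn is: a\.b
print 'a.b' =~ $ptnObject ? "a.b: ok\n" : "a.b: not ok\n";   # a.b: ok
print 'axb' =~ $ptnObject ? "axb: ok\n" : "axb: not ok\n";   # axb: not ok
```

This only helps when the pattern is a string literal in the program; it does nothing for text that arrives as input.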

p5pRT commented 13 years ago

From ericp@ActiveState.com

On 10/20/2010 4:39 PM, fawaka@gmail.com via RT wrote:

On Tue, Oct 19, 2010 at 11:33 PM, Eric Promislow <perlbug-followup@perl.org> wrote:

In other words, I can't construct a regex containing '\Q...\E' indirectly. Does this mean I should always use the third form of constructing the regex (using eval) rather than the qr{$string} form?

Changing $ptn from single quotes to double quotes seems to be the solution to your issue. Turns out this is even documented in perlop ("The following escape sequences are available in constructs that interpolate…"), though I must admit I'm a bit surprised by this behavior too.

I read that too, and from my understanding, qr{...} interpolates, like qq{...}, while q{...} doesn't. But I obviously misread the perldoc here.

But now I get it. \Q...\E have nothing to do with pattern-matching, and everything to do with string interpolation. I'm out of luck (or my customers are), because there are no string literals in this case, only input. And wrapping it in an <<eval("...")>> would be wrong, because it could break other people's work.

Thanks.

Not a bug.

- Eric

Leon

p5pRT commented 13 years ago

From tchrist@perl.com

So I can't construct a regex from "\Q...\E" directly -- I have to put it in a variable, and build the regex object via qr($variable).

BTW I haven't found a bug in Perl in years.... But I'm pretty sure this qualified.

Not truly. It may seem that way, but it's all working as designed and documented. What you're asking for is an enhancement request, not a bug fix. And I don't think we can reasonably do that at this stage in the game.

Since its genesis, Perl has stood steadfastly in the camp of evaluating once and once only, unless you take special measures to have it do otherwise. This was an intentional break from the multiple levels of expansion the various shells apply, often seemingly willy-nilly and unpredictably. Perl was supposed to be predictable and WYSIWYG.

That's why there's a difference between $foo and $$foo and $$$foo, or s/// and s///e and s///ee. You have to look rather hard to find anywhere at all that Perl applies some extra level of automatic "interpolation". I know of surprisingly few of those, and this is not one of them. The case-translation escapes have always been documented to occur during the variable-substitution phase only, and never outside that. They have never been something the regex compiler has understood.
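A small sketch of the evaluate-once rule in plain core Perl (the variable names are illustrative): text that merely looks like a variable or a `\U` escape is copied verbatim when it arrives through interpolation, because Perl does not run a second interpolation pass over the result.

```perl
use strict;
use warnings;

my $hobbit = 'Frodo';
my $var    = '$hobbit';       # single quotes: the literal text $hobbit
my $once   = "name: $var";    # one pass: $var is expanded, its contents are not

my $case   = '\U';            # the two characters backslash and U
my $upper  = "${case}frodo";  # \U arrives via a variable, so it is not applied

print "$once\n";    # name: $hobbit
print "$upper\n";   # \Ufrodo
```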

From pp. 162-163 of _Programming Perl_\, 3rd edition​:

  Note that the case and metaquote translation escapes (C<\U> and friends) must be processed during the variable interpolation pass because the purpose of those metasymbols is to influence how variables are interpolated. If you suppress variable interpolation with single quotes, you don't get the translation escapes either. Neither variables nor translation escapes (C<\U>, etc.) are expanded in any single quoted string, nor in single-quoted C<m'...'> or C<qr'...'> operators. Even when you do interpolation, these translation escapes are ignored if they show up as the I<result> of variable interpolation, since by then it's too late to influence variable interpolation.

From pp. 192-193 of _Programming Perl_\, 3rd edition​:

  The only double-quote escapes that are processed as such are the six translation escapes: C<\U>, C<\u>, C<\L>, C<\l>, C<\Q>, and C<\E>. If you ever look into the inner workings of the Perl regular expression compiler, you'll find code for handling escapes like C<\t> for tab, C<\n> for newline, and so on. But you won't find code for those six translation escapes. (We only listed them in A<CHP-5-TABLE-3> because people expect to find them there.) If you somehow manage to sneak any of them into the pattern without going through double-quotish evaluation, they won't be recognized.

  How could they find their way in? Well, you can defeat interpolation by using single quotes as your pattern delimiter. In C<m'...'>, C<qr'...'>, and C<s'...'...'>, the single quotes suppress variable interpolation and the processing of translation escapes, just as they would in a single-quoted string. Saying C<m'\ufrodo'> won't find a capitalized version of poor frodo. However, since the "normal" backslash characters aren't really processed on that level anyway, C<m'\t\d'> still matches a real tab followed by any digit.

  Another way to defeat interpolation is through interpolation itself. If you say:

```perl
$var = '\U';
/${var}frodo/;
```

  poor frodo remains uncapitalized. Perl won't redo the interpolation pass for you just because you interpolated something that looks like it might want to be reinterpolated. You can't expect that to work any more than you'd expect this double interpolation to work:

```perl
$hobbit = 'Frodo';
$var = '$hobbit';    # (single quotes)
/$var/;              # means m'$hobbit', not m'Frodo'.
```

  Here's another example that shows how most backslashes are interpreted by the regex parser, not by variable interpolation. Imagine you have a simple little I<grep>-style program written in Perl:

```perl
#!/usr/bin/perl
$pattern = shift;
while (<>) {
    print if /$pattern/o;
}
```

  If you name that program I<pgrep> and call it this way:

```
% pgrep '\t\d' *.c
```

  then you'll find that it prints out all lines of all your C source files in which a digit follows a tab. You didn't have to do anything special to get Perl to realize that C<\t> was a tab. If Perl's patterns I<were> just double-quote interpolated, you would have; fortunately, they aren't. They're recognized directly by the regex parser.
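A minimal sketch of the same point without a command line: the pattern is built at run time from a single-quoted string (standing in for `shift` in the `pgrep` program), yet `\t` and `\d` still work, because the regex compiler handles those escapes itself. The sample lines are made up.

```perl
use strict;
use warnings;

# Stand-in for "$pattern = shift" in pgrep: the pattern arrives as
# plain run-time text (backslash, t, backslash, d), with no real tab.
my $pattern = '\t\d';

my @lines = ("x\t5", "x 5", "x\tz");

# The regex compiler itself turns \t into a tab and \d into a digit class.
my @results = map { /$pattern/ ? 'match' : 'no match' } @lines;

print "$_\n" for @results;   # match, no match, no match
```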

In Java, beginning with 1.5, its Pattern class understands \Q ... \E within strings. You may perhaps be thinking Perl works like Java in this regard, but as I hope the text quoted above explains, it really does not.

If your point is that the standard documentation that Perl ships with is unclear or even misleading in this regard, there may be some substance to that complaint. Things like perlreref.pod are one of the culprits here, where under ESCAPE SEQUENCES for regular expressions, they claim:

```
\l  Lowercase next character
\u  Titlecase next character
\L  Lowercase until \E
\U  Uppercase until \E
\Q  Disable pattern metacharacters until \E
\E  End modification
```

This is completely wrong. Those should not be there, because they are *not* there. The only backslash escapes with meaning to the regular expression compiler are those operative under

  $string =~ $pattern

That you can use case-translation escapes in

  $string = qq{$pattern \Q$literal\E $more_pattern};

or

  $string =~ m{$pattern \Q$literal\E $more_pattern};

or

  $regex = qr{$pattern \Q$literal\E $more_pattern};

are artifacts of the variable-expansion phase. They do not happen in:

  $string =~ $pattern

because the regex compiler has no earthly idea what a \Q even means.
It doesn't mean anything to it. It's an unrecognized escape.

That's why this prints "yes":

  print "qx" =~ m'\A\Q.\z'i ? "yes\n" : "no\n";

If you turn on warnings\, you learn

  Unrecognized escape \Q passed through in regex; marked by <-- HERE in m/^\Q <-- HERE .$/

And if you use C<< use re "debug"; >> or C<< -Mre=debug >>, you'll get:

```
Compiling REx "\A\Q.\z"
Final program:
   1: SBOL (2)
   2: EXACTF \ (4)
   4: REG_ANY (5)
   5: EOS (6)
   6: END (0)
anchored ""$ at 2 stclass EXACTF \ anchored(SBOL) minlen 2
Matching REx "\A\Q.\z" against "qx"
   0 <> \ | 1:SBOL(2)
   0 <> \ | 2:EXACTF \(4)
   1 \ \ | 4:REG_ANY(5)
   2 \ <> | 5:EOS(6)
   2 \ <> | 6:END(0)
Match successful!
yes
Freeing REx: "\A\Q.\z"
```

See what's actually happening?

So this isn't a bug in Perl, but it may be a bug in the online documentation kit. I haven't scoured those pods for other treatments of this topic beyond the one I pointed out as being misrepresented in perlreref.pod. That one should be fixed, and anything similar hunted down and killed. Our apologies.

I hope you find the Camel text\, at least\, clear enough about all this that it makes sense to you now.

--tom

p5pRT commented 13 years ago

From @Leont

On Thu, Oct 21, 2010 at 1:10 AM, Eric Promislow <ericp@activestate.com> wrote:

perl process starts up, and reads a JSON packet representing a regex to be eval'ed from a GUI. There's one field for textInput, one for regex, one for operation ("match", "replace", etc.), one for options ("[ismx]*"). So I can't construct a regex from "\Q...\E" directly -- I have to put it in a variable, and build the regex object via qr($variable).

Hope this makes sense. Thanks for the quick contact.

It's still not clear to me why you want this in the first place, or otherwise why the quotemeta builtin can't do what you want to do.

Leon
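One way to read the `quotemeta` suggestion, sketched under the assumption that the literal text and the real regex fragment arrive as separate run-time strings (the variable names and values are hypothetical): `quotemeta` is the function behind `\Q...\E`, so it can be applied to data that never appears in a Perl string literal.

```perl
use strict;
use warnings;

# Hypothetical run-time inputs: text to match verbatim, and a genuine
# regex fragment whose metacharacters must keep their meaning.
my $literal  = 'a+b (c)';
my $fragment = '\s*$';

# quotemeta escapes every non-word character of the literal part only.
my $pattern = quotemeta($literal) . $fragment;
my $re      = qr/$pattern/;

print "pattern: $pattern\n";   # pattern: a\+b\ \(c\)\s*$
print "a+b (c)  " =~ $re ? "literal: match\n" : "literal: no match\n";
print "aab c"     =~ $re ? "aab c: match\n"   : "aab c: no match\n";
```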

p5pRT commented 13 years ago

From ericp@ActiveState.com

On 10/20/2010 6:06 PM, Leon Timmermans wrote:

On Thu, Oct 21, 2010 at 1:10 AM, Eric Promislow <ericp@activestate.com> wrote:

perl process starts up, and reads a JSON packet representing a regex to be eval'ed from a GUI. There's one field for textInput, one for regex, one for operation ("match", "replace", etc.), one for options ("[ismx]*"). So I can't construct a regex from "\Q...\E" directly -- I have to put it in a variable, and build the regex object via qr($variable).

Hope this makes sense. Thanks for the quick contact.

It's still not clear to me why you want this in the first place, or otherwise why the quotemeta builtin can't do what you want to do.

Leon

It's for Perl mode for Komodo's Rx Toolkit:

http://bugs.activestate.com/show_bug.cgi?id=82715

I had always assumed that \Q...\E was for pattern-matching, but it isn't. I stand corrected, and hope the customer who made that request does as well.

- Eric

p5pRT commented 13 years ago

From @ikegami

On Wed, Oct 20, 2010 at 7:39 PM, Leon Timmermans <fawaka@gmail.com> wrote:

On Tue, Oct 19, 2010 at 11:33 PM, Eric Promislow <perlbug-followup@perl.org> wrote:

In other words, I can't construct a regex containing '\Q...\E' indirectly. Does this mean I should always use the third form of constructing the regex (using eval) rather than the qr{$string} form?

Changing $ptn from single quotes to double quotes seems to be the solution to your issue.

Not the same thing

```
perl -E"say qr/\Q\x30/"
(?-xism:\\x30)

perl -E"say qq/\Q\x30/"
0
```
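A sketch that turns the two one-liners into checks on match behaviour rather than printed stringifications, since the exact stringified form of a `qr//` object varies between Perl versions (`(?-xism:...)` is the 5.10-era form). The variable names are illustrative.

```perl
use strict;
use warnings;

# Double-quoted string: \x30 is turned into "0" first, then \Q quotes
# "0", which needs no escaping.
my $from_qq = "\Q\x30\E";

# Regex literal: \x30 is passed through for the regex compiler, so \Q
# quotes the backslash itself and the pattern matches the four
# characters \x30 literally (compare the \\x30 output above).
my $from_qr = qr/\A\Q\x30\E\z/;

print "qq: $from_qq\n";   # qq: 0
print '\x30' =~ $from_qr ? "qr matches \\x30\n" : "qr does not match \\x30\n";
print '0'    =~ $from_qr ? "qr matches 0\n"     : "qr does not match 0\n";
```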

p5pRT commented 13 years ago

From @Leont

On Thu, Oct 21, 2010 at 3:35 AM, Eric Promislow <ericp@activestate.com> wrote:

It's for Perl mode for Komodo's Rx Toolkit:

http://bugs.activestate.com/show_bug.cgi?id=82715

I had always assumed that \Q...\E was for pattern-matching, but it isn't. I stand corrected, and hope the customer who made that request does as well.

You could emulate the parser with something like this:

```perl
my %action = (
    L => sub { lc $_[0] },
    Q => sub { quotemeta $_[0] },
    U => sub { uc $_[0] },
);
s/ \\([LUQ]) (.*?) (?:\\E|\z) / $action{$1}->($2) /xesg;
```

It doesn't handle overlapping \[LUQ]'s, but those should be uncommon anyway.

Leon

p5pRT commented 13 years ago

From @ikegami

On Thu, Oct 21, 2010 at 5:10 AM, Leon Timmermans <fawaka@gmail.com> wrote:

On Thu, Oct 21, 2010 at 3:35 AM, Eric Promislow <ericp@activestate.com> wrote:

It's for Perl mode for Komodo's Rx Toolkit:

http://bugs.activestate.com/show_bug.cgi?id=82715

I had always assumed that \Q...\E was for pattern-matching, but it isn't. I stand corrected, and hope the customer who made that request does as well.

You could emulate the parser with something like this:

```perl
my %action = (
    L => sub { lc $_[0] },
    Q => sub { quotemeta $_[0] },
    U => sub { uc $_[0] },
);
s/ \\([LUQ]) (.*?) (?:\\E|\z) / $action{$1}->($2) /xesg;
```

Fails if that leading "\" was escaped.

p5pRT commented 13 years ago

From matt@sergeant.org

```perl
my %action = (
    L => sub { lc $_[0] },
    Q => sub { quotemeta $_[0] },
    U => sub { uc $_[0] },
);
s/ \\([LUQ]) (.*?) (?:\\E|\z) / $action{$1}->($2) /xesg;
```

Fails if that leading "\" was escaped.

That's why we invented (?<!pattern).

p5pRT commented 13 years ago

From @ikegami

On Fri, Oct 22, 2010 at 3:53 PM, Matt Sergeant <matt@sergeant.org> wrote:

```perl
my %action = (
    L => sub { lc $_[0] },
    Q => sub { quotemeta $_[0] },
    U => sub { uc $_[0] },
);
s/ \\([LUQ]) (.*?) (?:\\E|\z) / $action{$1}->($2) /xesg;
```

Fails if that leading "\" was escaped.

That's why we invented (?<!pattern).

Not only did he not use (?<!pattern), it doesn't help, since it cannot determine whether the "Q" was preceded by an even or an odd number of slashes.

p5pRT commented 13 years ago

From @Leont

Not only did he not use (?<!pattern), it doesn't help, since it cannot determine whether the "Q" was preceded by an even or an odd number of slashes.

This better? ;-)

```perl
s/ (?<!\\) (?>\\\\)* \K \\([LUQ]) (.*?) (?:\\E|\z) / $action{$1}->($2) /xesg;
```
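Putting Leon's action table together with this last pattern gives a small run-time expander. This is a sketch, not a faithful reimplementation of Perl's lexer: the helper name `expand_case_escapes` and the sample inputs are hypothetical, `\l` and `\u` are not handled, nested or overlapping escapes are not handled, and `\K` needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Leon's table: each case-translation escape and the builtin behind it.
my %action = (
    L => sub { lc $_[0] },
    Q => sub { quotemeta $_[0] },
    U => sub { uc $_[0] },
);

# Hypothetical helper: expand \L, \U and \Q (up to \E or end of string)
# in run-time text. (?<!\\)(?>\\\\)*\K skips whole pairs of backslashes,
# so a letter preceded by an even number of backslashes is left alone.
sub expand_case_escapes {
    my ($text) = @_;
    $text =~ s/ (?<!\\) (?>\\\\)* \K \\([LUQ]) (.*?) (?:\\E|\z) / $action{$1}->($2) /xesg;
    return $text;
}

my $expanded = expand_case_escapes('\Qa.b\E|\Ufoo');
my $re       = qr/\A(?:$expanded)\z/;

print "$expanded\n";                                       # a\.b|FOO
print 'a.b' =~ $re ? "a.b: match\n" : "a.b: no match\n";   # a.b: match
print 'axb' =~ $re ? "axb: match\n" : "axb: no match\n";   # axb: no match
print expand_case_escapes('\\\\Qx'), "\n";                 # \\Qx (escaped, left alone)
```

The expanded string is then compiled with `qr//` like any other run-time pattern, which is the step the original `_compile` already performs.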

p5pRT commented 13 years ago

@rgs - Status changed from 'open' to 'rejected'