Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

/(??{ "(PAT)" })/ doesn't set $1. #8148

Closed p5pRT closed 18 years ago

p5pRT commented 19 years ago

Migrated from rt.perl.org#37407 (status was 'resolved')

Searchable as RT37407$

p5pRT commented 19 years ago

From @abigail

Created by @abigail

man perlre says about (??{ code }):

    This is a "postponed" regular subexpression.
    The "code" is evaluated at run time, at the moment this subexpression
    may match. The result of evaluation is considered as a regular
    expression and matched as if it were inserted instead of this
    construct.

However, the following code:

    if ("a" =~ /(??{ "(a)" })/) {
        printf "Match. \$& = '%s'. \$1 = '%s'\n" =>
                map {defined $_ ? $_ : "UNDEF"} $&, $1;
    }

prints:

    Match. $& = 'a'. $1 = 'UNDEF'

That is, despite the parenthesis, $1 isn't set. The documentation suggests it should.

Perl Info

```
Flags:
    category=core
    severity=medium

Site configuration information for perl v5.8.7:

Configured by abigail at Wed Jun 1 21:50:09 CEST 2005.

Summary of my perl5 (revision 5 version 8 subversion 7) configuration:
  Platform:
    osname=linux, osvers=2.4.18-bf2.4, archname=i686-linux-64int-ld
    uname='linux alexandra 2.4.18-bf2.4 #1 son apr 14 09:53:28 cest 2002 i686 unknown '
    config_args='-des -Dusemorebits -Uversiononly -Dmydomain=.abigail.nl -Dcf_email=abigail@abigail.nl -Dperladmin=abigail@abigail.nl -Doptimize=-g -Dcc=gcc -Dprefix=/opt/perl'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=define use64bitall=undef uselongdouble=define
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-g',
    cppflags='-DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='3.0.4', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long long', ivsize=8, nvtype='long double', nvsize=12, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -ldl -lm -lcrypt -lutil -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.2.5.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.2.5'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    no-syntax-warnings
    defined-or

@INC for perl v5.8.7:
    /home/abigail/Perl
    /opt/perl/lib/5.8.7/i686-linux-64int-ld
    /opt/perl/lib/5.8.7
    /opt/perl/lib/site_perl/5.8.7/i686-linux-64int-ld
    /opt/perl/lib/site_perl/5.8.7
    /opt/perl/lib/site_perl/5.8.6/i686-linux-64int-ld
    /opt/perl/lib/site_perl/5.8.6
    /opt/perl/lib/site_perl/5.8.5/i686-linux-64int-ld
    /opt/perl/lib/site_perl/5.8.5
    /opt/perl/lib/site_perl/5.8.4/i686-linux-64int-ld
    /opt/perl/lib/site_perl/5.8.4
    /opt/perl/lib/site_perl/5.8.3/i686-linux-64int-ld
    /opt/perl/lib/site_perl/5.8.3
    /opt/perl/lib/site_perl/5.8.2/i686-linux-64int-ld
    /opt/perl/lib/site_perl/5.8.2
    /opt/perl/lib/site_perl/5.8.1/i686-linux-64int-ld
    /opt/perl/lib/site_perl/5.8.1
    /opt/perl/lib/site_perl/5.8.0/i686-linux-64int-ld
    /opt/perl/lib/site_perl/5.8.0
    /opt/perl/lib/site_perl
    .

Environment for perl v5.8.7:
    HOME=/home/abigail
    LANG=C
    LANGUAGE (unset)
    LD_LIBRARY_PATH=/home/abigail/Lib:/usr/local/lib:/usr/lib:/lib:/usr/X11R6/lib
    LOGDIR (unset)
    PATH=/home/abigail/Bin:/opt/perl/bin:/usr/local/bin:/usr/local/X11/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/X11R6/bin:/usr/games:/usr/share/texmf/bin:/opt/Acrobat/bin:/opt/java/blackdown/j2sdk1.3.1/bin:/usr/local/games/bin
    PERL5LIB=/home/abigail/Perl
    PERLDIR=/opt/perl
    PERL_BADLANG (unset)
    SHELL=/bin/bash
```

p5pRT commented 19 years ago

From rick@bort.ca

On Mon, Oct 10, 2005 at 01:46:34PM -0700, abigail@abigail.nl wrote:

> man perlre says about (??{ code }):
>
>     This is a "postponed" regular subexpression.
>     The "code" is evaluated at run time, at the moment this subexpression
>     may match. The result of evaluation is considered as a regular
>     expression and matched as if it were inserted instead of this
>     construct.
>
> However, the following code:
>
>     if ("a" =~ /(??{ "(a)" })/) {
>         printf "Match. \$& = '%s'. \$1 = '%s'\n" =>
>                 map {defined $_ ? $_ : "UNDEF"} $&, $1;
>     }
>
> prints:
>
>     Match. $& = 'a'. $1 = 'UNDEF'
>
> That is, despite the parenthesis, $1 isn't set. The documentation suggests it should.

I would change the documentation since this is the only way to get reusable self-contained regexps with backreferences. For example, this fails because when dropped into the larger regex \1 refers to the wrong thing:

    my $r = qr/(a)\1/;
    "baac" =~ /(.)$r/ and print "\$1=$1; \$&=$&\n";
    # prints nothing

To make it work, one can do:

    my $r = qr/(a)\1/;
    "baac" =~ /(.)(??{ $r })/ and print "\$1=$1; \$&=$&\n";
    # prints $1=b; $&=baa

It kinda sucks that you can't get the matched groups but I personally prefer having a means of localizing backreferences.
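For readers on later perls, here is a rough sketch of another way to keep a backreference self-contained. It assumes Perl 5.10 or newer (which postdates this thread), where `\g{-1}` is a relative backreference to the nearest preceding group rather than to an absolute group number. Unlike the `(??{ $r })` approach, the inner group is numbered as part of the outer pattern, so its capture stays visible as `$2`. The variable names are only illustrative.

```perl
use strict;
use warnings;

# \g{-1} is a relative backreference (Perl 5.10+): "the group just before me".
# It keeps pointing at (a) even after interpolation into a larger pattern.
my $r = qr/(a)\g{-1}/;

my @groups = "baac" =~ /(.)$r/;   # list context returns the captures
my $whole  = $&;                  # text matched by the whole pattern

print "groups=@groups; whole=$whole\n";
```

The trade-off is that the interpolated group now shifts the numbering of any groups that follow it in the outer pattern, which is exactly what the `(??{ ... })` form avoids.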

-- Rick Delaney rick@bort.ca

p5pRT commented 19 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 19 years ago

From @rgs

Rick Delaney <rick@bort.ca> wrote:

> I would change the documentation since this is the only way to get reusable self-contained regexps with backreferences.

Good point. I agree. (And moreover this sounds difficult to fix :) Might be a bit difficult to write down though. Doc patch suggestions, anyone?

p5pRT commented 19 years ago

From japhy@perlmonk.org

On Oct 11, Jeff 'japhy' Pinyan said:

>     my $dot = qr{ (.)(.) (?{ print "dot($1|$2)" }) }x;
>     "japhy" =~ / (??{ $dot })/x; # dot(|)
>     "japhy" =~ /() (??{ $dot })/x; # dot(j|)
>     "japhy" =~ /()()(??{ $dot })/x; # dot(j|a)
>
> The @- and @+ arrays are dependent in the same manner. It actually surprises me that the backreference is able to work. I was under the impression it has to do with the compilation of the top-level regex not allocating the memory for the `$<DIGIT>` vars and whatnot.

By "the backreference is able to work", I mean that if we change $dot to

    qr{ (.)(.)\1\2 (?{ print "dot($1|$2)" }) }x

and change our source string to "banana", the regexes succeed; that is, the backrefs \1 and \2 behave properly even though we can't see the values for $1 and $2.
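A minimal sketch of that last point, using a variant of `$dot` without the `(?{ ... })` block so that nothing is printed from inside the regex. It assumes the behaviour described in this thread: the inner `\1` and `\2` resolve against the inner captures, while the outer `$1` stays undefined because the outer pattern has no groups of its own. The variable names are illustrative.

```perl
use strict;
use warnings;

# Inner pattern: two captures followed by backreferences to both.
my $dot = qr{ (.)(.)\1\2 }x;

# The outer pattern has no capture groups of its own.
my $matched = "banana" =~ /(??{ $dot })/;
my $whole   = $matched ? $& : undef;
my $first   = $1;    # the outer pattern's $1, not the inner one

printf "matched=%s whole=%s \$1=%s\n",
    $matched       ? "yes"  : "no",
    defined $whole ? $whole : "UNDEF",
    defined $first ? $first : "UNDEF";
```

The match can only succeed if the inner backreferences work (the matched text is "anan"), yet nothing about the inner groups is visible once the outer match returns.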

-- 
Jeff "japhy" Pinyan, RPI Acacia Brother #734
http://www.perlmonks.org/ http://princeton.pm.org/
"How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid?" -- Meister Eckhart

p5pRT commented 19 years ago

From japhy@perlmonk.org

On Oct 11, Rick Delaney said:

> It kinda sucks that you can't get the matched groups but I personally prefer having a means of localizing backreferences.

But what IS a bug is that, unless the top-most regex contains `$<DIGIT>`, no delayed-interpolation regex INSIDE that top-most regex can print its value for `$<DIGIT>`. Examples to clarify:

    my $dot = qr{ (.)(.) (?{ print "dot($1|$2)" }) }x;
    "japhy" =~ / (??{ $dot })/x; # dot(|)
    "japhy" =~ /() (??{ $dot })/x; # dot(j|)
    "japhy" =~ /()()(??{ $dot })/x; # dot(j|a)

The @- and @+ arrays are dependent in the same manner. It actually surprises me that the backreference is able to work. I was under the impression it has to do with the compilation of the top-level regex not allocating the memory for the `$<DIGIT>` vars and whatnot.


p5pRT commented 19 years ago

From @ysth

On Tue, Oct 11, 2005 at 02:17:09PM -0400, Jeff 'japhy' Pinyan wrote:

> But what IS a bug is that, unless the top-most regex contains `$<DIGIT>`, no delayed-interpolation regex INSIDE that top-most regex can print its value for `$<DIGIT>`. Examples to clarify:
>
>     my $dot = qr{ (.)(.) (?{ print "dot($1|$2)" }) }x;
>     "japhy" =~ / (??{ $dot })/x; # dot(|)
>     "japhy" =~ /() (??{ $dot })/x; # dot(j|)
>     "japhy" =~ /()()(??{ $dot })/x; # dot(j|a)
>
> The @- and @+ arrays are dependent in the same manner. It actually surprises me that the backreference is able to work. I was under the impression it has to do with the compilation of the top-level regex not allocating the memory for the `$<DIGIT>` vars and whatnot.

Something queer here; this coredumps for me (both blead and maint):

    my $dot = qr{ () }x;
    "" =~ /(??{ $dot })/x;

p5pRT commented 19 years ago

From @abigail

On Tue, Oct 11, 2005 at 05:09:36PM -0700, Yitzchak Scott-Thoennes wrote:

> Something queer here; this coredumps for me (both blead and maint):
>
>     my $dot = qr{ () }x;
>     "" =~ /(??{ $dot })/x;

It doesn't for me in 5.8.7.

Abigail

p5pRT commented 19 years ago

From @iabyn

On Tue, Oct 11, 2005 at 05:09:36PM -0700, Yitzchak Scott-Thoennes wrote:

> Something queer here; this coredumps for me (both blead and maint):
>
>     my $dot = qr{ () }x;
>     "" =~ /(??{ $dot })/x;

Any re_eval that refers to an outer lexical may currently coredump. This should get fixed by my shiny rewrite of the re_eval code. (Scheduled for completion right after Duke Nukem Forever is released.)

Dave.

-- 
"There's something wrong with our bloody ships today, Chatfield."
  -- Admiral Beatty at the Battle of Jutland, 31st May 1916.

p5pRT commented 18 years ago

From shouldbedomo@mac.com

On 2005–10–12, at 02:31, Dave Mitchell wrote:

> On Tue, Oct 11, 2005 at 05:09:36PM -0700, Yitzchak Scott-Thoennes wrote:
>
> > Something queer here; this coredumps for me (both blead and maint):
> >
> >     my $dot = qr{ () }x;
> >     "" =~ /(??{ $dot })/x;
>
> Any re_eval that refers to an outer lexical may currently coredump. This should get fixed by my shiny rewrite of the re_eval code. (Scheduled for completion right after Duke Nukem Forever is released.)

I can't make that coredump on blead@27694. But then it doesn't coredump for me on 5.8.6 or 5.8.8 either. In all cases, the match succeeds, which is what I'd expect. Has your recent remodelling addressed this issue (or am I just the happy user of an architecture (Mac OS X) where perl doesn't happen to crash despite bad things happening under the hood)?

(Even if this issue has been addressed, abigail's original problem still exists:

    if ("a" =~ /(??{ "(a)" })/) {
        printf "Match. \$& = '%s'. \$1 = '%s'\n" =>
                map {defined $_ ? $_ : "UNDEF"} $&, $1;
    }

does not set $1.)

-- 
Dominic Dunlop

p5pRT commented 18 years ago

From @iabyn

On Mon, Apr 03, 2006 at 09:57:09AM +0200, Dominic Dunlop wrote:

> I can't make that coredump on blead@27694. But then it doesn't coredump for me on 5.8.6 or 5.8.8 either. In all cases, the match succeeds, which is what I'd expect. Has your recent remodelling addressed this issue (or am I just the happy user of an architecture (Mac OS X) where perl doesn't happen to crash despite bad things happening under the hood)?

My recent hacks are unrelated to this issue; re_evals are still dodgy.

-- 
Lady Nancy Astor: If you were my husband, I would flavour your coffee with poison.
Churchill: Madam - if I were your husband, I would drink it.

p5pRT commented 18 years ago

From @hvds

Dominic Dunlop <shouldbedomo@mac.com> wrote:

[...]

> (Even if this issue has been addressed, abigail's original problem still exists:
>
>     if ("a" =~ /(??{ "(a)" })/) {
>         printf "Match. \$& = '%s'. \$1 = '%s'\n" =>
>                 map {defined $_ ? $_ : "UNDEF"} $&, $1;
>     }
>
> does not set $1.)

From <http://perlmonks.org/index.pl?node_id=540091> (paraphrased):

> When (??{ "(a)" }) is compiled, this just appears as a code block in the compiled form - and the compiled form, among other things, needs to know how many capturing parens there are in the pattern. When the deferred eval is invoked the resulting regular expression is independent of the original one from which it was called. That means in particular that the deferred expression has its own capture groups numbering from $1, and these are not available to the parent expression when it returns.

I may be wrong - I didn't check the code or docs before writing that - but as far as I know this is NOTABUG: the parent expression has no captures, so no captures are set by the time the parent successfully completes.
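To make the numbering point concrete, here is a small sketch (variable names are illustrative). It assumes the behaviour described in this thread: the parent pattern counts only its own two groups, so the `(b)` inside the deferred pattern never shows up among the parent's captures.

```perl
use strict;
use warnings;

# Parent pattern: two capture groups of its own, with a deferred
# subpattern in between that has one group of its own.
my @outer  = "abc" =~ /(.)(??{ "(b)" })(.)/;
my $second = $2;    # the parent's second group

# A list-context match returns only the parent's captures.
print "parent captures: @outer\n";
print "parent \$2: $second\n";
```

The deferred `(b)` takes part in the match but leaves the parent's group numbers untouched, which is why `$2` is the trailing `(.)`.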

Hugo

p5pRT commented 18 years ago

From @abigail

On Mon, Apr 03, 2006 at 04:54:27AM -0700, Hugo van der Sanden via RT wrote:

> I may be wrong - I didn't check the code or docs before writing that - but as far as I know this is NOTABUG: the parent expression has no captures, so no captures are set by the time the parent successfully completes.

The documentation (of 5.8.8) says:

    This is a "postponed" regular subexpression. The "code" is
    evaluated at run time, at the moment this subexpression may match.
    The result of evaluation is considered as a regular expression
    and matched as if it were inserted instead of this construct.

Abigail

p5pRT commented 18 years ago

From shouldbedomo@mac.com

On 2006–04–03, at 20:32, Abigail wrote:

> On Mon, Apr 03, 2006 at 04:54:27AM -0700, Hugo van der Sanden via RT wrote:
>
> > I may be wrong - I didn't check the code or docs before writing that - but as far as I know this is NOTABUG: the parent expression has no captures, so no captures are set by the time the parent successfully completes.
>
> The documentation (of 5.8.8) says:
>
>     This is a "postponed" regular subexpression. The "code" is
>     evaluated at run time, at the moment this subexpression may match.
>     The result of evaluation is considered as a regular expression
>     and matched as if it were inserted instead of this construct.

Well, the easiest thing to do would be to fix the documentation to match the implementation. However, if the subexpression has its own context to which the results of any capture are confined, is there any way of getting at those captures in order to export them to some surrounding context? If so, the docs should show it as a work-around. (Or, if not, say it can't be done.)
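One partial workaround can be sketched as follows, under the assumption that only the text matched by the deferred pattern as a whole is needed, not its individual inner groups: wrap the entire `(??{ ... })` construct in a capture group that belongs to the parent pattern. This does not export the inner groups; the pattern and variable names are illustrative.

```perl
use strict;
use warnings;

# A self-contained inner pattern with its own group and backreference.
my $inner = qr/(.)\1/;

# Group 2 belongs to the parent and wraps the deferred pattern, so the
# parent can see what the deferred pattern matched as a whole.
my @caps = "xaay" =~ /(.)((??{ $inner }))(.)/;

print "captures: @caps\n";
```

Here the parent sees three groups, and the middle one holds everything the inner pattern consumed; the inner `(.)` itself remains private to the deferred pattern.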

Hugo's perlmonks wizardry referenced earlier in the thread does not seem quite to fill the bill -- the capture surrounds the postponed subexpression, rather than being a part of it -- so I've tried a lot of variations on

    $ ./perl -Ilib -Mre=debug -lwe '$r = qr/(?{print $1})/; @m = "abc" =~ /(.)(??{"(.)$r"})(.)/; print @m'
    Use of uninitialized value.   <-- seems to be an artefact of re=debug
    Compiling REx "(?{print $1})"
    size 3 Got 28 bytes for offset annotations.
    first at 1
       1: EVAL(3)
       3: END(0)
    minlen 0 with eval
    Offsets: [3]
       1[13] 0[0] 14[0]
    Compiling REx "(.)(??{"(.)$r"})(.)"
    size 14 Got 116 bytes for offset annotations.
    first at 3
       1: OPEN1(3)
       3: REG_ANY(4)
       4: CLOSE1(6)
       6: LOGICAL[2](7)
       7: EVAL(9)
       9: OPEN2(11)
      11: REG_ANY(12)
      12: CLOSE2(14)
      14: END(0)
    minlen 2 with eval
    Offsets: [14]
       1[1] 0[0] 2[1] 3[1] 0[0] 16[0] 16[0] 0[0] 17[1] 0[0] 18[1] 19[1] 0[0] 20[0]
    Omitting $` $& $' support.

    EXECUTING...

    Matching REx "(.)(??{"(.)$r"})(.)" against "abc"
      Setting an EVAL scope, savestack=15
       0 <> <abc>    |  1: OPEN1
       0 <> <abc>    |  3: REG_ANY
       1 <a> <bc>    |  4: CLOSE1
       1 <a> <bc>    |  6: LOGICAL[2]
       1 <a> <bc>    |  7: EVAL
      re_eval 0x1317440
    Compiling REx "(.)(?-xism:(?{print $1}))"
    size 8 Got 68 bytes for offset annotations.
    first at 3
       1: OPEN1(3)
       3: REG_ANY(4)
       4: CLOSE1(6)
       6: EVAL(8)
       8: END(0)
    minlen 1 with eval
    Offsets: [8]
       1[1] 0[0] 2[1] 3[1] 0[0] 12[13] 0[0] 26[0]
    Entering embedded "(.)(?-xism:(?{print $1}))"
      Setting an EVAL scope, savestack=25
       1 <a> <bc>    |  1: OPEN1
       1 <a> <bc>    |  3: REG_ANY
       2 <ab> <c>    |  4: CLOSE1
       2 <ab> <c>    |  6: EVAL
      re_eval 0x1317770
    Bus error

    Program received signal EXC_BAD_ACCESS, Could not access memory.
    Reason: KERN_PROTECTION_FAILURE at address: 0x00000004
    0x00085790 in Perl_pp_gvsv (my_perl=0x1800400) at pp_hot.c:64
    64          PUSHs(GvSVn(cGVOP_gv));

without success. Backtrace here is

    #0  0x00085790 in Perl_pp_gvsv (my_perl=0x1800400) at pp_hot.c:64
    #1  0x0023d1f4 in Perl_runops_debug (my_perl=0x1800400) at dump.c:1698
    #2  0x00569f58 in S_regmatch (my_perl=0x1800400, prog=0x1105e44) at re_exec.c:3281
    #3  0x0056a938 in S_regmatch (my_perl=0x1800400, prog=0x1116f94) at re_exec.c:3366
    #4  0x00563e34 in S_regtry (my_perl=0x1800400, prog=0x1116f50, startpos=0x1106640 "abc") at re_exec.c:2202
    #5  0x005627e0 in my_regexec (my_perl=0x1800400, prog=0x1116f50, stringarg=0x1106640 "abc", strend=0x1106642 "c", strbeg=0x1106640 "abc", minend=0, sv=0x1802150, data=0x0, flags=3) at re_exec.c:2011
    #6  0x00090184 in Perl_pp_match (my_perl=0x1800400) at pp_hot.c:1373
    #7  0x0023d1f4 in Perl_runops_debug (my_perl=0x1800400) at dump.c:1698
    #8  0x00043a34 in S_run_body (my_perl=0x1800400, oldscope=1) at perl.c:2374
    #9  0x00042d38 in perl_run (my_perl=0x1800400) at perl.c:2294
    #10 0x00002f50 in main (argc=5, argv=0xbffff8f4, env=0xbffff90c) at perlmain.c:103

and it's always the same -- apart from where one ends up in pp_hot.c -- whatever the code in the (?{...})

The ones that don't crash invariably end up like

    $ ./perl -Ilib -Mre=eval -lwe '$r = q/(?{print $1})/; @m = "abc" =~ /(.)(??{"(.)$r"})(.)/; print @m'
    Eval-group not allowed at runtime, use re 'eval' in regex m/(.)(?{print $1})/ at -e line 1.

(Which is probably a manifestation of bug #23569 -- compiler hints not being propagated to eval-strings.)

-- 
Dominic Dunlop

p5pRT commented 18 years ago

From @abigail

On Mon, Apr 03, 2006 at 12:54:20PM -0700, Dominic Dunlop via RT wrote:

> Well, the easiest thing to do would be to fix the documentation to match the implementation.

I'll grant that that's the easiest thing. But is it the right thing? /(??{ })/ not being able to export parentheses will surprise fewer people, but if /(??{ })/ can export its parentheses, that makes it far more powerful. It would enable me to do things I can't do right now.

(Goal: match IPv6 addresses and have $8 match the last set of hex digits, regardless of how many sets have been omitted with '::'.)

Abigail

p5pRT commented 18 years ago

From shouldbedomo@mac.com

On 2006–04–03, at 22:11, Abigail wrote:

> I'll grant that that's the easiest thing. But is it the right thing? /(??{ })/ not being able to export parentheses will surprise fewer people, but if /(??{ })/ can export its parentheses, that makes it far more powerful. It would enable me to do things I can't do right now.

Agreed. That's why I want to know how to export the captures. Having it happen automatically would certainly cause the least surprise IMO, although changing perl to do this might conceivably break some scripts. But then, (??{...}) and friends have been labelled "highly experimental" since their appearance (you can see Tom Christiansen and Ilya trading blows on this in the archives), so we're covered ...

-- 
Dominic Dunlop

p5pRT commented 18 years ago

From @iabyn

On Tue, Apr 04, 2006 at 09:54:59AM +0200, Dominic Dunlop wrote:

> Agreed. That's why I want to know how to export the captures. Having it happen automatically would certainly cause the least surprise IMO, although changing perl to do this might conceivably break some scripts. But then, (??{...}) and friends have been labelled "highly experimental" since their appearance (you can see Tom Christiansen and Ilya trading blows on this in the archives), so we're covered ...

So presumably the following would change from printing "c" to printing "b"?

    $ perl -le 'print $2 if "abc" =~ /(.)(??{ "(b)" })(.)/'

Although perlre.pod says that:

    B<WARNING>: This extended regular expression feature is considered
    highly experimental, and may be changed or deleted without notice.
    A simplified version of the syntax may be introduced for commonly
    used idioms.

the Camel, 3rd ed makes no such disclaimer.

I think I could fix up regmatch() to make this happen, but I'm not entirely certain that it's right to change the existing behaviour.

One possibility would be to have two separate constructs, one of which handles captures and one which doesn't, e.g.

    (??{...})     existing behaviour
    (??*{...})    captures

Better suggestions for new line-noise syntax welcome.

-- 
That he said that that that that is is is debatable, is debatable.

p5pRT commented 18 years ago

From @hvds

Dave Mitchell <davem@iabyn.com> wrote:

> So presumably the following would change from printing "c" to printing "b"?
>
>     $ perl -le 'print $2 if "abc" =~ /(.)(??{ "(b)" })(.)/'
>
> [...]
>
> I think I could fix up regmatch() to make this happen, but I'm not entirely certain that it's right to change the existing behaviour.
>
> One possibility would be to have two separate contructs, one of which handles captures and one which doesn't eg
>
>     (??{...})     existing behaviour
>     (??*{...})    captures
>
> Better suggestions for new line-noise syntax welcome.

I would not be in favour of changing behaviour of existing patterns without better reasons than I have understood so far.

In general I'd expect anyone wanting this to want it throughout the regexp. So I would be inclined to use a new regexp flag for this, with the added benefit that it marks it much more visibly. If you need to turn the effect on or off more locally you can always use /(?x-x:...)/ inside the pattern.

It may even be useful to have it aim to emulate perl6ish nested captures, though that may be too far away from what we can usefully do in perl5. But given a complex recursive regexp, I think a plain list of numbered matches throws away more information than necessary about the nature of the match: if a more faithfully representative data structure is possible, it'd be a bonus.

Of course, if we're constructing a new data structure without perturbing the existing numbered variables, and if we can find a safe enough place to store the new information, and if efficiency would not be a problem, then maybe it doesn't need a new flag to request it after all.

Hugo

p5pRT commented 18 years ago

From @smpeters

[shouldbedomo@mac.com - Mon Apr 03 12:54:19 2006]:

This appears to have been fixed with the more recent recursion removals.

    steve@kirk:~/perl-current$ perl -lwe '$r = qr/(?{print $1})/; @m = "abc" =~ /(.)(??{"(.)$r"})(.)/; print @m'
    Segmentation fault
    steve@kirk:~/perl-current$ ./perl -Ilib -lwe '$r = qr/(?{print $1})/; @m = "abc" =~ /(.)(??{"(.)$r"})(.)/; print @m'
    b
    ac

p5pRT commented 18 years ago

@smpeters - Status changed from 'open' to 'resolved'