Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.91k stars 542 forks source link

regex (?<name>...) capture-to-var paren, new $^N magic variable #4142

Closed p5pRT closed 21 years ago

p5pRT commented 23 years ago

Migrated from rt.perl.org#7190 (status was 'resolved')

Searchable as RT7190$

p5pRT commented 23 years ago

From jfriedl@yahoo-inc.com

SUMMARY​:   1) Gee\, it'd be nice to support (?\...) "named capture" parens   within regular expressions.

  2) I created a new magic variable $^N\, similar to $+.

  3) If this is of interest\, is $^N a good name?

Feeling envy for highly advanced :-) languages like Java and Visual Basic\, whose regex languages allows named captures a'la

  (?\\d+)

(sets $myval to the digits captured)\, I thought I'd use regex overloading to convert this syntaxt to

  (\d+) (?{ $varname = $+ })

(capture to normal parens\, then use $+ to access it and assign to the variable)

This works fine except for a few details\, such as being able to have capturing parens nested within\, since $+ refers to the *hightest-numbered* set of parens used so far\, not the most-recently *closed* set of parens used so far.

So\, I created a new magic variable\, $^N\, that pretty much parallels $+ except it does indeed refer to the most-recently *closed* set of parens.

Now\, converting to

  (\d+) (?{ $varname = $^N })

works even with nesting. This is very nice.

I can send the patches if wanted. But if wanted\, what is a good variable name? I picked $^N simply because I saw it was free.

For those interested\, I've appended my package to allow this\, and a short test program.

There are still "issues" with my overloading -- the variables named are not checked at runtime ('use strict' doesn't save you from referring to a nonexistant variable)\, and the variables are not "protected"\, so even if during a match a variable is set\, it won't be unset if the match later fails.

So\, it really would be nice if named captures were officially supported. I spent some hours digging through regcomp and regexec\, and succeeded only in killing massive amounts of neurons )-​:\, so I don't think I'll be able to add it.

  Jeffrey

Here is a short test script​:

---snip------------------------------------------------------------------   use strict;   use warnings;   use Regex​::SupportNamedCapture;

  my $areacode; ## both lexical   our $exchange; ## and global   my $number; ## variables work fine.

  "My number is 408-555-1212." =~ m{   \b   (?\\d\d\d)   -   (?\\d\d\d)   -   (?\\d\d\d\d \b)   }x;

  print "phone number is​: ($areacode) $exchange-$number\n";

  my $fullnumber;

  "My number is 408-555-1212." =~ m{   \b   (?\   (?\\d\d\d)   -   (?\\d\d\d)   -   (?\\d\d\d\d \b)   )   }x;

  if ($fullnumber ne "$areacode-$exchange-$number")   {   print "Bummer\, you don't have \$^N support​: fullnumber is [$fullnumber]\n";   } ---snip------------------------------------------------------------------

Here is the package​:

---snip------------------------------------------------------------------ package Regex​::SupportNamedCapture;

## ## This package allows regular expressions to have named captures\, a'la ## ## (?\...) ## sets $varname to the result of what's matched by the /.../ ## ## If your Perl supports the $^N "most-recently-closed-paren text" variable\, ## the /.../ part may itself contain capturing parens. Otherwise\, it ## shouldn't. ## ## jfriedl@​yahoo.com ## 6/2001 ## use strict; use warnings; use re 'eval'; use overload; sub import { overload​::constant 'qr' => \&convert }

## ## Test to see if my proposed $^N is supported. ## Set $GutsResult to $^N if so\, set to $+ if not. ## my $GutsResult = do {   no warnings;   "1" =~ m/(1)/;   if ($^N) {   '$^N';   } else {   '$+';   } };

our $OpenParens; ## needed for matching nested parens

my $NestedParenGuts = qr{

  (?{ local $OpenParens = 0 }) ## counts the number of nested opens waiting to close

  (?>   (?>   ## stuff not parens\, not escaped   [^()\\]+

  ## escaped stuff   | (?s​: \\. )

  # another opening paren   | \( (?{ $OpenParens++ })

  # a closing paren\, if we're expecting any   | (?(?{ $OpenParens }) (?{ $OpenParens-- }) \))   )*   ) }x;

## ## Mimics named capturing parens by converting something like ## (?\...) ## to ## (?​: (...) (?{ $varname = $^N }) ) ## ## sub convert {   my $re = shift; ## regex to mangle

  $re =~ s{   (?\<! \\ ) # an unescaped...   \(\? # "(?"   \< # '\<'   (\w+) # $1 - an identifier   > # '>'   ($NestedParenGuts) # $2 - regex guts   \) # ')'   }{   my $id = '$' . $1;   my $guts = convert($2);   "(?​:($guts)(?{ $id=$GutsResult }))";   }exg;

  return $re; ## return mangled regex }

1; ---snip------------------------------------------------------------------

Perl Info ``` Flags: category=core severity=wishlist Site configuration information for perl v5.7.1: Configured by jfriedl at Sun Jun 17 23:29:39 PDT 2001. Summary of my perl5 (revision 5.0 version 7 subversion 17) configuration: Platform: osname=linux, osvers=2.4.5, archname=i686-linux uname='linux fummy.telocity.com 2.4.5 #3 smp mon jun 4 22:43:14 pdt 2001 i686 unknown ' config_args='-e -s -O -D optimize=-O2 -g' hint=previous, useposix=true, d_sigaction=define usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef useperlio=define d_sfio=undef uselargefiles=define usesocks=undef use64bitint=undef use64bitall=undef uselongdouble=undef Compiler: cc='cc', ccflags ='-Wall -DDEBUGGING -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64', optimize='-O2 -g', cppflags='-Wall -DDEBUGGING -fno-strict-aliasing -I/usr/local/include -Wall -DDEBUGGING -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64' ccversion='', gccversion='2.95.3 20010315 (release)', gccosandvers='' intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12 ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=4, usemymalloc=n, prototype=define Linker and Libraries: ld='cc', ldflags =' -L/usr/local/lib' libpth=/usr/local/lib /lib /usr/lib libs=-lnsl -lndbm -lgdbm -ldb -ldl -lm -lc -lposix -lcrypt -lutil perllibs=-lnsl -ldl -lm -lc -lposix -lcrypt -lutil libc=/lib/libc-2.2.3.so, so=so, useshrplib=false, libperl=libperl.a Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic' cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib' Locally applied patches: DEVEL10654 @INC for perl v5.7.1: /home/jfriedl/lib/perl /home/jfriedl/lib/perl/yahoo /usr/local/lib/perl5/5.7.1/i686-linux /usr/local/lib/perl5/5.7.1 /usr/local/lib/perl5/site_perl/5.7.1/i686-linux /usr/local/lib/perl5/site_perl/5.7.1 /usr/local/lib/perl5/site_perl/5.6.1/i686-linux /usr/local/lib/perl5/site_perl/5.6.1 /usr/local/lib/perl5/site_perl/5.6.0/i686-linux /usr/local/lib/perl5/site_perl/5.6.0 /usr/local/lib/perl5/site_perl . Environment for perl v5.7.1: HOME=/home/jfriedl LANG (unset) LANGUAGE (unset) LD_LIBRARY_PATH=/usr/local/pgsql/lib:/home/jfriedl/src/rvplayer5.0 LOGDIR (unset) PATH=/home/jfriedl/bin:/home/jfriedl/common/bin:.:/usr/local/pgsql/bin:/usr/local/bin:/usr/X11R6/bin:/bin:/usr/bin:/usr/sbin:/sbin:/home/jfriedl/src/rvplayer5.0:/usr/local/prod/bin:/usr/local/java/bin PERLLIB=/home/jfriedl/lib/perl:/home/jfriedl/lib/perl/yahoo PERL_BADLANG (unset) SHELL=/bin/tcsh ```
p5pRT commented 23 years ago

From [Unknown Contact. See original ticket]

Well\, since there has been no objection\, here is the patch.   Jeffrey


Inline Patch ```diff diff -u -r .orig/embedvar.h ./embedvar.h --- .orig/embedvar.h Wed Jun 20 11:35:50 2001 +++ ./embedvar.h Tue Jun 26 12:16:35 2001 @@ -113,6 +113,7 @@ #define PL_regint_start (vTHX->Tregint_start) #define PL_regint_string (vTHX->Tregint_string) #define PL_reginterp_cnt (vTHX->Treginterp_cnt) +#define PL_reglastcloseparen (vTHX->Treglastcloseparen) #define PL_reglastparen (vTHX->Treglastparen) #define PL_regnarrate (vTHX->Tregnarrate) #define PL_regnaughty (vTHX->Tregnaughty) @@ -821,6 +822,7 @@ #define PL_regint_start (aTHXo->interp.Tregint_start) #define PL_regint_string (aTHXo->interp.Tregint_string) #define PL_reginterp_cnt (aTHXo->interp.Treginterp_cnt) +#define PL_reglastcloseparen (aTHXo->interp.Treglastcloseparen) #define PL_reglastparen (aTHXo->interp.Treglastparen) #define PL_regnarrate (aTHXo->interp.Tregnarrate) #define PL_regnaughty (aTHXo->interp.Tregnaughty) @@ -1518,6 +1520,7 @@ #define PL_regint_start (aTHX->Tregint_start) #define PL_regint_string (aTHX->Tregint_string) #define PL_reginterp_cnt (aTHX->Treginterp_cnt) +#define PL_reglastcloseparen (aTHX->Treglastcloseparen) #define PL_reglastparen (aTHX->Treglastparen) #define PL_regnarrate (aTHX->Tregnarrate) #define PL_regnaughty (aTHX->Tregnaughty) @@ -1654,6 +1657,7 @@ #define PL_Tregint_start PL_regint_start #define PL_Tregint_string PL_regint_string #define PL_Treginterp_cnt PL_reginterp_cnt +#define PL_Treglastcloseparen PL_reglastcloseparen #define PL_Treglastparen PL_reglastparen #define PL_Tregnarrate PL_regnarrate #define PL_Tregnaughty PL_regnaughty diff -u -r .orig/gv.c ./gv.c --- .orig/gv.c Mon Jun 25 08:08:15 2001 +++ ./gv.c Tue Jun 26 12:35:10 2001 @@ -895,6 +895,7 @@ case '\006': /* $^F */ case '\010': /* $^H */ case '\011': /* $^I, NOT \t in EBCDIC */ + case '\016': /* $^N */ case '\020': /* $^P */ case '\024': /* $^T */ if (len > 1) @@ -1764,6 +1765,7 @@ case '\010': /* $^H */ case '\011': /* $^I, NOT \t in EBCDIC */ case '\014': /* $^L */ + case '\016': /* $^N */ case '\020': /* $^P */ case '\023': /* $^S */ case '\024': /* $^T */ diff -u -r .orig/mg.c ./mg.c --- .orig/mg.c Mon Jun 25 07:48:03 2001 +++ ./mg.c Tue Jun 26 12:36:16 2001 @@ -435,6 +435,13 @@ goto getparen; } return 0; + case '\016': /* ^N */ + if (PL_curpm && (rx = PM_GETRE(PL_curpm))) { + paren = rx->lastcloseparen; + if (paren) + goto getparen; + } + return 0; case '`': if (PL_curpm && (rx = PM_GETRE(PL_curpm))) { if (rx->startp[0] != -1) { @@ -655,6 +662,14 @@ case '+': if (PL_curpm && (rx = PM_GETRE(PL_curpm))) { paren = rx->lastparen; + if (paren) + goto getparen; + } + sv_setsv(sv,&PL_sv_undef); + break; + case '\016': /* ^N */ + if (PL_curpm && (rx = PM_GETRE(PL_curpm))) { + paren = rx->lastcloseparen; if (paren) goto getparen; } diff -u -r .orig/perlapi.h ./perlapi.h --- .orig/perlapi.h Wed Jun 20 11:35:50 2001 +++ ./perlapi.h Tue Jun 26 12:16:35 2001 @@ -802,6 +802,8 @@ #define PL_regint_string (*Perl_Tregint_string_ptr(aTHXo)) #undef PL_reginterp_cnt #define PL_reginterp_cnt (*Perl_Treginterp_cnt_ptr(aTHXo)) +#undef PL_reglastcloseparen +#define PL_reglastcloseparen (*Perl_Treglastcloseparen_ptr(aTHXo)) #undef PL_reglastparen #define PL_reglastparen (*Perl_Treglastparen_ptr(aTHXo)) #undef PL_regnarrate diff -u -r .orig/regexec.c ./regexec.c --- .orig/regexec.c Thu Jun 21 07:16:49 2001 +++ ./regexec.c Tue Jun 26 12:28:14 2001 @@ -147,7 +147,7 @@ if (paren_elems_to_push < 0) Perl_croak(aTHX_ "panic: paren_elems_to_push < 0"); -#define REGCP_OTHER_ELEMS 5 +#define REGCP_OTHER_ELEMS 6 SSCHECK(paren_elems_to_push + REGCP_OTHER_ELEMS); for (p = PL_regsize; p > parenfloor; p--) { /* REGCP_PARENS_ELEMS are pushed per pairs of parentheses. */ @@ -159,6 +159,7 @@ /* REGCP_OTHER_ELEMS are pushed in any case, parentheses or no. */ SSPUSHINT(PL_regsize); SSPUSHINT(*PL_reglastparen); + SSPUSHINT(*PL_reglastcloseparen); SSPUSHPTR(PL_reginput); #define REGCP_FRAME_ELEMS 2 /* REGCP_FRAME_ELEMS are part of the REGCP_OTHER_ELEMS and @@ -192,6 +193,7 @@ assert(i == SAVEt_REGCONTEXT); /* Check that the magic cookie is there. */ i = SSPOPINT; /* Parentheses elements to pop. */ input = (char *) SSPOPPTR; + *PL_reglastcloseparen = SSPOPINT; *PL_reglastparen = SSPOPINT; PL_regsize = SSPOPINT; @@ -1871,6 +1873,7 @@ PL_regstartp = prog->startp; PL_regendp = prog->endp; PL_reglastparen = &prog->lastparen; + PL_reglastcloseparen = &prog->lastcloseparen; prog->lastparen = 0; PL_regsize = 0; DEBUG_r(PL_reg_starttry = startpos); @@ -2562,6 +2565,7 @@ cache_re(re); state.ss = PL_savestack_ix; *PL_reglastparen = 0; + *PL_reglastcloseparen = 0; PL_reg_call_cc = &state; PL_reginput = locinput; @@ -2619,6 +2623,7 @@ PL_regendp[n] = locinput - PL_bostr; if (n > *PL_reglastparen) *PL_reglastparen = n; + *PL_reglastcloseparen = n; break; case GROUPP: n = ARG(scan); /* which paren pair */ diff -u -r .orig/regexp.h ./regexp.h --- .orig/regexp.h Sun Apr 22 09:12:37 2001 +++ ./regexp.h Tue Jun 26 12:10:53 2001 @@ -37,6 +37,7 @@ I32 prelen; /* length of precomp */ U32 nparens; /* number of parentheses */ U32 lastparen; /* last paren matched */ + U32 lastcloseparen; /* last paren matched */ U32 reganch; /* Internal use only + Tainted information used by regexec? */ regnode program[1]; /* Unwarranted chumminess with compiler. */ diff -u -r .orig/thrdvar.h ./thrdvar.h --- .orig/thrdvar.h Mon Apr 30 05:29:37 2001 +++ ./thrdvar.h Tue Jun 26 12:12:52 2001 @@ -182,6 +182,7 @@ PERLVAR(Tregstartp, I32 *) /* Pointer to startp array. */ PERLVAR(Tregendp, I32 *) /* Ditto for endp. */ PERLVAR(Treglastparen, U32 *) /* Similarly for lastparen. */ +PERLVAR(Treglastcloseparen, U32 *) /* Similarly for lastcloseparen. */ PERLVAR(Tregtill, char *) /* How far we are required to go. */ PERLVAR(Tregcompat1, char) /* used to be regprev1 */ PERLVAR(Treg_start_tmp, char **) /* from regexec.c */ ```
p5pRT commented 23 years ago

From [Unknown Contact. See original ticket]

Inline Patch ```diff diff -ru pod/.orig/perlretut.pod pod/perlretut.pod --- pod/.orig/perlretut.pod Tue Jun 12 18:39:57 2001 +++ pod/perlretut.pod Fri Jun 29 23:51:13 2001 @@ -710,9 +710,12 @@ /(ab(cd|ef)((gi)|j))/; 1 2 34 -so that if the regexp matched, e.g., C<$2> would contain 'cd' or 'ef'. -For convenience, perl sets C<$+> to the highest numbered C<$1>, C<$2>, -... that got assigned. +so that if the regexp matched, e.g., C<$2> would contain 'cd' or 'ef'. For +convenience, perl sets C<$+> to the string held by the highest numbered +C<$1>, C<$2>, ... that got assigned (and, somewhat related, C<$^N> to the +value of the C<$1>, C<$2>, ... most-recently assigned; i.e. the C<$1>, +C<$2>, ... associated with the rightmost closing parenthesis used in the +match). Closely associated with the matching variables C<$1>, C<$2>, ... are the B C<\1>, C<\2>, ... . Backreferences are simply diff -ru pod/.orig/perltoc.pod pod/perltoc.pod --- pod/.orig/perltoc.pod Tue Jun 26 08:00:08 2001 +++ pod/perltoc.pod Fri Jun 29 23:45:32 2001 @@ -904,7 +904,7 @@ $EVAL_ERROR, $@, $PROCESS_ID, $PID, $$, $REAL_USER_ID, $UID, $<, $EFFECTIVE_USER_ID, $EUID, $>, $REAL_GROUP_ID, $GID, $(, $EFFECTIVE_GROUP_ID, $EGID, $), $PROGRAM_NAME, $0, $[, $], $COMPILING, $^C, -$DEBUGGING, $^D, $SYSTEM_FD_MAX, $^F, $^H, %^H, $INPLACE_EDIT, $^I, $^M, +$DEBUGGING, $^D, $SYSTEM_FD_MAX, $^F, $^H, %^H, $INPLACE_EDIT, $^I, $^M, $^N, $OSNAME, $^O, $PERLDB, $^P, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x100, 0x200, $LAST_REGEXP_CODE_RESULT, $^R, $EXCEPTIONS_BEING_CAUGHT, $^S, $BASETIME, $^T, $PERL_VERSION, $^V, $WARNING, $^W, ${^WARNING_BITS}, diff -ru pod/.orig/perlvar.pod pod/perlvar.pod --- pod/.orig/perlvar.pod Mon Jun 18 07:42:24 2001 +++ pod/perlvar.pod Sat Jun 30 00:04:05 2001 @@ -180,14 +180,29 @@ =item $+ -The last bracket matched by the last search pattern. This is useful if -you don't know which one of a set of alternative patterns matched. For -example: +The text matched by the last bracket of the last successful search pattern. +This is useful if you don't know which one of a set of alternative patterns +matched. For example: /Version: (.*)|Revision: (.*)/ && ($rev = $+); (Mnemonic: be positive and forward looking.) This variable is read-only and dynamically scoped to the current BLOCK. + +=item $^N + +The text matched by the used group most-recently closed (i.e. the group +with the rightmost closing parenthesis) of the last successful search +pattern. This is primarly used inside C<(?{...})> blocks for examining text +recently matched. For example, to effectively capture text to a variable +(in addition to C<$1>, C<$2>, etc.), replace C<(...)> with + + (?:(...)(?{ $var = $^N })) + +By setting and then using C<$var> in this way relieves you from having to +worry about exactly which numbered set of parentheses they are. + +This variable is dynamically scoped to the current BLOCK. =item @LAST_MATCH_END --- t/op/.orig/pat.t Thu Jun 28 20:17:51 2001 +++ t/op/pat.t Sat Jun 30 00:52:21 2001 @@ -6,7 +6,7 @@ $| = 1; -print "1..639\n"; +print "1..660\n"; BEGIN { chdir 't' if -d 't'; @@ -1854,3 +1854,38 @@ print "not " unless " " =~ /[[:print:]]/; print "ok 639\n"; +## +## Test basic $^N usage outside of a regex +## +$x = "abcdef"; +$T="ok 640\n";if ($x =~ /cde/ and not defined $^N) {print $T} else {print "not $T"}; +$T="ok 641\n";if ($x =~ /(cde)/ and $^N eq "cde") {print $T} else {print "not $T"}; +$T="ok 642\n";if ($x =~ /(c)(d)(e)/ and $^N eq "e") {print $T} else {print "not $T"}; +$T="ok 643\n";if ($x =~ /(c(d)e)/ and $^N eq "cde") {print $T} else {print "not $T"}; +$T="ok 644\n";if ($x =~ /(foo)|(c(d)e)/ and $^N eq "cde") {print $T} else {print "not $T"}; +$T="ok 645\n";if ($x =~ /(c(d)e)|(foo)/ and $^N eq "cde") {print $T} else {print "not $T"}; +$T="ok 646\n";if ($x =~ /(c(d)e)|(abc)/ and $^N eq "abc") {print $T} else {print "not $T"}; +$T="ok 647\n";if ($x =~ /(c(d)e)|(abc)x/ and $^N eq "cde") {print $T} else {print "not $T"}; +$T="ok 648\n";if ($x =~ /(c(d)e)(abc)?/ and $^N eq "cde") {print $T} else {print "not $T"}; +$T="ok 649\n";if ($x =~ /(?:c(d)e)/ and $^N eq "d" ) {print $T} else {print "not $T"}; +$T="ok 650\n";if ($x =~ /(?:c(d)e)(?:f)/ and $^N eq "d" ) {print $T} else {print "not $T"}; +$T="ok 651\n";if ($x =~ /(?:([abc])|([def]))*/ and $^N eq "f" ){print $T} else {print "not $T"}; +$T="ok 652\n";if ($x =~ /(?:([ace])|([bdf]))*/ and $^N eq "f" ){print $T} else {print "not $T"}; +$T="ok 653\n";if ($x =~ /(([ace])|([bd]))*/ and $^N eq "e" ){print $T} else {print "not $T"}; +{ + $T="ok 654\n";if($x =~ /(([ace])|([bdf]))*/ and $^N eq "f" ){print $T} else {print "not $T"}; +} +## test to see if $^N is automatically localized -- it should now +## have the value set in test 653 +$T="ok 655\n";if ($^N eq "e" ){print $T} else {print "not $T"}; + +## +## Now test inside (?{...}) +## +$T="ok 656\n";if ($x =~ /a([abc])(?{$y=$^N})c/ and $y eq "b" ){print $T} else {print "not $T"}; +$T="ok 657\n";if ($x =~ /a([abc]+)(?{$y=$^N})d/ and $y eq "bc"){print $T} else {print "not $T"}; +$T="ok 658\n";if ($x =~ /a([abcdefg]+)(?{$y=$^N})d/ and $y eq "bc"){print $T} else {print "not $T"}; +$T="ok 659\n";if ($x =~ /(a([abcdefg]+)(?{$y=$^N})d)(?{$z=$^N})e/ and $y eq "bc" and $z eq "abcd") + {print $T} else {print "not $T"}; +$T="ok 660\n";if ($x =~ /(a([abcdefg]+)(?{$y=$^N})de)(?{$z=$^N})/ and $y eq "bc" and $z eq "abcde") + {print $T} else {print "not $T"}; ```
p5pRT commented 23 years ago

From @jhi

Thanks\, applied. (base/lex.t need tweaking\, too\, since it assumed $^N is an unused variable...)

-----------------------------------------------------------------

diff -u -r .orig/embedvar.h ./embedvar.h --- .orig/embedvar.h Wed Jun 20 11​:35​:50 2001 +++ ./embedvar.h Tue Jun 26 12​:16​:35 2001 @​@​ -113\,6 +113\,7 @​@​ #define PL_regint_start (vTHX->Tregint_start) #define PL_regint_string (vTHX->Tregint_string) #define PL_reginterp_cnt (vTHX->Treginterp_cnt) +#define PL_reglastcloseparen (vTHX->Treglastcloseparen) #define PL_reglastparen (vTHX->Treglastparen) #define PL_regnarrate (vTHX->Tregnarrate) #define PL_regnaughty (vTHX->Tregnaughty) @​@​ -821\,6 +822\,7 @​@​ #define PL_regint_start (aTHXo->interp.Tregint_start) #define PL_regint_string (aTHXo->interp.Tregint_string) #define PL_reginterp_cnt (aTHXo->interp.Treginterp_cnt) +#define PL_reglastcloseparen (aTHXo->interp.Treglastcloseparen) #define PL_reglastparen (aTHXo->interp.Treglastparen) #define PL_regnarrate (aTHXo->interp.Tregnarrate) #define PL_regnaughty (aTHXo->interp.Tregnaughty) @​@​ -1518\,6 +1520\,7 @​@​ #define PL_regint_start (aTHX->Tregint_start) #define PL_regint_string (aTHX->Tregint_string) #define PL_reginterp_cnt (aTHX->Treginterp_cnt) +#define PL_reglastcloseparen (aTHX->Treglastcloseparen) #define PL_reglastparen (aTHX->Treglastparen) #define PL_regnarrate (aTHX->Tregnarrate) #define PL_regnaughty (aTHX->Tregnaughty) @​@​ -1654\,6 +1657\,7 @​@​ #define PL_Tregint_start PL_regint_start #define PL_Tregint_string PL_regint_string #define PL_Treginterp_cnt PL_reginterp_cnt +#define PL_Treglastcloseparen PL_reglastcloseparen #define PL_Treglastparen PL_reglastparen #define PL_Tregnarrate PL_regnarrate #define PL_Tregnaughty PL_regnaughty diff -u -r .orig/gv.c ./gv.c --- .orig/gv.c Mon Jun 25 08​:08​:15 2001 +++ ./gv.c Tue Jun 26 12​:35​:10 2001 @​@​ -895\,6 +895\,7 @​@​ case '\006'​: /* $^F */ case '\010'​: /* $^H */ case '\011'​: /* $^I\, NOT \t in EBCDIC */ + case '\016'​: /* $^N */ case '\020'​: /* $^P */ case '\024'​: /* $^T */ if (len > 1) @​@​ -1764\,6 +1765\,7 @​@​ case '\010'​: /* $^H */ case '\011'​: /* $^I\, NOT \t in EBCDIC */ case '\014'​: /* $^L */ + case '\016'​: /* $^N */ case '\020'​: /* $^P */ case '\023'​: /* $^S */ case '\024'​: /* $^T */ diff -u -r .orig/mg.c ./mg.c --- .orig/mg.c Mon Jun 25 07​:48​:03 2001 +++ ./mg.c Tue Jun 26 12​:36​:16 2001 @​@​ -435\,6 +435\,13 @​@​ goto getparen; } return 0; + case '\016'​: /* ^N */ + if (PL_curpm && (rx = PM_GETRE(PL_curpm))) { + paren = rx->lastcloseparen; + if (paren) + goto getparen; + } + return 0; case '`'​: if (PL_curpm && (rx = PM_GETRE(PL_curpm))) { if (rx->startp[0] != -1) { @​@​ -655\,6 +662\,14 @​@​ case '+'​: if (PL_curpm && (rx = PM_GETRE(PL_curpm))) { paren = rx->lastparen; + if (paren) + goto getparen; + } + sv_setsv(sv\,&PL_sv_undef); + break; + case '\016'​: /* ^N */ + if (PL_curpm && (rx = PM_GETRE(PL_curpm))) { + paren = rx->lastcloseparen; if (paren) goto getparen; } diff -u -r .orig/perlapi.h ./perlapi.h --- .orig/perlapi.h Wed Jun 20 11​:35​:50 2001 +++ ./perlapi.h Tue Jun 26 12​:16​:35 2001 @​@​ -802\,6 +802\,8 @​@​ #define PL_regint_string (*Perl_Tregint_string_ptr(aTHXo)) #undef PL_reginterp_cnt #define PL_reginterp_cnt (*Perl_Treginterp_cnt_ptr(aTHXo)) +#undef PL_reglastcloseparen +#define PL_reglastcloseparen (*Perl_Treglastcloseparen_ptr(aTHXo)) #undef PL_reglastparen #define PL_reglastparen (*Perl_Treglastparen_ptr(aTHXo)) #undef PL_regnarrate diff -u -r .orig/regexec.c ./regexec.c --- .orig/regexec.c Thu Jun 21 07​:16​:49 2001 +++ ./regexec.c Tue Jun 26 12​:28​:14 2001 @​@​ -147\,7 +147\,7 @​@​ if (paren_elems_to_push \< 0) Perl_croak(aTHX_ "panic​: paren_elems_to_push \< 0");

-#define REGCP_OTHER_ELEMS 5 +#define REGCP_OTHER_ELEMS 6 SSCHECK(paren_elems_to_push + REGCP_OTHER_ELEMS); for (p = PL_regsize; p > parenfloor; p--) { /* REGCP_PARENS_ELEMS are pushed per pairs of parentheses. */ @​@​ -159\,6 +159\,7 @​@​ /* REGCP_OTHER_ELEMS are pushed in any case\, parentheses or no. */ SSPUSHINT(PL_regsize); SSPUSHINT(*PL_reglastparen); + SSPUSHINT(*PL_reglastcloseparen); SSPUSHPTR(PL_reginput); #define REGCP_FRAME_ELEMS 2 /* REGCP_FRAME_ELEMS are part of the REGCP_OTHER_ELEMS and @​@​ -192\,6 +193\,7 @​@​ assert(i == SAVEt_REGCONTEXT); /* Check that the magic cookie is there. */ i = SSPOPINT; /* Parentheses elements to pop. */ input = (char *) SSPOPPTR; + *PL_reglastcloseparen = SSPOPINT; *PL_reglastparen = SSPOPINT; PL_regsize = SSPOPINT;

@​@​ -1871\,6 +1873\,7 @​@​ PL_regstartp = prog->startp; PL_regendp = prog->endp; PL_reglastparen = &prog->lastparen; + PL_reglastcloseparen = &prog->lastcloseparen; prog->lastparen = 0; PL_regsize = 0; DEBUG_r(PL_reg_starttry = startpos); @​@​ -2562\,6 +2565\,7 @​@​ cache_re(re); state.ss = PL_savestack_ix; *PL_reglastparen = 0; + *PL_reglastcloseparen = 0; PL_reg_call_cc = &state; PL_reginput = locinput;

@​@​ -2619\,6 +2623\,7 @​@​ PL_regendp[n] = locinput - PL_bostr; if (n > *PL_reglastparen) *PL_reglastparen = n; + *PL_reglastcloseparen = n; break; case GROUPP​: n = ARG(scan); /* which paren pair */ diff -u -r .orig/regexp.h ./regexp.h --- .orig/regexp.h Sun Apr 22 09​:12​:37 2001 +++ ./regexp.h Tue Jun 26 12​:10​:53 2001 @​@​ -37\,6 +37\,7 @​@​ I32 prelen; /* length of precomp */ U32 nparens; /* number of parentheses */ U32 lastparen; /* last paren matched */ + U32 lastcloseparen; /* last paren matched */ U32 reganch; /* Internal use only + Tainted information used by regexec? */ regnode program[1]; /* Unwarranted chumminess with compiler. */ diff -u -r .orig/thrdvar.h ./thrdvar.h --- .orig/thrdvar.h Mon Apr 30 05​:29​:37 2001 +++ ./thrdvar.h Tue Jun 26 12​:12​:52 2001 @​@​ -182\,6 +182\,7 @​@​ PERLVAR(Tregstartp\, I32 *) /* Pointer to startp array. */ PERLVAR(Tregendp\, I32 *) /* Ditto for endp. */ PERLVAR(Treglastparen\, U32 *) /* Similarly for lastparen. */ +PERLVAR(Treglastcloseparen\, U32 *) /* Similarly for lastcloseparen. */ PERLVAR(Tregtill\, char *) /* How far we are required to go. */ PERLVAR(Tregcompat1\, char) /* used to be regprev1 */ PERLVAR(Treg_start_tmp\, char **) /* from regexec.c */

p5pRT commented 23 years ago

From @jhi

Thanks\, applied.

p5pRT commented 23 years ago

From [Unknown Contact. See original ticket]

On Jun 30\, Jeffrey Friedl said​:

+so that if the regexp matched\, e.g.\, C\<$2> would contain 'cd' or 'ef'. For +convenience\, perl sets C\<$+> to the string held by the highest numbered +C\<$1>\, C\<$2>\, ... that got assigned (and\, somewhat related\, C\<$^N> to the +value of the C\<$1>\, C\<$2>\, ... most-recently assigned; i.e. the C\<$1>\, +C\<$2>\, ... associated with the rightmost closing parenthesis used in the +match).

Does that make $^N equal to

  substr(???\, $-[-1]\, $+[-1] - $-[-1]);

If so\, you might want to include that in the docs for @​- and/or @​+.

p5pRT commented 23 years ago

From [Unknown Contact. See original ticket]

On Jul 26\, Jeff 'japhy/Marillion' Pinyan said​:

On Jun 30\, Jeffrey Friedl said​:

+so that if the regexp matched\, e.g.\, C\<$2> would contain 'cd' or 'ef'. For +convenience\, perl sets C\<$+> to the string held by the highest numbered +C\<$1>\, C\<$2>\, ... that got assigned (and\, somewhat related\, C\<$^N> to the +value of the C\<$1>\, C\<$2>\, ... most-recently assigned; i.e. the C\<$1>\, +C\<$2>\, ... associated with the rightmost closing parenthesis used in the +match).

Does that make $^N equal to

substr(???\, $-[-1]\, $+[-1] - $-[-1]);

Hmm\, that should be substr(???\, $-[-1]\, $+[$#-] - $-[-1]). Just in case my logic with $^N is wrong\, I've not included it in the following patch. Oh\, speaking of the following patch...

I just found gross misinformation in perlvar.pod. Patch after sig.

-- Jeff "japhy" Pinyan japhy@​pobox.com http​://www.pobox.com/~japhy/ I am Marillion\, the wielder of Ringril\, known as Hesinaur\, the Winter-Sun. Are you a Monk? http​://www.perlmonks.com/ http​://forums.perlguru.com/ Perl Programmer at RiskMetrics Group\, Inc. http​://www.riskmetrics.com/ Acacia Fraternity\, Rensselaer Chapter. Brother #734 ** Manning Publications\, Co\, is publishing my Perl Regex book **

Inline Patch ```diff --- pod/perlvar.pod.old Thu Jul 26 19:58:37 2001 +++ pod/perlvar.pod Thu Jul 26 20:00:11 2001 @@ -472,10 +472,8 @@ successful submatches in the currently active dynamic scope. C<$-[0]> is the offset into the string of the beginning of the entire match. The Ith element of this array holds the offset -of the Ith submatch, so C<$+[1]> is the offset where $1 -begins, C<$+[2]> the offset where $2 begins, and so on. -You can use C<$#-> to determine how many subgroups were in the -last successful match. Compare with the C<@+> variable. +of the Ith submatch, so C<$-[1]> is the offset where $1 +begins, C<$-[2]> the offset where $2 begins, and so on. After a match against some variable $var: ```
p5pRT commented 23 years ago

From [Unknown Contact. See original ticket]

01-07-27 01.55\, skrev Jeff 'japhy/Marillion' Pinyan pÃ¥ jeffp@​crusoe.net följande​:

If so\, you might want to include that in the docs for @​- and/or @​+.

Aren't @​- and @​+ deprecated?

-- Arthur

p5pRT commented 23 years ago

From [Unknown Contact. See original ticket]

On Jul 27\, Arthur Bergman said​:

01-07-27 01.55\, skrev Jeff 'japhy/Marillion' Pinyan på jeffp@​crusoe.net följande​:

If so\, you might want to include that in the docs for @​- and/or @​+.

Aren't @​- and @​+ deprecated?

Whoa. When did that happen? They just got here. I'd think they're totally UN-deprecated. You can get at $` and $& and $' by using the offsets they hold.

-- Jeff "japhy" Pinyan japhy@​pobox.com http​://www.pobox.com/~japhy/ I am Marillion\, the wielder of Ringril\, known as Hesinaur\, the Winter-Sun. Are you a Monk? http​://www.perlmonks.com/ http​://forums.perlguru.com/ Perl Programmer at RiskMetrics Group\, Inc. http​://www.riskmetrics.com/ Acacia Fraternity\, Rensselaer Chapter. Brother #734 ** Manning Publications\, Co\, is publishing my Perl Regex book **

p5pRT commented 23 years ago

From @jhi

Thanks\, applied.

p5pRT commented 21 years ago

@cwest - Status changed from 'open' to 'resolved'