Perl / perl5

Problem with conditional regexps and ^, $ #1794

Closed p5pRT closed 20 years ago

p5pRT commented 24 years ago

Migrated from rt.perl.org#3052 (status was 'resolved')

Searchable as RT3052$

p5pRT commented 24 years ago

From kvale@phy.ucsf.EDU

This is a bug report for perl from kvale@ivy.ucsf.edu, generated with the help of perlbug 1.28 running under perl v5.6.0.

I have been playing with conditional regexps in perl 5.6.0 and noticed the following strange behavior:

  ivy 156% cat test2.pl
  $c = "bob";
  print "no anchors matches\n" if $c =~ /b(?(1)ob)/;
  print "^ anchor matches\n" if $c =~ /^b(?(1)ob)/;
  print "\$ anchor matches\n" if $c =~ /b(?(1)ob)$/;
  print "^\$ anchor matches\n" if $c =~ /^b(?(1)ob)$/;

  ivy 157% perl test2.pl
  no anchors matches
  ^ anchor matches
  $ anchor matches

The regexp C< b(?(1)ob) > should match the string C< "bob" > exactly, so any combination of C< ^$ > anchors should match as well. The fully anchored regexp C< ^b(?(1)ob)$ > does not match, however, but should.

  -Mark


Flags:
    category=core
    severity=medium


Site configuration information for perl v5.6.0:

Configured by kvale at Thu Mar 23 11:46:38 PST 2000.

Summary of my perl5 (revision 5.0 version 6 subversion 0) configuration:
  Platform:
    osname=dec_osf, osvers=4.0, archname=alpha-dec_osf
    uname='osf1 ivy.ucsf.edu v4.0 878 alpha '
    config_args=''
    hint=previous, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=undef d_sfio=undef uselargefiles=define
    use64bitint=define use64bitall=define uselongdouble=undef usesocks=undef
  Compiler:
    cc='cc', optimize='-O4', gccversion=
    cppflags='-std -ieee -D_INTRINSICS -I/usr/local/include -DLANGUAGE_C'
    ccflags ='-std -fprm d -ieee -D_INTRINSICS -I/usr/local/include -DLANGUAGE_C'
    stdchar='unsigned char', d_stdstdio=define, usevfork=false
    intsize=4, longsize=8, ptrsize=8, doublesize=8
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=8
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, usemymalloc=y, prototype=define
  Linker and Libraries:
    ld='ld', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /usr/shlib /usr/ccs/lib /usr/lib/cmplrs/cc /usr/lib /var/shlib
    libs=-lgdbm -ldbm -ldb -lm -liconv
    libc=/usr/shlib/libc.so, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' -Wl,-rpath,/home/kvale/lib/perl5/5.6.0/alpha-dec_osf/CORE'
    cccdlflags=' ', lddlflags='-shared -expect_unresolved "*" -msym -std -s -L/usr/local/lib'

Locally applied patches:


@INC for perl v5.6.0:
    /home/kvale/lib/perl5/5.6.0/alpha-dec_osf
    /home/kvale/lib/perl5/5.6.0
    /home/kvale/lib/perl5/site_perl/5.6.0/alpha-dec_osf
    /home/kvale/lib/perl5/site_perl/5.6.0
    /home/kvale/lib/perl5/site_perl
    .


Environment for perl v5.6.0:
    HOME=/home/kvale
    LANG=
    LANGUAGE (unset)
    LD_LIBRARY_PATH=/home/kvale/lib:/home/kvale/gnu/db/msql/lib:/home/kvale/gnu/postgres/lib:/usr/lib:/usr/lib/X11:/usr/local/lib:/usr/local/visix/galaxy/lib
    LOGDIR (unset)
    PATH=.:/home/kvale/bin:/home/kvale/gnu/xmgr/grace/bin:/home/kvale/tex/teTeX/bin:/home/kvale/math/neuron/nrn/bin:/usr/local/bin:/usr/bin:/bin:/usr/ucb:/usr/bin/X11:/usr/bin/mh:/usr/bin/mme:/usr/dt/bin:/usr/local/bin/mbone:/usr/local/bin/mtools:/usr/local/visix/galaxy/bin:/usr/local/games:/usr/local/java/bin:/usr/local/entropic-5.1/bin:/usr/local/rsi/idl_4/bin:/usr/local/uimx2.9/bin:/usr/local/xview/bin:/home/kvale/graphics/kho/bin
    PERL_BADLANG (unset)
    SHELL=/usr/local/bin/tcsh

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Mark Kvale wrote:

The regexp C< b(?(1)ob) > should match the string C< "bob" > exactly, so any combination of C< ^$ > anchors should match as well. The fully anchored regexp C< ^b(?(1)ob)$ > does not match, however, but should.

No, you aren't capturing $1 for (?(1)ob) to ever come into play.

  print "ok 1\n" if "bob" =~ /^b/;
  print "ok 2\n" if "bob" =~ /b$/;
  print "ok 3\n" if "bob" !~ /^b$/;
  print "ok 4\n" if "bob" =~ /^(b)(?(1)ob)$/;
  print "ok 5\n" if "bob" =~ /^b(?(1)ob)ob$/;
  print "ok 6\n" if "bob" =~ /^b(?(1)etty)ob$/;
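For comparison, here is a minimal sketch of the report's four anchor combinations rewritten with a capture group, (b), so that (?(1)ob) actually has a group 1 to test. It was added for illustration, is written for a modern perl, and has not been checked against 5.6.0. With group 1 matched, the yes-pattern "ob" is required, so each of the four patterns should now match "bob" in full.

```perl
use strict;
use warnings;

my $c = "bob";

# The four anchor combinations from test2.pl, each with (b) captured
# so that (?(1)ob) sees a matched group 1 and requires "ob".
my @tests = (
    [ 'no anchors' => qr/(b)(?(1)ob)/   ],
    [ '^ anchor'   => qr/^(b)(?(1)ob)/  ],
    [ '$ anchor'   => qr/(b)(?(1)ob)$/  ],
    [ '^$ anchors' => qr/^(b)(?(1)ob)$/ ],
);

for my $t (@tests) {
    my ($name, $re) = @$t;
    print "$name ", ($c =~ $re ? "matches" : "does not match"), "\n";
}
```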

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

On Apr 6, Rick Delaney said:

Mark Kvale wrote:

The regexp C< b(?(1)ob) > should match the string C< "bob" > exactly, so any combination of C< ^$ > anchors should match as well. The fully anchored regexp C< ^b(?(1)ob)$ > does not match, however, but should.

No, you aren't capturing $1 for (?(1)ob) to ever come into play.

He probably misread the section of perlre.pod talking about what 'condition' in (?(condition)yes-pattern|no-pattern) can be.

  Conditional expression. (condition) should be
  either an integer in parentheses (which is valid
  if the corresponding pair of parentheses matched),
  or look-ahead/look-behind/evaluate zero-width
  assertion.
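The assertion form of the condition can be illustrated with a short sketch. The pattern and test strings are hypothetical, chosen only for illustration, and the code is written for a modern perl. The look-ahead (?=\d) is tested at the current position without consuming anything; if it succeeds, the yes-pattern must match, otherwise the no-pattern must. Note that a failing yes-pattern does not fall back to the no-pattern.

```perl
use strict;
use warnings;

# Assertion form: the condition is the zero-width look-ahead (?=\d).
# If the next character is a digit, the yes-pattern \d{3} must match;
# otherwise the no-pattern [a-z]{3} must match.
my $token = qr/^(?(?=\d)\d{3}|[a-z]{3})$/;

for my $s (qw(123 abc 12c ab3)) {
    printf "%-4s %s\n", $s, ($s =~ $token ? "matches" : "does not match");
}
```

Here "123" and "abc" match, while "12c" (starts with a digit, so three digits are required) and "ab3" (starts with a letter, so three letters are required) do not.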

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

No, you aren't capturing $1 for (?(1)ob) to ever come into play.

Oops, thanks for the correction. I misinterpreted the integer in the integer form of the conditional expression as a literal integer, not the number of a backreference. Sloppy reading and testing on my part.
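To make the point concrete, here is a minimal sketch (a hypothetical pattern, written for a modern perl) showing that the integer in (?(1)...) names the same capture group that the backreference \1 refers to. Group 1 is an optional opening quote; if it matched, (?(1)\1) requires the same quote character to close the word, and if it did not match, nothing more is required.

```perl
use strict;
use warnings;

# (?(1)...) tests capture group 1, the same group that \1 refers to.
# Group 1 is an optional opening quote; if it matched, the identical
# quote character must also close the word.
my $word = qr/^(["'])?\w+(?(1)\1)$/;

for my $s (q{"abc"}, q{'abc'}, q{abc}, q{"abc'}, q{'abc}, q{abc'}) {
    printf "%-6s %s\n", $s, ($s =~ $word ? "matches" : "does not match");
}
```

The first three strings match; the mismatched or unbalanced quotes in the last three do not.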

perlre could be more explicit in stating that the integer represents a backreference. Instead of

  Conditional expression. C<(condition)> should be either an integer in
  parentheses (which is valid if the corresponding pair of parentheses
  matched), or look-ahead/look-behind/evaluate zero-width assertion.

how about something like

  Conditional expression. C<(condition)> can occur in one of two forms.
  The first form is an integer in parentheses, C<(integer)>, with the
  integer representing the backreference C<\integer>. If the corresponding
  pair of parentheses matched, so that C<\integer> is defined, the
  condition evaluates to true. The second form is a bare assertion,
  C<assertion>, where the assertion is either a lookahead, lookbehind,
  or code evaluation zero-width assertion.
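Both forms in the proposed wording can be sketched briefly. This is an illustration written for a modern perl, not checked against 5.6.0; the patterns and the package variable $exact are hypothetical. The first pattern shows that a group which exists but does not take part in the match makes the integer condition false, which is exactly why "bob" fails a fully anchored pattern when nothing was captured. The second shows the code-evaluation form, where the return value of a (?{ ... }) block, evaluated at match time, selects the branch.

```perl
use strict;
use warnings;

# Integer form: group 1 exists but is optional. When it does not take
# part in the match, the condition is false and the empty no-pattern
# is used, so "bob" cannot match the fully anchored pattern.
my $maybe = qr/^(x)?b(?(1)ob)$/;
for my $s (qw(b xbob bob)) {
    printf "%-5s %s\n", $s, ($s =~ $maybe ? "matches" : "does not match");
}

# Code-evaluation form: the condition is the return value of the
# (?{ ... }) block. $exact is a hypothetical package variable used
# only for this illustration: when true, exactly three digits are
# required after "id"; otherwise any number of digits is accepted.
our $exact;
my $id = qr/^id(?(?{ $exact })\d{3}|\d+)$/;
for my $setting (1, 0) {
    $exact = $setting;
    printf "exact=%d id1234 %s\n", $exact,
        ("id1234" =~ $id ? "matches" : "does not match");
}
```

With the first pattern, "b" and "xbob" match but "bob" does not. With the second, "id1234" is rejected while $exact is true and accepted once it is false.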

Time to revise the tutorial,

  -Mark