Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

compiled regex behaves differently from uncompiled regex #8658

Closed p5pRT closed 18 years ago

p5pRT commented 18 years ago

Migrated from rt.perl.org#40648 (status was 'rejected')

Searchable as RT40648$

p5pRT commented 18 years ago

From sue@inf.ed.ac.uk

Created by sue@inf.ed.ac.uk

This is a short test script showing what I believe to be a bug.

=====================begin script
#!/usr/bin/perl

$regex = 'aeiou';
$compiled_regex = qr/$regex/;

$string = 'ae_ai_';

# should not match either string - the final regex should fail as it should match aeiou only, not _

if ($string =~ /^([^\_]+)\_([$regex])([$regex])([$regex])$/) {
  print STDERR "PLAIN MATCHED '$&', '$1' and graphemes '$2', '$3' and '$4' \n";
}

if ($string =~ /^([^\_]+)\_([$compiled_regex])([$compiled_regex])([$compiled_regex])$/) {
  print STDERR "COMPILED MATCHED '$&', '$1' and graphemes '$2', '$3' and '$4', using $compiled_regex\n";
}
==================end script

As the comment says, the variable $string should not match the regexes; it correctly does not match the uncompiled version, but incorrectly does match the compiled version, somehow matching underscore with the compiled 'aeiou'.

Perl Info

```
Flags:
    category=core
    severity=medium

This perlbug was built using Perl v5.8.8 in the Red Hat build system.
It is being executed now by Perl v5.8.8 - Sun Jun 4 19:33:43 EDT 2006.

Site configuration information for perl v5.8.8:

Configured by Red Hat, Inc. at Sun Jun 4 19:33:43 EDT 2006.

Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
  Platform:
    osname=linux, osvers=2.6.9-34.elsmp, archname=i386-linux-thread-multi
    uname='linux hs20-bc2-4.build.redhat.com 2.6.9-34.elsmp #1 smp fri feb 24 16:56:28 est 2006 i686 i686 i386 gnulinux '
    config_args='-des -Doptimize=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i386 -mtune=generic -fasynchronous-unwind-tables -Dversion=5.8.8 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux -Dvendorprefix=/usr -Dsiteprefix=/usr -Duseshrplib -Dusethreads -Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl=n -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dd_gethostent_r_proto -Ud_endhostent_r_proto -Ud_sethostent_r_proto -Ud_endprotoent_r_proto -Ud_setprotoent_r_proto -Ud_endservent_r_proto -Ud_setservent_r_proto -Dinc_version_list=5.8.7 5.8.6 5.8.5 5.8.4 5.8.3 -Dscriptdir=/usr/bin'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -Wdeclaration-after-statement -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
    optimize='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i386 -mtune=generic -fasynchronous-unwind-tables',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -Wdeclaration-after-statement -I/usr/local/include -I/usr/include/gdbm'
    ccversion='', gccversion='4.1.1 20060525 (Red Hat 4.1.1-1)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
    perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=/lib/libc-2.4.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.4'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:

@INC for perl v5.8.8:
    /usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.7/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.6/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.4/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.8
    /usr/lib/perl5/site_perl/5.8.7
    /usr/lib/perl5/site_perl/5.8.6
    /usr/lib/perl5/site_perl/5.8.5
    /usr/lib/perl5/site_perl/5.8.4
    /usr/lib/perl5/site_perl/5.8.3
    /usr/lib/perl5/site_perl
    /usr/lib/perl5/vendor_perl/5.8.8/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.7/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.6/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.4/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.8
    /usr/lib/perl5/vendor_perl/5.8.7
    /usr/lib/perl5/vendor_perl/5.8.6
    /usr/lib/perl5/vendor_perl/5.8.5
    /usr/lib/perl5/vendor_perl/5.8.4
    /usr/lib/perl5/vendor_perl/5.8.3
    /usr/lib/perl5/vendor_perl
    /usr/lib/perl5/5.8.8/i386-linux-thread-multi
    /usr/lib/perl5/5.8.8
    .

Environment for perl v5.8.8:
    HOME=/home/sue
    LANG=en_GB.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/lib/qt-3.3/bin:/usr/kerberos/bin:/group/cstr/projects/combilex/pgsql/bin:/group/cstr/projects/combilex/Scripts:/group/contrib/nlp-speech/bin:./:/home/sue/bin/i386-linux:/sbin:/usr/sbin:/usr/local/sbin:/home/sue/bin:/home/sue/Scripts/:/home/sue/bin/share:/usr/local/bin/:/usr/X11R6/bin:/usr/local/sbin:/usr/bin:/bin:/home/sue/Combilex/POS/Taggers/ACOPOST/acopost-1.8.4/bin:/home/sue/Combilex/POS/Taggers/TreeTagger/tree-tagger-linus-3.1/cmd:/home/sue/Combilex/POS/Taggers/TreeTagger/tree-tagger-linus-3.1/bin:/opt/sicstus-3.12.5/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash
```

p5pRT commented 18 years ago

From @iabyn

On Thu, Nov 02, 2006 at 06:01:50AM -0800, sue@inf.ed.ac.uk wrote:

> if ($string =~ /^([^\_]+)\_([$regex])([$regex])([$regex])$/) {

  $x = qr/.../;
  /...$x..../

does not do what I guess you think it does. It doesn't insert a call to that compiled regex at that point; instead it inserts a string representation of that compiled regex; in particular:

  $ perl588 -we'$r = qr/aeiou/; print "[$r]\n"'
  [(?-xism:aeiou)]
  $

Notice how the regex expands to the string "(?-xism:aeiou)". The only time that this string expansion doesn't take place is when the variable is the whole pattern, as in

  $x =~ $regex
or
  $x =~ /$regex/

So putting a compiled regex within a character class won't do the right thing.
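A minimal sketch of one way to get the behaviour the report seems to want, assuming the goal is "exactly three vowels after the underscore": keep the vowel list as a plain string, compile the whole bracketed class with `qr//`, and interpolate that compiled class as a sub-pattern instead of placing it inside `[...]`. The names `$vowels` and `$vowel_class` are illustrative and do not appear in the original script.

```perl
use strict;
use warnings;

# Plain string of characters; this is not a regex on its own.
my $vowels = 'aeiou';

# Compile the complete character class, brackets included.
my $vowel_class = qr/[$vowels]/;

my $string = 'ae_ai_';

# The string form is what gets inserted when the object is interpolated
# into a larger pattern. Its exact prefix depends on the perl version.
my $stringified = "$vowel_class";

# Interpolate the compiled class as a sub-pattern, not inside [...].
my $matched = $string =~ /^([^_]+)_($vowel_class)($vowel_class)($vowel_class)$/ ? 1 : 0;

print "stringified: $stringified\n";
print "matched: $matched\n";
```

Because the stringified form is a self-contained group, it is safe to drop into a larger pattern; it only misbehaves when its characters are reinterpreted as members of a bracketed class.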

-- SCO - a train crash in slow motion

p5pRT commented 18 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 18 years ago

From @druud62

sue@inf.ed.ac.uk wrote:

> if ($string =~ /^([^\_]+)\_([$regex])([$regex])([$regex])$/)

There is no need to escape the underscores. The $regex contains 'aeiou' which isn't a regex but a string of vowel characters. Use better names.

> if ($string =~ /^([^\_]+)\_([$compiled_regex])([$compiled_regex])([$compiled_regex])$/) {
>   print STDERR "COMPILED MATCHED '$&', '$1' and graphemes '$2', '$3' and '$4', using $compiled_regex\n";
> }
> ==================end script

[$compiled_regex] can be the same as [cdegilmoprx$_] which contains [eio_].

But in this case, [$compiled_regex] is [(?-xism:aeiou)], which contains the range "?-x", which contains a lot of characters.
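The effect of that range can be checked directly. The sketch below hard-codes the string form quoted in this thread for perl 5.8.8, `(?-xism:aeiou)`, rather than stringifying a fresh `qr//`, because perls from 5.14 onward stringify as `(?^:aeiou)`, which contains no range. The variable names are illustrative.

```perl
use strict;
use warnings;

# String form of qr/aeiou/ as reported for perl 5.8.8 in this thread.
my $stringified = '(?-xism:aeiou)';

# Inside [...] the characters "?-x" form a range from "?" (0x3F) to "x" (0x78),
# and "_" (0x5F) lies inside that range.
my $class = qr/[$stringified]/;

my $underscore_matches = '_' =~ /^$class$/ ? 1 : 0;

# Collect every printable ASCII character the accidental class accepts.
my @in_range = grep { chr($_) =~ $class } 0x20 .. 0x7E;

# The pattern from the report, with the accidental class in place of [$compiled_regex].
my $reproduced = 'ae_ai_' =~ /^([^_]+)_($class)($class)($class)$/ ? 1 : 0;

printf "underscore in class: %s\n", $underscore_matches ? 'yes' : 'no';
printf "printable ASCII characters accepted: %d\n", scalar @in_range;
printf "report's string matches: %s\n", $reproduced ? 'yes' : 'no';
```

The class accepts the uppercase letters, most lowercase letters and several punctuation characters, so the trailing underscore in 'ae_ai_' matches just as a vowel would.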

-- Affijn, Ruud

"Gewoon is een tijger."

p5pRT commented 18 years ago

@rgs - Status changed from 'open' to 'rejected'