Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

Regular expression failing to match trailing digit #7076

Closed p5pRT closed 20 years ago

p5pRT commented 20 years ago

Migrated from rt.perl.org#25447 (status was 'resolved')

Searchable as RT25447$

p5pRT commented 20 years ago

From scott@pocketpurchase.com

Created by scott@pocketpurchase.com

Try this sample application:

```
#!/usr/bin/perl

$EmailAdrRe='[^\(\)\<>@\,;:\\/"\[\]\000-\040\199]+';
$EmailDomainRe='\w[\w\.\-]*\.\w+';

if($ARGV[0]=~/.*?($EmailAdrRe)\@($EmailDomainRe)/io) {
  print "OK\n";
} else {
  print "Big trouble!\n";
}
```

This fragment is a minimal test case of a bug I found in ASSP (http://assp.sourceforge.net).

The regular expression is supposed to check for a valid email address of the form user@domain.tld. The bug occurs if the user portion ends in a digit. a@b.c comes back OK. a9a@b.c is fine, too. But a9@b.c fails. To make the expression work, I had to replace $EmailAdrRe with:

```
$EmailAdrRe='([0-9]|[^\(\)\<>@\,;:\\/"\[\]\000-\040\199])+';
```

By explicitly accepting digits, the problem was resolved.
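A minimal sketch for comparing the two patterns on the three addresses above. It wraps the report's match in a hypothetical helper, `matches`, and runs it once with the original `$EmailAdrRe` and once with the workaround. It assumes a modern perl with `strict` and `warnings`, and it drops the `/o` flag: with `/o` the interpolated pattern would be compiled only once, so the helper could not switch between the two versions.

```perl
#!/usr/bin/perl
# Sketch: compare the report's original user-part pattern with the
# reporter's workaround on the three addresses from the report.
use strict;
use warnings;

my $EmailDomainRe = '\w[\w\.\-]*\.\w+';
my %user_re = (
    # Copied from the report, including the \199 escape.
    original   => '[^\(\)\<>@\,;:\\/"\[\]\000-\040\199]+',
    # The workaround: digits re-admitted through an alternation.
    workaround => '([0-9]|[^\(\)\<>@\,;:\\/"\[\]\000-\040\199])+',
);

# Hypothetical helper: 1 if $addr matches the report's overall pattern
# with $re as the user part, else 0. No /o, so each call uses the $re given.
sub matches {
    my ($re, $addr) = @_;
    return $addr =~ /.*?($re)\@($EmailDomainRe)/i ? 1 : 0;
}

for my $addr (qw(a@b.c a9a@b.c a9@b.c)) {
    my $orig = matches($user_re{original},   $addr) ? 'OK' : 'Big trouble!';
    my $work = matches($user_re{workaround}, $addr) ? 'OK' : 'Big trouble!';
    printf "%-8s original=%s workaround=%s\n", $addr, $orig, $work;
}
```

Only the original pattern should reject a9@b.c. Note that the workaround brings its own parentheses, so it adds a capture group: in the full pattern the user part stays in $1, but the domain moves from $2 to $3.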

Perl Info

```
Flags:
    category=core
    severity=high

Site configuration information for perl v5.8.0:

Configured by bhcompile' cf_email='bhcompile at Wed Aug 13 11:45:59 EDT 2003.

Summary of my perl5 (revision 5.0 version 8 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.4.21-1.1931.2.382.entsmp, archname=i386-linux-thread-multi
    uname='linux str'
    config_args='-des -Doptimize=-O2 -g -pipe -march=i386 -mcpu=i686 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux -Dvendorprefix=/usr -Dsiteprefix=/usr -Dotherlibdirs=/usr/lib/perl5/5.8.0 -Duseshrplib -Dusethreads -Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef' useithreads=define usemultiplicity=
    useperlio= d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=un uselongdouble=
    usemymalloc=, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
    optimize='',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -I/usr/local/include -I/usr/include/gdbm'
    ccversion='', gccversion='3.2.2 20030222 (Red Hat Linux 3.2.2-5)', gccosandvers=''
    gccversion='3.2.2 200302'
    intsize=r, longsize=r, ptrsize=5, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long' k', ivsize=4' ivtype='l, nvtype='double' o_nonbl', nvsize=, Off_t='', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='gcc' l', ldflags =' -L/u'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lpthread -lc -lcrypt -lutil
    perllibs=
    libc=/lib/libc-2.3.2.so, so=so, useshrplib=true, libperl=libper
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so', d_dlsymun=undef, ccdlflags='-rdynamic -Wl,-rpath,/usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE'
    cccdlflags='-fPIC'
    ccdlflags='-rdynamic -Wl,-rpath,/usr/lib/perl5', lddlflags='s Unicode/Normalize XS/A'

Locally applied patches:
    MAINT18379

@INC for perl v5.8.0:
    /usr/lib/perl5/5.8.0/i386-linux-thread-multi
    /usr/lib/perl5/5.8.0
    /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.0
    /usr/lib/perl5/site_perl
    /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.0
    /usr/lib/perl5/vendor_perl
    /usr/lib/perl5/5.8.0/i386-linux-thread-multi
    /usr/lib/perl5/5.8.0
    .

Environment for perl v5.8.0:
    HOME=/root
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/local/ssl/bin:/usr/local/apache2/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/usr/bin/X11:/usr/X11R6/bin:/root/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash
    dlflags='-share (unset)
```
p5pRT commented 20 years ago

From @tamias

On Wed, Jan 28, 2004 at 06:45:14AM -0000, scott@pocketpurchase.com (via RT) wrote:

$EmailAdrRe='[^\(\)\<>@\,;:\\/"\[\]\000-\040\199]+';

Thank you for your report. However, this is a bug in your script, not in perl. \199 is not a valid octal escape, so the regex engine is parsing it as \1, followed by two 9s. Thus, your character class explicitly excludes the character 9. I suspect you meant \177 instead.

Ronald
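To see the effect Ronald describes, the sketch below tests single characters against the report's class and against the same class with `\177` substituted, as he suggests. The helper name `class_allows` is hypothetical. Both classes stay in single quotes, as in the report, so the escapes reach the regex engine nearly unchanged (only `\\` becomes `\`).

```perl
#!/usr/bin/perl
# Sketch: which characters does each version of the class accept?
use strict;
use warnings;

# The class from the report, and the same class with \177 (DEL)
# in place of \199, following the reply's suggestion.
my $buggy = '[^\(\)\<>@\,;:\\/"\[\]\000-\040\199]';
my $fixed = '[^\(\)\<>@\,;:\\/"\[\]\000-\040\177]';

# Hypothetical helper: 1 if the single character $ch is accepted by $class, else 0.
sub class_allows {
    my ($class, $ch) = @_;
    return $ch =~ /\A$class\z/ ? 1 : 0;
}

for my $ch ('8', '9', "\x01", "\x7F") {
    printf "0x%02X: buggy=%d fixed=%d\n",
        ord($ch), class_allows($buggy, $ch), class_allows($fixed, $ch);
}
```

Run as-is, it should show `9` rejected only by the original class and DEL (0x7F) rejected only by the `\177` version; `8` passes both and `\001` fails both.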

p5pRT commented 20 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 20 years ago

From scott@pocketpurchase.com


Ah, of course. Sorry to waste your time. Thanks for the explanation.

Scott Maxwell
PocketPurchase, Inc.

p5pRT commented 20 years ago

@rspier - Status changed from 'open' to 'resolved'

p5pRT commented 20 years ago

From @hvds

"scott@pocketpurchase.com (via RT)" <perlbug-followup@perl.org> wrote:
:$EmailAdrRe='[^\(\)\<>@\,;:\\/"\[\]\000-\040\199]+';
[...]
:But a9@b.c fails.

This is because your pattern explicitly removed '9' from the list of accepted characters: the '\nnn' notation is used to specify a character in octal, so '\199' is interpreted the same as '\001' . '9' . '9'. I suspect you intended '\177' instead.
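The equivalence above can be checked directly by listing every code point from 0 to 255 that a class accepts. This is a sketch: the helper `members` is hypothetical, and the check only covers byte-range characters.

```perl
#!/usr/bin/perl
# Sketch: list the code points in 0..255 that a character class accepts.
use strict;
use warnings;

# Hypothetical helper: returns the code points in 0..255 matched by $class.
sub members {
    my ($class) = @_;
    return grep { chr($_) =~ /\A$class\z/ } 0 .. 255;
}

# '\199' as written, and the class spelled out as '\001' . '9' . '9'.
my @as_written  = members('[\199]');
my @spelled_out = members('[' . '\001' . '9' . '9' . ']');

print "as written:  @as_written\n";
print "spelled out: @spelled_out\n";
```

Both lines should list 1 and 57 (the code point of `9`): the class holds only `\001` and `9`, because the octal escape stops at the first non-octal digit.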

Note that 9 is accepted when it appears earlier in the string because of the '.*?' at the beginning of the pattern.

It is possible that perl could have been more helpful in warning you of this, though it isn't entirely clear to me how it should detect the circumstances meriting a warning.

Hugo van der Sanden
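Hugo's point about the leading `.*?` can be made visible by capturing what it skips. This is a sketch: the extra parentheses around `.*?` and the helper `split_address` are added for illustration only and are not part of the report's pattern.

```perl
#!/usr/bin/perl
# Sketch: show what the leading .*? skips over. The prefix is captured
# here only for display; the report's pattern does not capture it.
use strict;
use warnings;

my $EmailAdrRe    = '[^\(\)\<>@\,;:\\/"\[\]\000-\040\199]+';
my $EmailDomainRe = '\w[\w\.\-]*\.\w+';

# Hypothetical helper: (skipped prefix, user part, domain) on a match,
# otherwise an empty list.
sub split_address {
    my ($addr) = @_;
    return unless $addr =~ /(.*?)($EmailAdrRe)\@($EmailDomainRe)/i;
    return ($1, $2, $3);
}

for my $addr (qw(a9a@b.c a9@b.c)) {
    my @parts = split_address($addr);
    if (@parts) {
        printf "%s: prefix='%s' user='%s' domain='%s'\n", $addr, @parts;
    } else {
        printf "%s: no match\n", $addr;
    }
}
```

For a9a@b.c the engine lets `.*?` absorb `a9`, so the user part captured is just `a`. For a9@b.c the only character directly before the `@` is the excluded `9`, so no start position works and the match fails.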

p5pRT commented 20 years ago

From scott@pocketpurchase.com

I understand now. Thank you for the explanation.

Scott

At 07:09 PM 2/1/2004, you wrote:
[...]