Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

perlbug: [\x00-\x1f] works, [\c@-\c_] does not #7198

Closed p5pRT closed 20 years ago

p5pRT commented 20 years ago

Migrated from rt.perl.org#27940 (status was 'resolved')

Searchable as RT27940$

p5pRT commented 20 years ago

From SHarrelson@matrasystems.com

Created by sharrelson@matrasystems.com

The regular expression:

$msg =~ s/[\x00-\x1f]//g;

works to remove all control codes from an ASCII string.

The equivalent using the \cX construct does *NOT*:

$msg =~ s/[\c@-\c_]//g;
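For reference, a minimal sketch of the form reported as working. The sample string is hypothetical and not taken from the report. The `\c@` form is deliberately left out because, as this thread shows, its behavior depends on the perl version.

```perl
use strict;
use warnings;

# Hypothetical sample: letters separated by NUL, BEL, TAB, ESC and US (0x1f).
my $msg = "a\x00b\x07c\td\x1be\x1ff";

# The hex-escape range contains no '@', so there is nothing for Perl to
# interpolate before the regex engine sees the character class.
(my $clean = $msg) =~ s/[\x00-\x1f]//g;

printf "%d -> %d characters: %s\n", length($msg), length($clean), $clean;
```

Copying into `$clean` first keeps the original string available for comparison.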

Perl Info

```
Flags:
    category=core
    severity=low

Site configuration information for perl v5.8.0:

Configured by ActiveState at Mon Mar 31 00:45:28 2003.

Summary of my perl5 (revision 5 version 8 subversion 0) configuration:
  Platform:
    osname=MSWin32, osvers=4.0, archname=MSWin32-x86-multi-thread
    uname=''
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    usethreads=undef use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cl', ccflags ='-nologo -Gf -W3 -MD -Zi -DNDEBUG -O1 -DWIN32 -D_CONSOLE -DNO_STRICT -DHAVE_DES_FCRYPT -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -DPERL_MSVCRT_READFIX',
    optimize='-MD -Zi -DNDEBUG -O1',
    cppflags='-DWIN32'
    ccversion='', gccversion='', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=10
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='link', ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf -libpath:"C:\Perl\lib\CORE" -machine:x86'
    libpth="C:\Program Files\Mts\Lib" "C:\Perl\lib\CORE"
    libs= oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib wsock32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib msvcrt.lib
    perllibs= oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib wsock32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib msvcrt.lib
    libc=msvcrt.lib, so=dll, useshrplib=yes, libperl=perl58.lib
    gnulibc_version='undef'
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug -opt:ref,icf -libpath:"C:\Perl\lib\CORE" -machine:x86'

Locally applied patches:
    ACTIVEPERL_LOCAL_PATCHES_ENTRY

@INC for perl v5.8.0:
    c:/Perl/lib
    c:/Perl/site/lib
    .

Environment for perl v5.8.0:
    HOME (unset)
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=C:\WINNT\system32;C:\WINNT;C:\WINNT\system32\WBEM;c:\Perl\bin\;c:\Utilities;C:\Program Files\PKWARE\PKZIPC
    PERL_BADLANG (unset)
    SHELL (unset)
```

p5pRT commented 20 years ago

From @rgs

Shane Harrelson (via RT) wrote:

> The regular expression:
>
> $msg =~ s/[\x00-\x1f]//g;
>
> works to remove all control codes from an ASCII string.
>
> The equivalent using the \cX construct does *NOT*:
>
> $msg =~ s/[\c@-\c_]//g;

The simplistic patch below should fix it. Any comments about it?

Index: toke.c
--- toke.c (revision 3409)
+++ toke.c (working copy)
@@ -1220,7 +1220,7 @@

   const char *leaveit = /* set of acceptably-backslashed characters */
     PL_lex_inpat
-      ? "\\.^$@AGZdDwWsSbBpPXC+*?|()-nrtfeaxcz0123456789[{]} \t\n\r\f\v#"
+      ? "\\.^$@AGZdDwWsSbBpPXC+*?|()-nrtfeaxz0123456789[{]} \t\n\r\f\v#"
       : "";

   if (PL_lex_inwhat == OP_TRANS && PL_sublex_info.sub_op) {
Index: t/op/pat.t
--- t/op/pat.t (revision 3409)
+++ t/op/pat.t (working copy)
@@ -6,7 +6,7 @@

 $| = 1;

-print "1..1056\n";
+print "1..1060\n";

 BEGIN {
   chdir 't' if -d 't';
@@ -3268,5 +3268,10 @@
   "$x-$y";
 }, 'captures can move backwards in string');

-# last test 1056
+# perl #27940: \cA not recognized in character classes
+ok("a\cAb" =~ /\cA/, '\cA in pattern');
+ok("a\cAb" =~ /[\cA]/, '\cA in character class');
+ok("a\cAb" =~ /[\cA-\cB]/, '\cA in character class range');
+ok("abc" =~ /[^\cA-\cB]/, '\cA in negated character class range');

+# last test 1060

p5pRT commented 20 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 20 years ago

From @hvds

Rafael Garcia-Suarez <rgarciasuarez@mandrakesoft.com> wrote:
:Shane Harrelson (via RT) wrote:
:>
:> The regular expression:
:>
:> $msg =~ s/[\x00-\x1f]//g;
:>
:> works to remove all control codes from an ASCII string.
:>
:> The equivalent using the \cX construct does *NOT*:
:>
:> $msg =~ s/[\c@-\c_]//g;
:
:The simplistic patch below should fix it.
:Any comments about it ?

Does this also DTRT for \cX in embedded code? Eg:

  /(?{ '\cA' })/
  /(?{ "\cA" })/

Hugo

p5pRT commented 20 years ago

From @rgs

hv@crypt.org wrote:

> Does this also DTRT for \cX in embedded code? Eg:
>
>   /(?{ '\cA' })/
>   /(?{ "\cA" })/

Good question. The answer is yes.

p5pRT commented 20 years ago

From SHarrelson@matrasystems.com

I would add a few more test cases to ensure that the "range" is working... for example:

+ok("a\cBb" =~ /[\cA-\cB]/, '\cB in character class range');
+ok("a\cCbc" =~ /[^\cA-\cB]/, '\cC in negated character class range');

-----Original Message-----
From: Rafael Garcia-Suarez via RT [mailto:perlbug-followup@perl.org]
Sent: Friday, April 02, 2004 3:51 AM
To: SHarrelson@matrasystems.com
Subject: Re: [perl #27940] perlbug: [\x00-\x1f] works, [\c@-\c_] does not

Shane Harrelson (via RT) wrote:

> The regular expression:
>
> $msg =~ s/[\x00-\x1f]//g;
>
> works to remove all control codes from an ASCII string.
>
> The equivalent using the \cX construct does *NOT*:
>
> $msg =~ s/[\c@-\c_]//g;

The simplistic patch below should fix it. Any comments about it ?

<< snipped >>>

-# last test 1056
+# perl #27940: \cA not recognized in character classes
+ok("a\cAb" =~ /\cA/, '\cA in pattern');
+ok("a\cAb" =~ /[\cA]/, '\cA in character class');
+ok("a\cAb" =~ /[\cA-\cB]/, '\cA in character class range');
+ok("abc" =~ /[^\cA-\cB]/, '\cA in negated character class range');

+# last test 1060


p5pRT commented 20 years ago

From @rgs

Rafael Garcia-Suarez wrote​:

> The simplistic patch below should fix it.

Which I've now applied as #22641 to bleadperl (with extended tests).

p5pRT commented 20 years ago

@rspier - Status changed from 'open' to 'resolved'

p5pRT commented 18 years ago

From BQW10602@nifty.com

Change 28548 (a reversion of change 22641) brought [perl #27940] back, so the ticket should be reopened.

Recent perls interpolate @- in patterns, so /[\c@-\c_]/ cannot work the way /[\0-\c_]/ does. Given the current behavior, the status of [perl #27940] should be changed from "resolved" to "rejected".

However, a few variables are exceptionally not interpolated in patterns, namely $|, $) and $(. If @- were likewise not interpolated in patterns, [perl #27940] could be resolved.

The attached patch stops @- and @+ from being interpolated in patterns. Including @+ may be overkill, but once @- becomes an exception, symmetry between @+ and @- should take precedence over having fewer exceptions.

There are workarounds either way. With the change, /@{-}/ can still be used even though /@-/ is no longer interpolated as an array. Without it, /[\0-\c_]/ can be used even though \c@ in /[\c@-\c_]/ is currently not interpreted as the NUL character. There are pros and cons.
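To illustrate the mechanism being discussed, here is a small sketch of array interpolation inside a character class, followed by the /[\0-\c_]/ workaround. The array `@chars` and the sample strings are hypothetical. The sketch uses a named array rather than @- itself, since the handling of @- is exactly what differs between perl versions.

```perl
use strict;
use warnings;

# @chars is a hypothetical array used only for illustration.
my @chars = ('a', 'b');

# In a pattern, "@chars" is expanded to its elements joined by $" (a space
# by default) before the regex is compiled, so this class becomes [a b].
my $b_matches     = "b" =~ /\A[@chars]\z/ ? 1 : 0;   # an element of the array
my $space_matches = " " =~ /\A[@chars]\z/ ? 1 : 0;   # the join separator lands in the class too
my $c_matches     = "c" =~ /\A[@chars]\z/ ? 1 : 0;   # not in the array

print "b:$b_matches space:$space_matches c:$c_matches\n";

# Workaround: start the range at \0 so the class contains no '@' at all.
(my $clean = "x\x00y\x1fz") =~ s/[\0-\c_]//g;
print "$clean\n";
```

Because the expansion happens before the regex engine sees the class, an unintended `@` sequence there changes which characters the class contains.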

The patch also includes tests for tr///, which is not affected by this bug. Since tr/// doesn't interpolate variables, tr/\c@-\c_//d works the same as tr/\x00-\x1f//d. One point in favor of the change is that s/[\c@-\c_]//g would then behave like tr/\c@-\c_//d.
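A minimal sketch of the tr/// comparison described here, using a hypothetical sample string. It assumes, as this message states, that tr/// accepts \cX escapes and performs no variable interpolation.

```perl
use strict;
use warnings;

# Hypothetical sample: letters separated by NUL, BEL and US (0x1f).
my $msg = "a\x00b\x07c\x1fd";

# Delete the control range written with hex escapes.
(my $via_hex = $msg) =~ tr/\x00-\x1f//d;

# Delete the same range written with \cX escapes; tr/// does not
# interpolate, so '@-' here is not treated as an array.
(my $via_ctl = $msg) =~ tr/\c@-\c_//d;

print $via_hex eq $via_ctl ? "same\n" : "different\n";
```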

attached: at-minus.patch.gz

Regards,
SADAHIRO Tomoyuki

p5pRT commented 18 years ago

From BQW10602@nifty.com

at-minus.patch.gz

p5pRT commented 18 years ago

From @rgarcia

On 24/07/06, SADAHIRO Tomoyuki <bqw10602@nifty.com> wrote:

> The attached patch makes a change that @- and @+ in patterns will not be interpolated. @+ may be an overkill, but the symmetry between @+ and @- should take precedence over less exceptions, since @- becomes an exception.

Thanks, applied as change #28620 to bleadperl. (Probably not applicable to maint; I'll add a note in perldelta for that change.)