The regular expression:
$msg =~ s/[\x00-\x1f]//g;
works to remove all control codes from an ASCII string.
The equivalent using the \cX construct does *NOT*:
$msg =~ s/[\c@-\c_]//g;
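For reference, a minimal sketch of why the two character classes are meant to be equivalent, assuming an ASCII platform: \cX denotes the character whose code is that of the uppercased X XOR 64, so \c@ is 0x00 and \c_ is 0x1f. The helper name ctrl_char is hypothetical and not part of Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: compute the control character that \cX denotes
# on an ASCII platform (code of the uppercased X, XOR 64).
sub ctrl_char {
    my ($x) = @_;
    return chr(ord(uc $x) ^ 64);
}

# The endpoints of the intended range.
printf "%s => 0x%02x\n", '\c@', ord(ctrl_char('@'));   # \c@ => 0x00
printf "%s => 0x%02x\n", '\c_', ord(ctrl_char('_'));   # \c_ => 0x1f

# The form from the report that works: strip everything in 0x00-0x1f.
my $msg = "a\x{00}b\x{01}c\x{1f}d\n";
(my $clean = $msg) =~ s/[\x00-\x1f]//g;
print "$clean\n";   # abcd
```

The trailing newline is removed as well, since 0x0a falls inside the range.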
Shane Harrelson (via RT) wrote:
> The regular expression:
> $msg =~ s/[\x00-\x1f]//g;
> works to remove all control codes from an ASCII string.
> The equivalent using the \cX construct does *NOT*:
> $msg =~ s/[\c@-\c_]//g;
The simplistic patch below should fix it. Any comments about it ?
--- toke.c (revision 3409)
+++ toke.c (working copy)
@@ -1220,7 +1220,7 @@
     const char *leaveit = /* set of acceptably-backslashed characters */
         PL_lex_inpat
-        ? "\\.^$@AGZdDwWsSbBpPXC+*?|()-nrtfeaxcz0123456789[{]} \t\n\r\f\v#"
+        ? "\\.^$@AGZdDwWsSbBpPXC+*?|()-nrtfeaxz0123456789[{]} \t\n\r\f\v#"
         : "";
--- t/op/pat.t (revision 3409)
+++ t/op/pat.t (working copy)
@@ -6,7 +6,7 @@
 $| = 1;
-print "1..1056\n";
+print "1..1060\n";
 BEGIN {
     chdir 't' if -d 't';
@@ -3268,5 +3268,10 @@
     "$x-$y";
 }, 'captures can move backwards in string');
-# last test 1056
+# perl #27940: \cA not recognized in character classes
+ok("a\cAb" =~ /\cA/, '\cA in pattern');
+ok("a\cAb" =~ /[\cA]/, '\cA in character class');
+ok("a\cAb" =~ /[\cA-\cB]/, '\cA in character class range');
+ok("abc" =~ /[^\cA-\cB]/, '\cA in negated character class range');
+# last test 1060
The RT System itself - Status changed from 'new' to 'open'
Rafael Garcia-Suarez <rgarciasuarez@mandrakesoft.com> wrote:
:Shane Harrelson (via RT) wrote:
:>
:> The regular expression:
:>
:> $msg =~ s/[\x00-\x1f]//g;
:>
:> works to remove all control codes from an ASCII string.
:>
:> The equivalent using the \cX construct does *NOT*:
:>
:> $msg =~ s/[\c@-\c_]//g;
:
:The simplistic patch below should fix it.
:Any comments about it ?
Does this also DTRT for \cX in embedded code? Eg:
  /(?{ '\cA' })/
  /(?{ "\cA" })/
Hugo
hv@crypt.org wrote:
> Does this also DTRT for \cX in embedded code? Eg:
>   /(?{ '\cA' })/
>   /(?{ "\cA" })/
Good question. The answer is yes.
I would add a few more test cases to ensure that the "range" is working... for example:
+ok("a\cBb" =~ /[\cA-\cB]/, '\cB in character class range');
+ok("a\cCbc" =~ /[^\cA-\cB]/, '\cC in negated character class range');
-----Original Message-----
From: Rafael Garcia-Suarez via RT [mailto:perlbug-followup@perl.org]
Sent: Friday, April 02, 2004 3:51 AM
To: SHarrelson@matrasystems.com
Subject: Re: [perl #27940] perlbug: [\x00-\x1f] works, [\c@-\c_] does not
Shane Harrelson (via RT) wrote:
The regular expression:
$msg =~ s/[\x00-\x1f]//g;
works to remove all control codes from an ASCII string.
The equivalent using the \cX construct does *NOT*:
$msg =~ s/[\c@-\c_]//g;
The simplistic patch below should fix it. Any comments about it ?
<< snipped >>>
-# last test 1056
+# perl #27940: \cA not recognized in character classes
+ok("a\cAb" =~ /\cA/, '\cA in pattern');
+ok("a\cAb" =~ /[\cA]/, '\cA in character class');
+ok("a\cAb" =~ /[\cA-\cB]/, '\cA in character class range');
+ok("abc" =~ /[^\cA-\cB]/, '\cA in negated character class range');
+# last test 1060
Rafael Garcia-Suarez wrote:
> The simplistic patch below should fix it.
Which I've now applied as #22641 to bleadperl (with extended tests)
@rspier - Status changed from 'open' to 'resolved'
Change 28548, which reverted change 22641, brought [perl #27940] back, so the ticket should be reopened.
Recent perls interpolate @- in patterns, so /[\c@-\c_]/ cannot work the way /[\0-\c_]/ does. Under the current behavior, the status of [perl #27940] should be changed from "resolved" to "rejected".
However, a few variables are exceptionally not interpolated in patterns: namely $|, $) and $(. If @- were likewise not interpolated in a pattern, [perl #27940] could be resolved.
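A minimal sketch of the existing exception for $|, assuming the rule documented in perlop that $| is not interpolated in patterns: the sequence is read as an end-of-string anchor followed by alternation, whatever value $| currently holds.

```perl
use strict;
use warnings;

$| = 1;   # if $| were interpolated, the pattern below would contain "a1b"

# Read as: "a" followed by end of string, or "b".
my $re = qr/^(?:a$|b)\z/;

print "a"   =~ $re ? "match\n" : "no match\n";   # match
print "b"   =~ $re ? "match\n" : "no match\n";   # match
print "a1b" =~ $re ? "match\n" : "no match\n";   # no match
```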
The attached patch stops @- and @+ from being interpolated in patterns. Exempting @+ as well may be overkill, but once @- becomes an exception, the symmetry between @+ and @- should take precedence over having fewer exceptions.
There is a workaround either way: /@{-}/ still interpolates the array even if /@-/ no longer does (after the change), just as /[\0-\c_]/ works even though \c@ in /[\c@-\c_]/ is currently not interpreted as the escape for the NUL character. There are pros and cons.
The patch includes tests for tr///, where this is not a bug. Since tr/// doesn't interpolate variables, tr/\c@-\c_//d works as tr/\x00-\x1f//d does. One argument for the change is that s/[\c@-\c_]//g would then work as tr/\c@-\c_//d does.
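A minimal sketch of the tr/// behavior described above, compared with the \x form of the substitution, which does not depend on how @- is treated. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $msg = "line\x{00}one\x{07}\x{1f}two";

# tr/// does no variable interpolation, so \c@ here is just the NUL escape.
(my $by_tr_c = $msg) =~ tr/\c@-\c_//d;
(my $by_tr_x = $msg) =~ tr/\x00-\x1f//d;

# The substitution spelled with \x escapes.
(my $by_subst = $msg) =~ s/[\x00-\x1f]//g;

print "$by_tr_c\n";    # lineonetwo
print "$by_tr_x\n";    # lineonetwo
print "$by_subst\n";   # lineonetwo
```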
attached: at-minus.patch.gz
Regards,
SADAHIRO Tomoyuki
On 24/07/06, SADAHIRO Tomoyuki <bqw10602@nifty.com> wrote:
> The attached patch makes a change that @- and @+ in patterns will not be interpolated. @+ may be an overkill, but the symmetry between @+ and @- should take precedence over less exceptions, since @- becomes an exception.
Thanks, applied as change #28620 to bleadperl. (probably not applicable to maint -- I'll add a note in perldelta for that change.)
Migrated from rt.perl.org#27940 (status was 'resolved')
Searchable as RT27940$