Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

Keeping track of 'Unescaped left brace in regex is deprecated' #12137

Closed p5pRT closed 11 years ago

p5pRT commented 12 years ago

Migrated from rt.perl.org#113094 (status was 'resolved')

Searchable as RT113094$

p5pRT commented 12 years ago

From @andk

This ticket could serve as a meeting point for authors affected. I'll write individual tickets to the authors and link to this ticket for further information.

The commit itself describes the change very well, so there may be nothing to add. I think it boils down to this: a literal '{' in a regexp needs to be written as '\{'.

The commit itself


http://perl5.git.perl.org/perl.git/commit/2a53d3314d380af5ab5283758219417c6dfa36e9

The first fail reports


DRTECH/HTML-StripScripts-1.05.tar.gz http://www.cpantesters.org/cpan/report/44b2867a-a69c-11e1-ad06-f004f4b14d39

HMBRAND/Data-Peek-0.37.tgz

HMBRAND/Spreadsheet-Read-0.46.tgz

JAK/File-ANVL-1.04.tar.gz http://www.cpantesters.org/cpan/report/8d6ccd90-a6b4-11e1-8cad-34edf3b14d39

JSTEBENS/POE-Component-Server-REST-1.11.tar.gz

JUSTER/WWW-AUR-0.14.tar.gz

KJETILK/RDF-Trine-Node-Literal-XML-0.16.tar.gz

KRYDE/Perl-Critic-Pulp-70.tar.gz

KRYDE/distlinks-5.tar.gz http://www.cpantesters.org/cpan/report/a1288590-a6d7-11e1-a26b-8e4cf4b14d39

MAROS/Business-UPS-Tracking-1.09.tar.gz http://www.cpantesters.org/cpan/report/e5f091ea-a646-11e1-bb55-de44f4b14d39

MONS/XML-RPC-Fast-0.8.tar.gz http://www.cpantesters.org/cpan/report/557692fc-a634-11e1-9aa5-411ff4b14d39

SDPRICE/App-Framework-Lite-1.08.tar.gz http://www.cpantesters.org/cpan/report/2bbe4ca0-a6b3-11e1-bdb5-314cf4b14d39

WARRINGD/Elive-1.26.tar.gz

perl -V


Summary of my perl5 (revision 5 version 17 subversion 0) configuration:
  Commit id: 2a53d3314d380af5ab5283758219417c6dfa36e9
  Platform:
    osname=linux, osvers=3.2.0-2-amd64, archname=x86_64-linux-ld
    uname='linux k83 3.2.0-2-amd64 #1 smp mon apr 30 05:20:23 utc 2012 x86_64 gnulinux '
    config_args='-Dprefix=/home/src/perl/repoperls/installed-perls/perl/v5.16.0-225-g2a53d33/127e -Dmyhostname=k83 -Dinstallusrbinperl=n -Uversiononly -Dusedevel -des -Ui_db -Uuseithreads -Duselongdouble -DDEBUGGING=-g'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=define
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g',
    cppflags='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.6.3', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='long double', nvsize=16, Off_t='off_t', lseeksize=8
    alignbytes=16, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /usr/lib
    libs=-lnsl -ldb -ldl -lm -lcrypt -lutil -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.13'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -g -L/usr/local/lib -fstack-protector'

Characteristics of this binary (from libperl):
  Compile-time options: HAS_TIMES PERLIO_LAYERS PERL_DONT_CREATE_GVSV
    PERL_MALLOC_WRAP PERL_PRESERVE_IVUV PERL_USE_DEVEL
    USE_64_BIT_ALL USE_64_BIT_INT USE_LARGE_FILES
    USE_LOCALE USE_LOCALE_COLLATE USE_LOCALE_CTYPE
    USE_LOCALE_NUMERIC USE_LONG_DOUBLE USE_PERLIO
    USE_PERL_ATOF
  Built under linux
  Compiled at May 25 2012 07:02:38
  @INC:
    /home/src/perl/repoperls/installed-perls/perl/v5.16.0-225-g2a53d33/127e/lib/site_perl/5.17.0/x86_64-linux-ld
    /home/src/perl/repoperls/installed-perls/perl/v5.16.0-225-g2a53d33/127e/lib/site_perl/5.17.0
    /home/src/perl/repoperls/installed-perls/perl/v5.16.0-225-g2a53d33/127e/lib/5.17.0/x86_64-linux-ld
    /home/src/perl/repoperls/installed-perls/perl/v5.16.0-225-g2a53d33/127e/lib/5.17.0
    .

-- andreas

p5pRT commented 12 years ago

From @khwilliamson

On 05/26/2012 02:59 AM, (Andreas J. Koenig) (via RT) wrote:

The first fail reports: DRTECH/HTML-StripScripts-1.05.tar.gz http://www.cpantesters.org/cpan/report/44b2867a-a69c-11e1-ad06-f004f4b14d39

This new deprecation message appears to have exposed a real bug in this code. It looks like a missing "}" to me, which silently caused a would-be-quantifier to be treated as a literal.

    # Failed test 'use HTML::StripScripts;'
    # at t/10basic.t line 7.
    # Tried to use 'HTML::StripScripts'.
    # Error: Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/^\s*([+-]?\d{1,20}(?:\.\d{ <-- HERE 1,20)?)\s*((?:\%|\*|ex|px|pc|cm|mm|in|pt|em)?)\s*$/ at /tmp/loop_over_bdir-4PCTbR/HTML-StripScripts-1.05-Id9VGu/blib/lib/HTML/StripScripts.pm line 1633.
    # Compilation failed in require at (eval 4) line 2.
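For readers following along, here is a minimal sketch of what the repaired pattern would look like, assuming the intended second quantifier was `\d{1,20}` as the diagnosis above suggests. The variable name `$length_re` and the sample values are illustrative only; this is not the module author's actual patch.

```perl
use strict;
use warnings;

# Assumed fix: close the second quantifier, \d{1,20}, which the failing
# pattern left open (so it was silently treated as literal text).
my $length_re = qr/^\s*([+-]?\d{1,20}(?:\.\d{1,20})?)\s*((?:\%|\*|ex|px|pc|cm|mm|in|pt|em)?)\s*$/;

# Sample inputs are made up for illustration.
for my $value ('12.5px', '-3 em', '100%', '1.x') {
    if (my ($number, $unit) = $value =~ $length_re) {
        print "$value: number=$number unit=$unit\n";
    }
    else {
        print "$value: no match\n";
    }
}
```

With the brace closed, the fractional part is matched as one to twenty digits rather than as the literal characters `{1,20`.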

p5pRT commented 12 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 12 years ago

From @clintongormley

This new deprecation message appears to have exposed a real bug in this code. It looks like a missing "}" to me, which silently caused a would-be-quantifier to be treated as a literal.

... which embarrassingly wasn't being tested either.

thanks!

clint

p5pRT commented 12 years ago

From @andk

Found in GRANTM/XML-Simple-2.18.tar.gz in lib/XML/Simple.pm:

    995 $val =~ s{\$\{([\w.]+)\}}{ $self->get_var($1) }ge;
    1031 $val =~ s{\$\{(\w+)\}}{ $self->get_var($1) }ge;

    % make test
    [...]
    t/0_Config.t .. Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/\${ <-- HERE ([\w.]+)}/ at /tmp/tmp.4uMUAQPZaT/XML-Simple-2.18-cUi3UY/blib/lib/XML/Simple.pm line 995.
    Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/\${ <-- HERE (\w+)}/ at /tmp/tmp.4uMUAQPZaT/XML-Simple-2.18-cUi3UY/blib/lib/XML/Simple.pm line 1031.
    # Package                        Version
    #  perl                           5.17.0
    #  XML::Simple                    2.18
    #  Storable                       2.35
    #  XML::Parser                    2.41
    #  XML::SAX                       0.99
    #  XML::NamespaceSupport          1.11
    t/0_Config.t .. ok

Looks like perl miscounts one backslash. I would expect the regexp to be accepted by perl because the brace is escaped.

-- andreas

p5pRT commented 12 years ago

From @cpansprout

On Mon May 28 00:51:00 2012, andreas.koenig.7os6VVqR@franz.ak.mind.de wrote:

Found in GRANTM/XML-Simple-2.18.tar.gz in lib/XML/Simple.pm:

    995 $val =~ s{\$\{([\w.]+)\}}{ $self->get_var($1) }ge;
    1031 $val =~ s{\$\{(\w+)\}}{ $self->get_var($1) }ge;

    % make test
    [...]
    t/0_Config.t .. Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/\${ <-- HERE ([\w.]+)}/ at /tmp/tmp.4uMUAQPZaT/XML-Simple-2.18-cUi3UY/blib/lib/XML/Simple.pm line 995.
    Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/\${ <-- HERE (\w+)}/ at /tmp/tmp.4uMUAQPZaT/XML-Simple-2.18-cUi3UY/blib/lib/XML/Simple.pm line 1031.
    # Package                        Version
    #  perl                           5.17.0
    #  XML::Simple                    2.18
    #  Storable                       2.35
    #  XML::Parser                    2.41
    #  XML::SAX                       0.99
    #  XML::NamespaceSupport          1.11
    t/0_Config.t .. ok

Looks like perl miscounts one backslash. I would expect that the regexp is accepted by perl because the brace is escaped.

I think this pretty much ends this deprecation. Too many people use {} as delimiters.

What’s happening above is that delimiter escapes are removed before that pattern reaches the regular expression engine.

The same thing happens with m.\.., which is equivalent to /./, not /\./.

If one is using {} delimiters, then there is no way to match a literal { or } without doing something like [{] or [}].

On the other hand, m{ a\{1,2\} }x doesn’t do what most people think it does.
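As a hedged illustration of ways to match literal braces without depending on how delimiter escapes are treated, the sketch below uses two spellings that avoid the question: slash delimiters with `\{` and `\}`, and brace delimiters with the hex escapes `\x7B` and `\x7D`. The sample string and variable names are made up for this example and are not taken from XML::Simple.

```perl
use strict;
use warnings;

# Made-up input in the style of the ${name} placeholders discussed above.
my $text = 'path=${base.dir}/lib';

# Slash delimiters: braces are not delimiters here, so \{ and \} reach
# the regex engine still escaped and match literal braces.
my ($with_slashes) = $text =~ /\$\{([\w.]+)\}/;

# Brace delimiters: \x7B and \x7D are "{" and "}" written as hex escapes,
# so the pattern text contains no brace characters for the tokenizer to
# count or unescape.
my ($with_hex) = $text =~ m{\$\x7B([\w.]+)\x7D};

print "slashes: $with_slashes\n";
print "hex:     $with_hex\n";
```

Both forms capture the same placeholder name; which one reads better is a matter of taste.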

--

Father Chrysostomos

p5pRT commented 12 years ago

From @demerphq

On 28 May 2012 17:45, Father Chrysostomos via RT <perlbug-followup@perl.org> wrote:

On Mon May 28 00:51:00 2012, andreas.koenig.7os6VVqR@franz.ak.mind.de wrote:

Found in GRANTM/XML-Simple-2.18.tar.gz in lib/XML/Simple.pm:

    995 $val =~ s{\$\{([\w.]+)\}}{ $self->get_var($1) }ge;
    1031 $val =~ s{\$\{(\w+)\}}{ $self->get_var($1) }ge;

    % make test
    [...]
    t/0_Config.t .. Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/\${ <-- HERE ([\w.]+)}/ at /tmp/tmp.4uMUAQPZaT/XML-Simple-2.18-cUi3UY/blib/lib/XML/Simple.pm line 995.
    Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/\${ <-- HERE (\w+)}/ at /tmp/tmp.4uMUAQPZaT/XML-Simple-2.18-cUi3UY/blib/lib/XML/Simple.pm line 1031.
    # Package                        Version
    #  perl                           5.17.0
    #  XML::Simple                    2.18
    #  Storable                       2.35
    #  XML::Parser                    2.41
    #  XML::SAX                       0.99
    #  XML::NamespaceSupport          1.11
    t/0_Config.t .. ok

Looks like perl miscounts one backslash. I would expect that the regexp is accepted by perl because the brace is escaped.

I think this pretty much ends this deprecation. Too many people use {} as delimiters.

Well, we could fix how this is handled. But one wonders if it's worth it, outside of a larger effort anyway.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 12 years ago

From zefram@fysh.org

Father Chrysostomos via RT wrote:

I think this pretty much ends this deprecation. Too many people use {} as delimiters.

Not at all. It's an especially confusing case, especially deserving of the clarification wrought by the deprecation.

The same thing happens with m.\.., which is equivalent to /./, not /\./.

Maybe there should be a warning for every use of a backslashed metacharacter in this manner. It's too easy to be mistaken about m.\.. and its ilk.

-zefram

p5pRT commented 12 years ago

From @hvds

andreas.koenig.7os6VVqR@franz.ak.mind.de (Andreas J. Koenig) wrote:
:Found in GRANTM/XML-Simple-2.18.tar.gz in lib/XML/Simple.pm:
:
: 995 $val =~ s{\$\{([\w.]+)\}}{ $self->get_var($1) }ge;
: 1031 $val =~ s{\$\{(\w+)\}}{ $self->get_var($1) }ge;
:
:% make test
:[...]
:t/0_Config.t .. Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/\${ <-- HERE ([\w.]+)}/ at /tmp/tmp.4uMUAQPZaT/XML-Simple-2.18-cUi3UY/blib/lib/XML/Simple.pm line 995.
:Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/\${ <-- HERE (\w+)}/ at /tmp/tmp.4uMUAQPZaT/XML-Simple-2.18-cUi3UY/blib/lib/XML/Simple.pm line 1031.
:# Package Version
:# perl 5.17.0
:# XML::Simple 2.18
:# Storable 2.35
:# XML::Parser 2.41
:# XML::SAX 0.99
:# XML::NamespaceSupport 1.11
:t/0_Config.t .. ok
:
:
:Looks like perl miscounts one backslash. I would expect that the regexp
:is accepted by perl because the brace is escaped.

IIRC, when you use a regex metacharacter as a delimiter, escaping it gives you the metacharacter:

    % perl -wle 'print "line\n" =~ m$\$$'
    1
    %

So I think the warning is correct, though somewhat misleadingly expressed.

I don't remember if there is a way to get past that to match the literal.

Hugo

p5pRT commented 12 years ago

From @demerphq

On 28 May 2012 20:31, Zefram <zefram@fysh.org> wrote:

Father Chrysostomos via RT wrote:

I think this pretty much ends this deprecation. Too many people use {} as delimiters.

Not at all. It's an especially confusing case, especially deserving of the clarification wrought by the deprecation.

The alternative is to stop the tokenizer from doing this type of unescaping on regex patterns. There is no need to do it, and it breaks stuff. Seems like a good reason to stop doing something.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 12 years ago

From @ikegami

On Mon, May 28, 2012 at 11:45 AM, Father Chrysostomos via RT <perlbug-followup@perl.org> wrote:

The same thing happens with m.\.., which is equivalent to /./, not /\./.

Wow. That's unexpected! I would call it a bug even.

p5pRT commented 12 years ago

From vadim.konovalov@alcatel-lucent.com

On Mon, May 28, 2012 at 11:45 AM, Father Chrysostomos via RT <perlbug-followup@perl.org> wrote:

The same thing happens with m.\.., which is equivalent to /./, not /\./.

Wow. That's unexpected! I would call it a bug even.

not a bug.

p5pRT commented 12 years ago

From vadim.konovalov@alcatel-lucent.com

From: Zefram [mailto:zefram@fysh.org]

Father Chrysostomos via RT wrote:

I think this pretty much ends this deprecation. Too many people use {} as delimiters.

Not at all. It's an especially confusing case, especially deserving of the clarification wrought by the deprecation.

wow. that's tough.

I use s{}{}ge often, and then escape any { and } with \, which happens to be just fine with me.

AFAIR I saw this somewhere in perl documentation and got used to it.

So there are many constructs of this type:

    s{} {
      if (something) \{ \} else \{ \}
    }ge;

Please do not deprecate this. Having "{}" delimiters is fun and symmetrical.

The same thing happens with m.\.., which is equivalent to /./, not /\./.

This is fine, and meets my expectations.

Maybe there should be a warning for every use of a backslashed metacharacter in this manner. It's too easy to be mistaken about m.\.. and its ilk.

please don't

p5pRT commented 12 years ago

From @ikegami

On Tue, May 29, 2012 at 12:49 AM, Konovalov, Vadim (Vadim)** CTR ** <vadim.konovalov@alcatel-lucent.com> wrote:

On Mon, May 28, 2012 at 11:45 AM, Father Chrysostomos via RT <perlbug-followup@perl.org> wrote:

The same thing happens with m.\.., which is equivalent to /./, not /\./.

Wow. That's unexpected! I would call it a bug even.

not a bug.

The escape doesn't escape! How is that not a bug?

p5pRT commented 12 years ago

From @demerphq

On 29 May 2012 07:17, Eric Brine <ikegami@adaelis.com> wrote:

On Tue, May 29, 2012 at 12:49 AM, Konovalov, Vadim (Vadim)** CTR ** <vadim.konovalov@alcatel-lucent.com> wrote:

On Mon, May 28, 2012 at 11:45 AM, Father Chrysostomos via RT <perlbug-followup@perl.org> wrote:

The same thing happens with m.\.., which is equivalent to /./, not /\./.

Wow. That's unexpected! I would call it a bug even.

 not a bug.

The escape doesn't escape! How is that not a bug?

Because it is documented to happen.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 12 years ago

From vadim.konovalov@alcatel-lucent.com

From: demerphq

On 29 May 2012 07:17, Eric Brine wrote:

On Mon, May 28, 2012 at 11:45 AM, Father Chrysostomos via RT wrote:

The same thing happens with m.\.., which is equivalent to /./, not /\./.

Wow. That's unexpected! I would call it a bug even.

The escape doesn't escape! How is that not a bug?

Because it is documented to happen.

And also, it escapes the delimiters, which is what I do expect.

regards, Vadim.

p5pRT commented 12 years ago

From @ikegami

On Tue, May 29, 2012 at 1:26 AM, demerphq <demerphq@gmail.com> wrote:

On 29 May 2012 07:17, Eric Brine <ikegami@adaelis.com> wrote:

The escape doesn't escape! How is that not a bug?

Because it is documented to happen.

Where? If so, it directly contradicts perlre.

"Quote the next metacharacter."

"So anything that looks like \\, \(, \), \<, \>, \{, or \} is always interpreted as a literal character, not a metacharacter."

"Any single character matches itself, unless it is a metacharacter with a special meaning described here or above. You can cause characters that normally function as metacharacters to be interpreted literally by prefixing them with a "\" (e.g., "\." matches a ".", not any character; "\\" matches a "\"). This escape mechanism is also required for the character used as the pattern delimiter."

p5pRT commented 12 years ago

From @demerphq

On 29 May 2012 07:45, Konovalov, Vadim (Vadim)** CTR ** <vadim.konovalov@alcatel-lucent.com> wrote:

From: demerphq

On 29 May 2012 07:17, Eric Brine wrote:

On Mon, May 28, 2012 at 11:45 AM, Father Chrysostomos via RT wrote:

The same thing happens with m.\.., which is equivalent to /./, not /\./.

Wow. That's unexpected! I would call it a bug even.

The escape doesn't escape! How is that not a bug?

Because it is documented to happen.

And also, it escapes the delimiters, which is what I do expect.

Sorry? No. The tokenizer *unescapes* the delimiters. There is NO way to pass esc-delimiter "through" the tokenizer to something deeper. Which makes sense for everything but regexes.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 12 years ago

From @demerphq

On 29 May 2012 08:23, Eric Brine <ikegami@adaelis.com> wrote:

On Tue, May 29, 2012 at 1:26 AM, demerphq <demerphq@gmail.com> wrote:

On 29 May 2012 07:17, Eric Brine <ikegami@adaelis.com> wrote:

The escape doesn't escape! How is that not a bug?

Because it is documented to happen.

Where? If so, it directly contradicts perlre.

"Quote the next metacharacter."

"So anything that looks like \\, \(, \), \<, \>, \{, or \} is always interpreted as a literal character, not a metacharacter."

"Any single character matches itself, unless it is a metacharacter with a special meaning described here or above. You can cause characters that normally function as metacharacters to be interpreted literally by prefixing them with a "\" (e.g., "\." matches a ".", not any character; "\\" matches a "\"). This escape mechanism is also required for the character used as the pattern delimiter."

I posted the relevant docs in the mail titled "Oh dear, maybe we have to rethink 'Unescaped left brace in regex is deprecated' warnings..." (did no one see that?) But here it is again. From perlop, in the section titled "Gory details of parsing quoted constructs", under the subheading "RE" in "?RE?", "/RE/", "m/RE/", "s/RE/foo/":

  The lack of processing of "\\" creates specific restrictions on the post-processed text. If the delimiter is "/", one cannot get the combination "\/" into the result of this step. "/" will finish the regular expression, "\/" will be stripped to "/" on the previous step, and "\\/" will be left as is. Because "/" is equivalent to "\/" inside a regular expression, this does not matter unless the delimiter happens to be character special to the RE engine, such as in "s*foo*bar*", "m[foo]", or "?foo?"; or an alphanumeric char, as in:

  m m ^ a \s* b mmx;

  In the RE above, which is intentionally obfuscated for illustration, the delimiter is "m", the modifier is "mx", and after delimiter-removal the RE is the same as for "m/ ^ a \s* b /mx". There's more than one reason you're encouraged to restrict your delimiters to non-alphanumeric, non-whitespace choices.

Although it is documented, I think the behavior is undesirable and should be changed.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 12 years ago

From vadim.konovalov@alcatel-lucent.com

From: demerphq

On 29 May 2012 07:45, Konovalov, Vadim wrote:

And also, it escapes the delimiters, which is what I do expect.

Sorry? No. The tokenizer *unescapes* the delimiters. There is NO way to pass esc-delimiter "through" the tokenizer to something deeper. Which makes sense for everything but regexes.

I am not talking about the tokenizer.

I just see that '\' escapes a dot in m.\... and this is what I had in mind when talking about escaping; I had no intention of commenting on what happens internally.

On the other hand, escaping with '\' in the replacement part of the s{}{}e construct works just fine for me, so I have no problem with passing esc-delimiter to something deeper.

Regards, Vadim.

p5pRT commented 12 years ago

From @demerphq

On 28 May 2012 20:31, Zefram <zefram@fysh.org> wrote:

Father Chrysostomos via RT wrote:

I think this pretty much ends this deprecation. Too many people use {} as delimiters.

Not at all. It's an especially confusing case, especially deserving of the clarification wrought by the deprecation.

The same thing happens with m.\.., which is equivalent to /./, not /\./.

Maybe there should be a warning for every use of a backslashed metacharacter in this manner. It's too easy to be mistaken about m.\.. and its ilk.

I don't think that is really a good idea. It would have to be handled by the tokenizer, which would mean the tokenizer needs to know regex syntax, which we really don't want; consider regex engine plugins.

I think we should "just" disentangle the regex parsing from normal string parsing. Or, hmm. Or we could change the regex engine interface to pass in the quote chars used in the pattern... Which would also cause issues with plugins, but might be a good idea anyway.

cheers, Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 12 years ago

From @demerphq

On 29 May 2012 08:38, Konovalov, Vadim (Vadim)** CTR ** <vadim.konovalov@alcatel-lucent.com> wrote:

From: demerphq

On 29 May 2012 07:45, Konovalov, Vadim wrote:

And also, it escapes the delimiters, which is what I do expect.

Sorry? No. The tokenizer *unescapes* the delimiters. There is NO way to pass esc-delimiter "through" the tokenizer to something deeper. Which makes sense for everything but regexes.

I am not talking about the tokenizer.

I just see that '\' escapes a dot in m.\... and this is what I had in mind when talking about escaping; I had no intention of commenting on what happens internally.

On the other hand, escaping with '\' in the replacement part of the s{}{}e construct works just fine for me, so I have no problem with passing esc-delimiter to something deeper.

I don't think you are understanding this issue properly. What you just said either isn't what we are talking about, or it doesn't do what you think it does.

    perl -le'$_="x"; s/x/\//; print'

does NOT pass through '\/' to the regex engine. You can pass "\\/" through to the regex engine, but not "\/" if the delims are /.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 12 years ago

From @tux

On Mon, 28 May 2012 19:19:25 +0100, hv@crypt.org wrote:

andreas.koenig.7os6VVqR@franz.ak.mind.de (Andreas J. Koenig) wrote:
:Found in GRANTM/XML-Simple-2.18.tar.gz in lib/XML/Simple.pm:
:
: 995 $val =~ s{\$\{([\w.]+)\}}{ $self->get_var($1) }ge;
: 1031 $val =~ s{\$\{(\w+)\}}{ $self->get_var($1) }ge;
:
:% make test
:[...]
:t/0_Config.t .. Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/\${ <-- HERE ([\w.]+)}/ at /tmp/tmp.4uMUAQPZaT/XML-Simple-2.18-cUi3UY/blib/lib/XML/Simple.pm line 995.
:Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/\${ <-- HERE (\w+)}/ at /tmp/tmp.4uMUAQPZaT/XML-Simple-2.18-cUi3UY/blib/lib/XML/Simple.pm line 1031.
:# Package Version
:# perl 5.17.0
:# XML::Simple 2.18
:# Storable 2.35
:# XML::Parser 2.41
:# XML::SAX 0.99
:# XML::NamespaceSupport 1.11
:t/0_Config.t .. ok
:
:
:Looks like perl miscounts one backslash. I would expect that the regexp
:is accepted by perl because the brace is escaped.

IIRC, when you use a regex metacharacter as a delimiter, escaping it gives you the metacharacter:

    % perl -wle 'print "line\n" =~ m$\$$'
    1
    %

So I think the warning is correct, though somewhat misleadingly expressed.

I don't remember if there is a way to get past that to match the literal.

Hugo

Got this report this morning:

http://www.cpantesters.org/cpan/report/b40c2a9c-a87e-11e1-85e6-b1e975b706df

    t/11_DDumper.t .... ok

    # Failed test 'no warnings'
    # at /home/src/perl/repoperls/installed-perls/perl/v5.16.0-226-g760209f/165a/lib/site_perl/5.17.0/Test/NoWarnings.pm line 45.
    # There were 1 warning(s)
    # Previous test 0 ''
    # Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/^PVIV\("a\\(n|12)\\342\\202\\254"\\0\) \[UTF8 "a\\?n\\x{ <-- HERE 20ac}"\]/ at t/20_DPeek.t line 72.
    # at t/20_DPeek.t line 72.
    #
    # Looks like you failed 1 test of 50.
    t/20_DPeek.t ...... Dubious, test returned 1 (wstat 256, 0x100)
    Failed 1/50 subtests

The code that causes it:

    SKIP: {
        $] <= 5.008001 and skip "UTF8 tests useless in this ancient perl version", 1;
        $VAR = "a\x0a\x{20ac}";
        like (DPeek ($VAR), qr'^PVIV\("a\\(n|12)\\342\\202\\254"\\0\) \[UTF8 "a\\?n\\x{20ac}"\]',
            ' $VAR "a\x0a\x{20ac}"');
        }

I just added the \ before the {, as it passes on 5.8.0 up to blead (64 versions of perl).
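The change described above can be illustrated with a small stand-alone sketch. It assumes a made-up string `$dump` that imitates the tail of the dumped text (it is not real DPeek output) and uses slash delimiters for simplicity rather than the single-quote delimiters of the test.

```perl
use strict;
use warnings;

# Made-up stand-in for the tail of the dumped text. In single quotes,
# \n and \x{20ac} stay as literal backslash sequences.
my $dump = '[UTF8 "a\n\x{20ac}"]';

# \\x matches a literal backslash followed by "x"; \{ and \} match the
# literal braces, so no unescaped "{" is left in the pattern.
my $re = qr/\[UTF8 "a\\?n\\x\{20ac\}"\]/;

print $dump =~ $re ? "matched\n" : "no match\n";
```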

-- 
H.Merijn Brand  http://tux.nl   Perl Monger  http://amsterdam.pm.org/
using perl5.00307 .. 5.14   porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/        http://www.test-smoke.org/
http://qa.perl.org   http://www.goldmark.org/jeff/stupid-disclaimers/

p5pRT commented 12 years ago

From @ikegami

On Tue, May 29, 2012 at 2:34 AM, demerphq <demerphq@gmail.com> wrote:

I posted the relevant docs in the mail titled "Oh dear, maybe we have to rethink 'Unescaped left brace in regex is deprecated' warnings..."

(did no one see that?) But here it is again.

I understand why it behaves the way it does.

From perlop in the section titled "Gory details of parsing quoted constructs"

with the subheading "RE" in "?RE?", "/RE/", "m/RE/", "s/RE/foo/":

Ah, so there is some disagreement, then. perlre says it is required for meta characters *and also for delimiters*, with no caveat. The underlying part indicates it's talking about literals, not just patterns.

- Eric

p5pRT commented 12 years ago

From @cpansprout

On Mon May 28 23:42:13 2012, demerphq wrote:

On 29 May 2012 08:38, Konovalov, Vadim (Vadim)** CTR ** <vadim.konovalov@alcatel-lucent.com> wrote:

From: demerphq

On 29 May 2012 07:45, Konovalov, Vadim wrote:

And also, it escapes the delimiters, which is what I do expect.

Sorry? No. The tokenizer *unescapes* the delimiters. There is NO way to pass esc-delimiter "through" the tokenizer to something deeper. Which makes sense for everything but regexes.

I am not talking about the tokenizer.

I just see that '\' escapes a dot in m.\... and this is what I had in mind when talking about escaping; I had no intention of commenting on what happens internally.

On the other hand, escaping with '\' in the replacement part of the s{}{}e construct works just fine for me, so I have no problem with passing esc-delimiter to something deeper.

I don't think you are understanding this issue properly. What you just said either isn't what we are talking about, or it doesn't do what you think it does.

    perl -le'$_="x"; s/x/\//; print'

does NOT pass through '\/' to the regex engine. You can pass "\\/" through to the regex engine, but not "\/" if the delims are /.

But \ does escape the delimiter in that it stops it from being interpreted as a delimiter.

--

Father Chrysostomos

p5pRT commented 12 years ago

From @hvds

demerphq <demerphq@gmail.com> wrote:
:On 28 May 2012 20:31, Zefram <zefram@fysh.org> wrote:
:> Father Chrysostomos via RT wrote:
:>>I think this pretty much ends this deprecation.  Too many people use {}
:>>as delimiters.
:>
:> Not at all.  It's an especially confusing case, especially deserving of
:> the clarification wrought by the deprecation.
:
:The alternative is to stop the tokenizer from doing this type of
:unescaping on regex patterns. There is no need to do it, and it breaks
:stuff. Seems like a good reason to stop doing something.

It occurs to me that we could draw a useful distinction between balanced delimiters and others.

I think <> are the only balanced delimiters used in an unbalanced way in regular expression syntax. If we were to disable the escape-stripping for those cases, the cost (I think) would be you could no longer use lookbehinds and cuts in <>-delimited regexps, which sounds like a small price; but now you would be able to use literal (), [] and {} even when they matched your delimiters, by escaping them.

Of course that would involve a) disentangling at least some of the work from the tokenizer, and b) an incompatible change requiring either a deprecation cycle or protection behind a feature. Oh, and I guess also c) an undertaking not to introduce new unbalanced uses of these in regexp syntax, like (?[...), though I don't particularly imagine we would have considered such.

I'm not really sure how much work this would be, however.

Hugo

p5pRT commented 12 years ago

From @demerphq

On 29 May 2012 09:27, <hv@crypt.org> wrote:

demerphq <demerphq@gmail.com> wrote:
:On 28 May 2012 20:31, Zefram <zefram@fysh.org> wrote:
:> Father Chrysostomos via RT wrote:
:>>I think this pretty much ends this deprecation.  Too many people use {}
:>>as delimiters.
:>
:> Not at all.  It's an especially confusing case, especially deserving of
:> the clarification wrought by the deprecation.
:
:The alternative is to stop the tokenizer from doing this type of
:unescaping on regex patterns. There is no need to do it, and it breaks
:stuff. Seems like a good reason to stop doing something.

It occurs to me that we could draw a useful distinction between balanced delimiters and others.

I think <> are the only balanced delimiters used in an unbalanced way in regular expression syntax. If we were to disable the escape-stripping for those cases, the cost (I think) would be you could no longer use lookbehinds and cuts in <>-delimited regexps, which sounds like a small price; but now you would be able to use literal (), [] and {} even when they matched your delimiters, by escaping them.

Of course that would involve a) disentangling at least some of the work from the tokenizer, and b) an incompatible change requiring either a deprecation cycle or protection behind a feature. Oh, and I guess also c) an undertaking not to introduce new unbalanced uses of these in regexp syntax, like (?[...), though I don't particularly imagine we would have considered such.

I'm not really sure how much work this would be, however.

I plan to look into it. Your analysis is much appreciated.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 12 years ago

From @khwilliamson

The example below uses single quotes as qr delimiters.

On 05/29/2012 12:42 AM, H.Merijn Brand wrote:

code that causes it

    SKIP: {
        $] <= 5.008001 and skip "UTF8 tests useless in this ancient perl version", 1;
        $VAR = "a\x0a\x{20ac}";
        like (DPeek ($VAR), qr'^PVIV\("a\\(n|12)\\342\\202\\254"\\0\) \[UTF8 "a\\?n\\x{20ac}"\]',
            ' $VAR "a\x0a\x{20ac}"');
        }

Bug #21491 says that single quotes should not interpolate. But this code assumes that it does. If we fixed #21491, I believe it would break this code, would it not?

I wonder how much code is out there that depends on #21491 being broken. We might have to mark it as won't fix, then.

p5pRT commented 12 years ago

From @cpansprout

On Fri Jun 01 14​:40​:56 2012\, public@​khwilliamson.com wrote​:

The example below uses single quotes as qr delimiters

On 05/29/2012 12​:42 AM\, H.Merijn Brand wrote​:

code that causes it

SKIP​: { $]\<= 5.008001 and skip "UTF8 tests useless in this ancient perl version"\, 1; $VAR = "a\x0a\x{20ac}"; like (DPeek ($VAR)\, qr'^PVIV\("a\\(n|12)\\342\\202\\254"\\0\) \[UTF8 "a\\?n\\x{20ac}"\]'\, ' $VAR "a\x0a\x{20ac}"'); }

Bug #21491 says that single quotes should not interpolate. But this code assumes that it does. If we fixed #21491\, I believe it would break this code\, would it not?

Yes\, and it would diverge from the long-documented behaviour​:

    Customary  Generic        Meaning        Interpolates
        ''       q{}          Literal             no
        ""      qq{}          Literal             yes
        ``      qx{}          Command             yes*
                qw{}         Word list            no
        //       m{}       Pattern match          yes*
                qr{}          Pattern             yes*
                 s{}{}      Substitution          yes*
                tr{}{}    Transliteration         no (but see below)
                 y{}{}    Transliteration         no (but see below)
        <<EOF                 here-doc            yes*

        * unless the delimiter is ''.

I wonder how much code is out there that depends on #21491 being broken. We might have to mark it as won't fix\, then.

Yes\, and not-a-bug.

--

Father Chrysostomos

p5pRT commented 12 years ago

From @cpansprout

On Fri Jun 01 17​:57​:55 2012\, sprout wrote​:

On Fri Jun 01 14​:40​:56 2012\, public@​khwilliamson.com wrote​:

The example below uses single quotes as qr delimiters

On 05/29/2012 12​:42 AM\, H.Merijn Brand wrote​:

code that causes it

    SKIP: {
        $]<= 5.008001 and skip "UTF8 tests useless in this ancient perl version", 1;
        $VAR = "a\x0a\x{20ac}";
        like (DPeek ($VAR), qr'^PVIV\("a\\(n|12)\\342\\202\\254"\\0\) \[UTF8 "a\\?n\\x{20ac}"\]',
              ' $VAR "a\x0a\x{20ac}"');
    }

Bug #21491 says that single quotes should not interpolate. But this code assumes that it does. If we fixed #21491\, I believe it would break this code\, would it not?

Yes\, and it would diverge from the long-documented behaviour​:

    Customary  Generic        Meaning        Interpolates
        ''       q{}          Literal             no
        ""      qq{}          Literal             yes
        ``      qx{}          Command             yes*
                qw{}         Word list            no
        //       m{}       Pattern match          yes*
                qr{}          Pattern             yes*
                 s{}{}      Substitution          yes*
                tr{}{}    Transliteration         no (but see below)
                 y{}{}    Transliteration         no (but see below)
        <<EOF                 here-doc            yes*

        * unless the delimiter is ''.

I wonder how much code is out there that depends on #21491 being broken. We might have to mark it as won't fix\, then.

Yes\, and not-a-bug.

Sorry\, I was a little confused.

The reason for it not being a bug is that\, if m '\n' stops matching "\n"\, then $foo =~ $user_pat will stop working if the user enters '\n'. That means ack '\n' won’t work any more.

--

Father Chrysostomos

p5pRT commented 12 years ago

From @demerphq

On 2 June 2012 03:01, Father Chrysostomos via RT <perlbug-followup@perl.org> wrote:

On Fri Jun 01 17​:57​:55 2012\, sprout wrote​:

On Fri Jun 01 14​:40​:56 2012\, public@​khwilliamson.com wrote​:

The example below uses single quotes as qr delimiters

On 05/29/2012 12​:42 AM\, H.Merijn Brand wrote​:

code that causes it

    SKIP: {
        $]<= 5.008001 and skip "UTF8 tests useless in this ancient perl version", 1;
        $VAR = "a\x0a\x{20ac}";
        like (DPeek ($VAR), qr'^PVIV\("a\\(n|12)\\342\\202\\254"\\0\) \[UTF8 "a\\?n\\x{20ac}"\]',
              ' $VAR "a\x0a\x{20ac}"');
    }

Bug #21491 says that single quotes should not interpolate.  But this code assumes that it does.  If we fixed #21491\, I believe it would break this code\, would it not?

Yes\, and it would diverge from the long-documented behaviour​:

    Customary  Generic        Meaning        Interpolates
        ''       q{}          Literal             no
        ""      qq{}          Literal             yes
        ``      qx{}          Command             yes*
                qw{}         Word list            no
        //       m{}       Pattern match          yes*
                qr{}          Pattern             yes*
                 s{}{}      Substitution          yes*
                tr{}{}    Transliteration         no (but see below)
                 y{}{}    Transliteration         no (but see below)
        <<EOF                 here-doc            yes*

        * unless the delimiter is ''.

I wonder how much code is out there that depends on #21491 being broken.   We might have to mark it as won't fix\, then.

Yes\, and not-a-bug.

Sorry\, I was a little confused.

The reason for it not being a bug is that\, if m '\n' stops matching "\n"\, then $foo =~ $user_pat will stop working if the user enters '\n'.  That means ack '\n' won’t work any more.

That doesn't make sense. Single quotes for ack are a shell quoting issue. ack doesn't see the quotes.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 12 years ago

From @cpansprout

On Sat Jun 02 02​:07​:11 2012\, demerphq wrote​:

On 2 June 2012 03:01, Father Chrysostomos via RT <perlbug-followup@perl.org> wrote:

On Fri Jun 01 17​:57​:55 2012\, sprout wrote​:

On Fri Jun 01 14​:40​:56 2012\, public@​khwilliamson.com wrote​:

The example below uses single quotes as qr delimiters

On 05/29/2012 12​:42 AM\, H.Merijn Brand wrote​:

code that causes it

    SKIP: {
        $]<= 5.008001 and skip "UTF8 tests useless in this ancient perl version", 1;
        $VAR = "a\x0a\x{20ac}";
        like (DPeek ($VAR), qr'^PVIV\("a\\(n|12)\\342\\202\\254"\\0\) \[UTF8 "a\\?n\\x{20ac}"\]',
              ' $VAR "a\x0a\x{20ac}"');
    }

Bug #21491 says that single quotes should not interpolate.  But this code assumes that it does.  If we fixed #21491\, I believe it would break this code\, would it not?

Yes\, and it would diverge from the long-documented behaviour​:

    Customary  Generic        Meaning        Interpolates
        ''       q{}          Literal             no
        ""      qq{}          Literal             yes
        ``      qx{}          Command             yes*
                qw{}         Word list            no
        //       m{}       Pattern match          yes*
                qr{}          Pattern             yes*
                 s{}{}      Substitution          yes*
                tr{}{}    Transliteration         no (but see below)
                 y{}{}    Transliteration         no (but see below)
        <<EOF                 here-doc            yes*

        * unless the delimiter is ''.

I wonder how much code is out there that depends on #21491 being broken.   We might have to mark it as won't fix\, then.

Yes\, and not-a-bug.

Sorry\, I was a little confused.

The reason for it not being a bug is that\, if m '\n' stops matching "\n"\, then $foo =~ $user_pat will stop working if the user enters '\n'.  That means ack '\n' won’t work any more.

That doesn't make sense. Single quotes for ack are a shell quoting issue. ack doesn't see the quotes.

Either way\, the regular expression engine itself has to interpret \n. It can’t rely on m// syntax to resolve it.

--

Father Chrysostomos

p5pRT commented 12 years ago

From @demerphq

On 2 June 2012 14:29, Father Chrysostomos via RT <perlbug-followup@perl.org> wrote:

On Sat Jun 02 02:07:11 2012, demerphq wrote:

On 2 June 2012 03:01, Father Chrysostomos via RT <perlbug-followup@perl.org> wrote:

On Fri Jun 01 17​:57​:55 2012\, sprout wrote​:

On Fri Jun 01 14​:40​:56 2012\, public@​khwilliamson.com wrote​:

The example below uses single quotes as qr delimiters

On 05/29/2012 12​:42 AM\, H.Merijn Brand wrote​:

code that causes it

    SKIP: {
        $]<= 5.008001 and skip "UTF8 tests useless in this ancient perl version", 1;
        $VAR = "a\x0a\x{20ac}";
        like (DPeek ($VAR), qr'^PVIV\("a\\(n|12)\\342\\202\\254"\\0\) \[UTF8 "a\\?n\\x{20ac}"\]',
              ' $VAR "a\x0a\x{20ac}"');
    }

Bug #21491 says that single quotes should not interpolate.  But this code assumes that it does.  If we fixed #21491\, I believe it would break this code\, would it not?

Yes\, and it would diverge from the long-documented behaviour​:

    Customary  Generic        Meaning        Interpolates
        ''       q{}          Literal             no
        ""      qq{}          Literal             yes
        ``      qx{}          Command             yes*
                qw{}         Word list            no
        //       m{}       Pattern match          yes*
                qr{}          Pattern             yes*
                 s{}{}      Substitution          yes*
                tr{}{}    Transliteration         no (but see below)
                 y{}{}    Transliteration         no (but see below)
        <<EOF                 here-doc            yes*

        * unless the delimiter is ''.

I wonder how much code is out there that depends on #21491 being broken.   We might have to mark it as won't fix\, then.

Yes\, and not-a-bug.

Sorry\, I was a little confused.

The reason for it not being a bug is that\, if m '\n' stops matching "\n"\, then $foo =~ $user_pat will stop working if the user enters '\n'.  That means ack '\n' won’t work any more.

That doesn't make sense. Single quotes for ack are a shell quoting issue. ack doesn't see the quotes.

Either way\, the regular expression engine itself has to interpret \n. It can’t rely on m// syntax to resolve it.

It can't rely on the tokenizer to resolve it, no. That is a general rule, nearly.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 12 years ago

From @rjbs

* Karl Williamson <public@khwilliamson.com> [2012-06-01T17:40:18]

Bug #21491 says that single quotes should not interpolate. But this code assumes that it does. If we fixed #21491\, I believe it would break this code\, would it not?

I wonder how much code is out there that depends on #21491 being broken. We might have to mark it as won't fix\, then.

This very naive CPAN search indicates "not much."

  http​://grep.cpan.me/?q=qr%27%5B%5E%27%5Cn%5D%2B%5C%24%5B%5E%27%5D&page=2

I think that bug should still be fixed.

-- rjbs

p5pRT commented 12 years ago

From @rjbs

* Father Chrysostomos via RT <perlbug-followup@perl.org> [2012-06-02T08:29:07]

The reason for it not being a bug is that\, if m '\n' stops matching "\n"\, then $foo =~ $user_pat will stop working if the user enters '\n'.  That means ack '\n' won’t work any more.

That doesn't make sense. Single quotes for ack are a shell quoting issue. ack doesn't see the quotes.

Either way\, the regular expression engine itself has to interpret \n. It can’t rely on m// syntax to resolve it.

My understanding is that the regular expression engine has its own machinery for turning \n into a \n to match\, apart from the qq-ish behavior.

I could be wrong. Somebody tell me.

-- rjbs

p5pRT commented 12 years ago

From @cpansprout

On Mon Jun 04 16​:52​:26 2012\, perl.p5p@​rjbs.manxome.org wrote​:

* Father Chrysostomos via RT <perlbug-followup@perl.org> [2012-06-02T08:29:07]

The reason for it not being a bug is that\, if m '\n' stops matching "\n"\, then $foo =~ $user_pat will stop working if the user enters '\n'.  That means ack '\n' won’t work any more.

That doesn't make sense. Single quotes for ack are a shell quoting issue. ack doesn't see the quotes.

Either way\, the regular expression engine itself has to interpret \n. It can’t rely on m// syntax to resolve it.

My understanding is that the regular expression engine has its own machinery for turning \n into a \n to match\, apart from the qq-ish behavior.

I could be wrong. Somebody tell me.

That’s right\, which is why "\n" =~ '\n' matches\, and why "\n" =~ m'\n' should continue to match.
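A minimal sketch of that point, assuming a pattern that arrives as a plain string, as it would from a command-line argument. The variable name `$user_pat` follows the earlier message; the hash `%matched` and its keys are illustrative only. In each case the regex engine receives a backslash followed by `n` and has to interpret the escape itself.

```perl
use strict;
use warnings;

# A pattern as a user might type it: two characters, backslash and "n".
my $user_pat = '\n';

# The subject is a real one-character newline.
my $subject = "\n";

# Record whether each form matches (1) or not (0).
my %matched = (
    string_pattern => ($subject =~ $user_pat) ? 1 : 0,  # pattern held in a scalar
    single_quoted  => ($subject =~ m'\n')     ? 1 : 0,  # m'' does no interpolation
    ordinary       => ($subject =~ /\n/)      ? 1 : 0,  # usual m// form
);

printf "%-15s %s\n", $_, $matched{$_} ? 'matches' : 'does not match'
    for sort keys %matched;
```

All three forms match here, which is the behaviour the message argues should be preserved.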

--

Father Chrysostomos

p5pRT commented 12 years ago

From @demerphq

On 5 June 2012 01:51, Ricardo Signes <perl.p5p@rjbs.manxome.org> wrote:

* Father Chrysostomos via RT <perlbug-followup@perl.org> [2012-06-02T08:29:07]

The reason for it not being a bug is that\, if m '\n' stops matching "\n"\, then $foo =~ $user_pat will stop working if the user enters '\n'.  That means ack '\n' won’t work any more.

That doesn't make sense. Single quotes for ack are a shell quoting issue. ack doesn't see the quotes.

Either way\, the regular expression engine itself has to interpret \n. It can’t rely on m// syntax to resolve it.

My understanding is that the regular expression engine has its own machinery for turning \n into a \n to match\, apart from the qq-ish behavior.

I could be wrong.  Somebody tell me.

I already said this was the case. The regex engine cannot depend on the tokenizer handling *any* escapes\, although there are some that are handled by both the tokenizer AND the regex engine.

Tokenizer does nothing​:

    $ perl -Mre=debug -e'/\n/'
    Compiling REx "\n"
    Final program:
       1: EXACT <\n> (3)
       3: END (0)
    anchored "%n" at 0 (checking anchored isall) minlen 1
    Freeing REx: "\n"

Tokenizer does something​:

    $ perl -Mre=debug -e'my $x="\n"; /$x/'
    Compiling REx "%n"
    Final program:
       1: EXACT <\n> (3)
       3: END (0)
    anchored "%n" at 0 (checking anchored isall) minlen 1
    Freeing REx: "%n"

Notice the difference?

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 12 years ago

From @demerphq

On 1 June 2012 23:40, Karl Williamson <public@khwilliamson.com> wrote:

The example below uses single quotes as qr delimiters

On 05/29/2012 12​:42 AM\, H.Merijn Brand wrote​:

code that causes it

    SKIP: {
        $]<= 5.008001 and skip "UTF8 tests useless in this ancient perl version", 1;
        $VAR = "a\x0a\x{20ac}";
        like (DPeek ($VAR), qr'^PVIV\("a\\(n|12)\\342\\202\\254"\\0\) \[UTF8 "a\\?n\\x{20ac}"\]',
              ' $VAR "a\x0a\x{20ac}"');
    }

Bug #21491 says that single quotes should not interpolate.  But this code assumes that it does.  If we fixed #21491\, I believe it would break this code\, would it not?

Are you sure about that? I don't see any interpolation there. Do you mean escape handling?

As far as I understand things \\ and \' are *supposed* to be unescaped inside m''.

So what do you mean by this? I see nothing unexpected here.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 12 years ago

From @rjbs

* demerphq <demerphq@gmail.com> [2012-06-05T02:12:29]

On 1 June 2012 23:40, Karl Williamson <public@khwilliamson.com> wrote:

Bug #21491 says that single quotes should not interpolate.  But this code assumes that it does.  If we fixed #21491\, I believe it would break this code\, would it not?

Are you sure about that? I don't see any interpolation there. Do you mean escape handling?

As far as I understand things \\ and \' are *supposed* to be unescaped inside m''.

So what do you mean by this? I see nothing unexpected here.

I was confused by this, too, and foolishly didn't go read 21491. This is about escape sequences, not variable interpolation. I was rushing through my mail queue and not verifying everything I read. I'm sorry if this led to spreading any confusion!

Yes\, fixing this looks like it would break the world. In fact\, I think the busted thing is likely the documentation\, although it looks like it needs a careful read before I really state that with confidence.

-- rjbs

p5pRT commented 12 years ago

From @demerphq

On 5 June 2012 14:26, Ricardo Signes <perl.p5p@rjbs.manxome.org> wrote:

* demerphq <demerphq@gmail.com> [2012-06-05T02:12:29]

On 1 June 2012 23:40, Karl Williamson <public@khwilliamson.com> wrote:

Bug #21491 says that single quotes should not interpolate.  But this code assumes that it does.  If we fixed #21491\, I believe it would break this code\, would it not?

Are you sure about that? I don't see any interpolation there. Do you mean escape handling?

As far as I understand things \\ and \' are *supposed* to be unescaped inside m''.

So what do you mean by this? I see nothing unexpected here.

I was confused by this, too, and foolishly didn't go read 21491. This is about escape sequences, not variable interpolation. I was rushing through my mail queue and not verifying everything I read. I'm sorry if this led to spreading any confusion!

Aren't the case that Karl mentioned and the case in the bug different?

The case in the bug comes down to this​:

my $pat= "\\n"; print "\n"=~/$pat/;

Which matches, because the toker first turns "\\n" into the two characters \n and then hands that to the regex engine, which turns the \n into a literal newline.

This behaviour has changed over time and the docs should probably explain that \n IS a regex escape sequence just like \w\, which "happens" to match the same thing that "\n" is unescaped into.
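The two stages can be observed separately. This is a sketch only; the variable and key names are made up for illustration. It compares a string that still holds a backslash and an `n` after string processing with one that already holds a newline, then interpolates each into a pattern.

```perl
use strict;
use warnings;

my $escaped = "\\n";   # after string processing: backslash + "n" (2 characters)
my $newline = "\n";    # after string processing: one newline character

my %seen = (
    escaped_length     => length($escaped),
    newline_length     => length($newline),
    escaped_matches_nl => ("\n" =~ /$escaped/) ? 1 : 0,  # regex engine handles \n
    newline_matches_nl => ("\n" =~ /$newline/) ? 1 : 0,  # string already held a newline
    escaped_matches_n  => ("n"  =~ /$escaped/) ? 1 : 0,  # \n is not a literal "n"
);

printf "%-20s %d\n", $_, $seen{$_} for sort keys %seen;
```

Both patterns match a newline, but only the first relies on the regex engine treating `\n` as an escape sequence of its own.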

Yes\, fixing this looks like it would break the world.  In fact\, I think the busted thing is likely the documentation\, although it looks like it needs a careful read before I really state that with confidence.

Well\, I can argue the case of​: $x="\\n"; "\n"=~/$x/\, but the case of "\n"=~m'\\n' is a lot easier to say is a bug. Even if neither is entirely clear.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 12 years ago

From @rurban

This is a bug report for perl from rurban@​cpanel.net\, generated with the help of perlbug 1.39 running under perl 5.17.1.

From a375e6bcfaaf64bae9ab3e153f1721225d6ae631 Mon Sep 17 00:00:00 2001
From: Reini Urban <rurban@x-ray.at>
Date: Mon, 11 Jun 2012 09:18:21 -0500
Subject: [PATCH] [perl #113094] Fix a couple of Unescaped left brace in regex

 cpan/ExtUtils-MakeMaker/t/MM_OS2.t | 2 +-
 lib/DB.t                           | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Inline Patch
```diff
diff --git a/cpan/ExtUtils-MakeMaker/t/MM_OS2.t b/cpan/ExtUtils-MakeMaker/t/MM_OS2.t
index 4d88e85..2997541 100644
--- a/cpan/ExtUtils-MakeMaker/t/MM_OS2.t
+++ b/cpan/ExtUtils-MakeMaker/t/MM_OS2.t
@@ -42,7 +42,7 @@ delete $mm->{SKIPHASH};
 my $res = $mm->dlsyms();
 like( $res, qr/baseext\.def: Makefile/,
 '... without flag, should return make targets' );
-like( $res, qr/"DL_FUNCS" => { }/,
+like( $res, qr/"DL_FUNCS" => \{ }/,
 '... should provide empty hash refs where necessary' );
 like( $res, qr/"DL_VARS" => \[]/, '... and empty array refs too' );
diff --git a/lib/DB.t b/lib/DB.t
index a1fadf3..cdb6583 100644
--- a/lib/DB.t
+++ b/lib/DB.t
@@ -126,7 +126,7 @@ is( DB::_clientname('bar'), undef,
 my @ret = eval { DB->backtrace() };
 like( $ret[0], qr/file.+\Q$0\E/, 'DB::backtrace() should report current file');
 like( $ret[0], qr/line $line/, '... should report calling line number' );
- like( $ret[0], qr/eval {...}/, '... should catch eval BLOCK' );
+ like( $ret[0], qr/eval \{...}/, '... should catch eval BLOCK' );
 @ret = eval "one(2)";
 is( scalar @ret, 1, '... should report from provided stack frame number' );
--
```

1.7.10
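As a rough illustration of what the patch changes, the sketch below compiles both spellings of the `lib/DB.t` pattern at run time and reports what the running perl says about each. `probe_pattern` is a hypothetical helper written for this example, not part of any module. Nothing is assumed about how a given perl version treats the unescaped form, which is why the sketch only prints that result.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a pattern given as a string and collect any
# warnings or compile error, without assuming a particular perl version.
sub probe_pattern {
    my ($source) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    my $re    = eval { qr/$source/ };
    my $error = defined $re ? '' : "$@";
    return { regex => $re, error => $error, warnings => \@warnings };
}

my $fixed = probe_pattern('eval \{...}');   # spelling used in the patch
my $old   = probe_pattern('eval {...}');    # spelling before the patch

my $frame = 'eval {...}';
print "escaped pattern matches '$frame'\n"
    if $fixed->{regex} && $frame =~ $fixed->{regex};

# Whatever this perl reports for the unescaped form (possibly nothing).
print "unescaped pattern: ",
    ( $old->{error} || join( '', @{ $old->{warnings} } ) || "no diagnostics\n" );
```

The escaped form is the portable one: `\{` is a literal left brace however the unescaped case is handled.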


Flags​:   category=library   severity=low


Site configuration information for perl 5.17.1​:

Configured by rurban at Tue Jun 5 09​:12​:24 CDT 2012.

Summary of my perl5 (revision 5 version 17 subversion 1) configuration​:   Derived from​: 65bc432f8bb96d463b290c78d34350cb2d289cbc   Platform​:   osname=linux\, osvers=3.2.0-2-amd64\, archname=x86_64-linux-thread-multi-debug@​65bc432   uname='linux reini 3.2.0-2-amd64 #1 smp mon may 21 17​:45​:41 utc 2012 x86_64 gnulinux '   config_args='-de -Dusedevel -Dinstallman1dir=none -Dinstallman3dir=none -Dinstallsiteman1dir=none -Dinstallsiteman3dir=none -Dmksymlinks -DEBUGGING -Doptimize=-g3 -Duseithreads -Accflags='-msse4.2' -Accflags='-march=corei7' -Dcf_email='rurban@​cpanel.net' -Dperladmin='rurban@​cpanel.net' -Duseshrplib'   hint=recommended\, useposix=true\, d_sigaction=define   useithreads=define\, usemultiplicity=define   useperlio=define\, d_sfio=undef\, uselargefiles=define\, usesocks=undef   use64bitint=define\, use64bitall=define\, uselongdouble=undef   usemymalloc=n\, bincompat5005=undef   Compiler​:   cc='cc'\, ccflags ='-D_REENTRANT -D_GNU_SOURCE -msse4.2 -march=corei7 -DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'\,   optimize='-g3'\,   cppflags='-D_REENTRANT -D_GNU_SOURCE -msse4.2 -march=corei7 -DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'   ccversion=''\, gccversion='4.6.3'\, gccosandvers=''   intsize=4\, longsize=8\, ptrsize=8\, doublesize=8\, byteorder=12345678   d_longlong=define\, longlongsize=8\, d_longdbl=define\, longdblsize=16   ivtype='long'\, ivsize=8\, nvtype='double'\, nvsize=8\, Off_t='off_t'\, lseeksize=8   alignbytes=8\, prototype=define   Linker and Libraries​:   ld='cc'\, ldflags =' -fstack-protector -L/usr/local/lib'   libpth=/usr/local/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /usr/lib   libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc -lgdbm_compat   perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc   libc=\, so=so\, useshrplib=true\, libperl=libperl.so   
gnulibc_version='2.13'   Dynamic Linking​:   dlsrc=dl_dlopen.xs\, dlext=so\, d_dlsymun=undef\, ccdlflags='-Wl\,-E -Wl\,-rpath\,/usr/local/lib/perl5/5.17.1/x86_64-linux-thread-multi-debug@​65bc432/CORE'   cccdlflags='-fPIC'\, lddlflags='-shared -g3 -L/usr/local/lib -fstack-protector'

Locally applied patches​:   [cpan #72700] List​::Util heap-overflow


@​INC for perl 5.17.1​:   /usr/local/lib/perl5/site_perl/5.17.1/x86_64-linux-thread-multi-debug@​65bc432   /usr/local/lib/perl5/site_perl/5.17.1   /usr/local/lib/perl5/5.17.1/x86_64-linux-thread-multi-debug@​65bc432   /usr/local/lib/perl5/5.17.1   /usr/local/lib/perl5/site_perl   .


Environment for perl 5.17.1​:   HOME=/home/rurban   LANG=en_US.UTF-8   LANGUAGE (unset)   LD_LIBRARY_PATH (unset)   LOGDIR (unset)   PATH=/home/rurban/bin​:/usr/local/bin​:/usr/bin​:/bin​:/usr/games   PERL_BADLANG (unset)   SHELL=/bin/bash

p5pRT commented 12 years ago

From @trwyant

The ExtUtils-MakeMaker and DB warnings reported by Reini Urban are still present in 5.17.2. The former is also https://rt.cpan.org/Ticket/Display.html?id=77468

p5pRT commented 12 years ago

From @trwyant

Interesting thing found under 5.17.2​:

    $ perl -E 'm{ \$ \{ }x'
    Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/ \$ { <-- HERE / at -e line 1.
    $ perl -E 'm{ x \{ }x'
    Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/ x { <-- HERE / at -e line 1.
    $ perl -E 'm{ \{ }x'
    $ perl -E 'm< \$ \{ >x'
    $

Is this a weird bug in the escape code\, or am I missing something?

p5pRT commented 12 years ago

From rmbarker.cpan@btinternet.com

On Sat\, 2012-07-21 at 07​:26 -0700\, Tom Wyant via RT wrote​:

The ExtUtils-MakeMaker and DB warnings reported by Reini Urban are still present in 5.17.2. The former is also https://rt.cpan.org/Ticket/Display.html?id=77468

The latter is fixed by 7150f9197f27c7cc16a06b3e01391c49c78398ce

p5pRT commented 12 years ago

From @khwilliamson

On 07/21/2012 11:44 AM, Robin Barker wrote:

I suggest this bug is marked as resolved.

The only warnings now come from cpan/ modules. [perl#113094] is tracking the \{ issue for CPAN modules.

OK\, moving it to the 113094

On 07/21/2012 07​:51 AM\, Dave Mitchell wrote​:

On Fri\, Jul 20\, 2012 at 08​:35​:27PM -0700\, Reverend Chip wrote​:

On 7/19/2012 10​:15 AM\, Karl Williamson wrote​:

3) Consider this as acceptable collateral breakage\, document it\, and keep the warning for a cycle or two\, after which we prohibit unescaped literal left brackets.

The final possibility is only feasible if there is very little current breakage.

My 2c​: I think this is ideal\, since stripping \ on {} in regexes seems like something we should never have done in the first place.

Note that if we changed it so that escaped delimiters are no longer stripped\, *all* the following regexes would change their meaning; the last three would become compile errors\, while the first four would just silently start matching different things​:

    qr[^\[a-z\]$]

    qr#  a
         \#xxx
         b
         #x;

    qr(^\(x\)$);

    m?^xy\?$?

    qr!a(?\!b)!;

    qr<a(?\<foo\>b)>;

    qr|a(?\|foo)|;

Correct me if I'm wrong, but I believe that even the most encompassing change would cause breakage only if the delimiter is one of the dirty dozen metacharacters, and no others. Since '/' isn't one of the 12, there should be no potential problems with it.

Obviously\, if we decided to\, we could restrict the change to just '{'\, as that is the only one we care about now. But that isn't aesthetically pleasing.

p5pRT commented 12 years ago

From @iabyn

On Sat\, Jul 21\, 2012 at 12​:42​:07PM -0700\, Karl Williamson via RT wrote​:

Note that if we changed it so that escaped delimiters are no longer stripped\, *all* the following regexes would change their meaning; the last three would become compile errors\, while the first four would just silently start matching different things​:

    qr[^\[a-z\]$]

    qr#  a
         \#xxx
         b
         #x;

    qr(^\(x\)$);

    m?^xy\?$?

    qr!a(?\!b)!;

    qr<a(?\<foo\>b)>;

    qr|a(?\|foo)|;

Correct me if I'm wrong, but I believe that even the most encompassing change would cause breakage only if the delimiter is one of the dirty dozen metacharacters, and no others. Since '/' isn't one of the 12, there should be no potential problems with it.

Correct. It's only cases where a regex metacharacter is used as a delimiter, and the char has to be escaped to allow the string as a whole to be correctly delimited. Formerly, the quoting mechanism would strip the \, allowing the metachar to be seen by the regex engine. Under the new proposal, the engine would see the char as still escaped, and thus no longer a metachar.
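One way to see which behaviour a given perl has is to look at how a compiled pattern stringifies. The sketch below does that for some of the examples quoted above; `seen_by_engine` is a hypothetical helper, and string `eval` is used only so that a pattern which no longer compiles is reported instead of aborting the script. The output is deliberately not predicted here, since it depends on whether the running perl strips the backslash for that delimiter.

```perl
use strict;
use warnings;

# Hypothetical helper: evaluate the source text of a qr expression and
# return how the resulting pattern stringifies, or the compile error.
sub seen_by_engine {
    my ($qr_source) = @_;
    my $re = eval $qr_source;    # string eval, so a compile failure is trapped
    return defined $re ? "$re" : "error: $@";
}

# Escaped delimiters inside patterns, taken from the examples in this thread.
my @examples = (
    'qr[^\[a-z\]$]',
    'qr(^\(x\)$)',
    'qr!a(?\!b)!',
    'qr|a(?\|foo)|',
);

printf "%-16s => %s\n", $_, seen_by_engine($_) for @examples;
```

If the stringified pattern still shows the backslash, the engine received the character escaped and treats it as a literal; if the backslash is gone, the engine saw a metacharacter.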

Obviously\, if we decided to\, we could restrict the change to just '{'\, as that is the only one we care about now. But that isn't aesthetically pleasing.

My own opinion is that quoting for literal patterns is already complex enough without introducing a special case for only certain delimiters.

-- Any [programming] language that doesn't occasionally surprise the novice will pay for it by continually surprising the expert.   -- Larry Wall

p5pRT commented 12 years ago

From @cpansprout

On Sat Jul 21 12​:42​:06 2012\, khw wrote​:

On 07/21/2012 11:44 AM, Robin Barker wrote:

I suggest this bug is marked as resolved.

The only warnings now come from cpan/ modules. [perl#113094] is tracking the \{ issue for CPAN modules.

OK\, moving it to the 113094

On 07/21/2012 07​:51 AM\, Dave Mitchell wrote​:

On Fri\, Jul 20\, 2012 at 08​:35​:27PM -0700\, Reverend Chip wrote​:

On 7/19/2012 10​:15 AM\, Karl Williamson wrote​:

3) Consider this as acceptable collateral breakage\, document it\, and keep the warning for a cycle or two\, after which we prohibit unescaped literal left brackets.

The final possibility is only feasible if there is very little current breakage.

My 2c​: I think this is ideal\, since stripping \ on {} in regexes seems like something we should never have done in the first place.

Note that if we changed it so that escaped delimiters are no longer stripped\, *all* the following regexes would change their meaning; the last three would become compile errors\, while the first four would just silently start matching different things​:

    qr[^\[a-z\]$]

    qr#  a
         \#xxx
         b
         #x;

    qr(^\(x\)$);

    m?^xy\?$?

    qr!a(?\!b)!;

    qr<a(?\<foo\>b)>;

    qr|a(?\|foo)|;

Correct me if I'm wrong, but I believe that even the most encompassing change would cause breakage only if the delimiter is one of the dirty dozen metacharacters, and no others.

Punctuation variables. In a match-once pattern\, I can refer to $? as $\?.

Anyway\, if the interpretation of escaped delimiters is going to change (which I still oppose)\, there is no reason that regexps should differ from strings in this regard. This should still match\, no matter what​:

q n\nn =~ m n\nn;

Also, please take m?fo\?? into account. I would have to rewrite that m?fo{0,1}?.

--

Father Chrysostomos

p5pRT commented 12 years ago

From @khwilliamson

On 07/22/2012 03​:30 AM\, Dave Mitchell wrote​:

On Sat\, Jul 21\, 2012 at 12​:42​:07PM -0700\, Karl Williamson via RT wrote​:

Note that if we changed it so that escaped delimiters are no longer stripped\, *all* the following regexes would change their meaning; the last three would become compile errors\, while the first four would just silently start matching different things​:

    qr[^\[a-z\]$]

    qr#  a
         \#xxx
         b
         #x;

    qr(^\(x\)$);

    m?^xy\?$?

    qr!a(?\!b)!;

    qr<a(?\<foo\>b)>;

    qr|a(?\|foo)|;

Correct me if I'm wrong, but I believe that even the most encompassing change would cause breakage only if the delimiter is one of the dirty dozen metacharacters, and no others. Since '/' isn't one of the 12, there should be no potential problems with it.

Correct. It's only cases where a regex metacharacter is used as a delimiter, and the char has to be escaped to allow the string as a whole to be correctly delimited. Formerly, the quoting mechanism would strip the \, allowing the metachar to be seen by the regex engine. Under the new proposal, the engine would see the char as still escaped, and thus no longer a metachar.

Obviously\, if we decided to\, we could restrict the change to just '{'\, as that is the only one we care about now. But that isn't aesthetically pleasing.

My own opinion is that quoting for literal patterns is already complex enough without introducing a special case for only certain delimiters.

It is currently quite possible to make an easily overlooked error in specifying the general form of a quantifier\, and have it silently match the literal characters specified instead of the intended meaning. Indeed\, one of the first CPAN failures reported when the left brace experimental change was made is an example of this that had gone uncaught. I think that that demonstrated deficiency in the current scheme should carry significant weight in considering what to do.

We also have a need going forward to be able to specify new constructions to support the many Unicode features that we don't currently. The logical candidate for these is using braces.

One option is to say that this new rule applies only to balanced delimiters. Then the special case seems more logical and easier to remember. It reduces the issues to just 4 delimiters. The number of affected cases in CPAN is minuscule; results given at the end.

Another option would be to have an optional feature enabled under 'use 5.18'.

Two other options I've mentioned previously are to abandon any work in this area, or to have a deprecation cycle for unnecessarily escaping the delimiters.

Other option ideas are welcome.

Here is what I found with grepping cpan

Left brace as a delimiter​: {

yielded no problems\, as previously reported.

Less-than sign as a delimiter: <

    \b([ms]|qr)\s*<[^<]*\\<

I found no examples where this would be a problem. There is one instance of this use​:

Regexp-NamedCaptures-0.05/lib/Regexp/NamedCaptures.pm

    { type  => SCALAR,
      regex => qr<\A\(\?\<.+\>.*\)\z>s
    }

But the meaning is unchanged.

Left paren as a delimiter: (

    (((^|[^\\])\b[ms])|\bqr)\s*\([^(]*\\\(

(returned only false positives. The regex is more complicated because of things like \s(... which are false positives.)
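To show how the two search patterns above behave, here is a small sketch that applies them to a few sample lines. The first sample is the Regexp::NamedCaptures line cited in this message; the other two lines and all variable names are made up for illustration. The patterns are written with `/` delimiters, which is an assumption about how they were run.

```perl
use strict;
use warnings;

# The two search patterns quoted above, written with / delimiters.
my $escaped_lt    = qr/\b([ms]|qr)\s*<[^<]*\\</;
my $escaped_paren = qr/(((^|[^\\])\b[ms])|\bqr)\s*\([^(]*\\\(/;

# A deliberately simpler paren pattern, to show why the extra guard is needed.
my $naive_paren   = qr/[ms]\s*\([^(]*\\\(/;

my @lines = (
    'regex => qr<\A\(\?\<.+\>.*\)\z>s',        # escaped < inside qr<>
    'my $re = qr(^\(x\)$);',                   # escaped ( inside qr()
    'if ($text =~ /\s(foo|bar\()/) { ... }',   # \s( is not a delimiter
);

# For each line, record which patterns hit: [ lt, paren, naive ].
my %hits;
for my $line (@lines) {
    $hits{$line} = [
        ( $line =~ $escaped_lt )    ? 1 : 0,
        ( $line =~ $escaped_paren ) ? 1 : 0,
        ( $line =~ $naive_paren )   ? 1 : 0,
    ];
    printf "lt=%d paren=%d naive=%d  %s\n", @{ $hits{$line} }, $line;
}
```

The third line is the kind of false positive the message mentions: the simpler pattern flags `\s(`, while the guarded one skips an `s` that is preceded by a backslash.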

Left bracket as a delimiter​: [

yielded a single potential problem​:

perl-5.15.6/ext/B/t/OptreeCheck.pm

    # symbolic hints from the golden results.
    $str =~ s[(                     # capture
               \(\?:next\|db\)state # the regexp matching next/db state
               .*                   # all sorts of things follow it
               v                    # The opening v
              )
              :(?:\\[{*]            # \{ or \*
              |[^,\\])              # or other symbols on their own

p5pRT commented 12 years ago

From @iabyn

On Thu\, Jul 26\, 2012 at 10​:51​:46AM -0600\, Karl Williamson wrote​:

Two other options I've mentioned previously are to abandon any work in this area, or to have a deprecation cycle for unnecessarily escaping the delimiters.

I think I like the idea of a deprecation cycle, i.e. warn on any literal regex which includes an escaped delimiter where the delimiter is a regex metachar. By the sound of it, these are quite rare.

But have quite a long cycle; e.g. two major releases that have the deprecation warning. Then in the 3rd release we stop stripping the \, and add the unescaped-{ warning.

-- Never work with children\, animals\, or actors.

p5pRT commented 11 years ago

From @khwilliamson

Commit e62d0b1335a7959680be5f7e56910067d6f33c1f reverts the offending commit.

Instead, commit 4d68ffa0f7f345bc1ae6751744518ba4bc3859bd implements a restricted version of what was discussed in the final few messages in the discussion before this message.

-- Karl Williamson