Perl / perl5

šŸŖ The Perl programming language
https://dev.perl.org/perl5/
Other
1.9k stars 538 forks source link

Sequence (?#...) not recognized in regex #12879

Closed p5pRT closed 11 years ago

p5pRT commented 11 years ago

Migrated from rt.perl.org#117327 (status was 'resolved')

Searchable as RT117327$

p5pRT commented 11 years ago

From @cpansprout

$ perl5.16.0 -le 'print "ab" =~ /a( ?#foo)b/x' 1 $ ./perl -le 'print "ab" =~ /a( ?#foo)b/x' Sequence (?#...) not recognized in regex; marked by \<-- HERE in m/a( ?# \<-- HERE foo)b/ at -e line 1.

This is the commit that changed it​:

commit 504858073fe16afb61d66a8b6748851780e51432 Author​: Nicholas Clark \nick@&#8203;ccl4\.org Date​: Mon Jan 14 09​:46​:48 2013 +0100

  Remove unreachable duplicate (?#...) parsing code from S_reg()  
  I believe that this code was rendered unreachable when perl 5.001 added   code to S_nextchar() to skip over embedded comments. Adrian Enache noted   this in March 2003\, and proposed a patch which removed it. See   http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003-03/msg00840.html  
  The patch wasn't applied at that time\, and when he sent it again August\,   he omitted that hunk. See   http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003-08/msg01820.html  
  That version was applied as commit e994fd663a4d8acc.

I suggest reverting it (for consistency with other (?...) constructs) and adding a test. The attached patch does just that.


Flags​:   category=core   severity=low


Site configuration information for perl 5.17.11​:

Configured by sprout at Sat Mar 23 21​:43​:15 PDT 2013.

Summary of my perl5 (revision 5 version 17 subversion 11) configuration​:   Derived from​: 9a51ab7a8994fa31039eb269e6c25ef95aa60af3   Ancestor​: 09af213235c12e35b077ae9eba7defc432f71e99   Platform​:   osname=darwin\, osvers=10.5.0\, archname=darwin-thread-multi-2level   uname='darwin pint.local 10.5.0 darwin kernel version 10.5.0​: fri nov 5 23​:20​:39 pdt 2010; root​:xnu-1504.9.17~1release_i386 i386 '   config_args='-de -DDEBUGGING -Duseithreads -Dusedevel'   hint=recommended\, useposix=true\, d_sigaction=define   useithreads=define\, usemultiplicity=define   useperlio=define\, d_sfio=undef\, uselargefiles=define\, usesocks=undef   use64bitint=undef\, use64bitall=undef\, uselongdouble=undef   usemymalloc=n\, bincompat5005=undef   Compiler​:   cc='cc'\, ccflags ='-fno-common -DPERL_DARWIN -DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'\,   optimize='-O3 -g'\,   cppflags='-fno-common -DPERL_DARWIN -DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'   ccversion=''\, gccversion='4.2.1 (Apple Inc. build 5664)'\, gccosandvers=''   intsize=4\, longsize=4\, ptrsize=4\, doublesize=8\, byteorder=1234   d_longlong=define\, longlongsize=8\, d_longdbl=define\, longdblsize=16   ivtype='long'\, ivsize=4\, nvtype='double'\, nvsize=8\, Off_t='off_t'\, lseeksize=8   alignbytes=8\, prototype=define   Linker and Libraries​:   ld='env MACOSX_DEPLOYMENT_TARGET=10.3 cc'\, ldflags =' -fstack-protector -L/usr/local/lib'   libpth=/usr/local/lib /usr/lib   libs=-ldbm -ldl -lm -lutil -lc   perllibs=-ldl -lm -lutil -lc   libc=\, so=dylib\, useshrplib=false\, libperl=libperl.a   gnulibc_version=''   Dynamic Linking​:   dlsrc=dl_dlopen.xs\, dlext=bundle\, d_dlsymun=undef\, ccdlflags=' '   cccdlflags=' '\, lddlflags=' -bundle -undefined dynamic_lookup -L/usr/local/lib -fstack-protector'

Locally applied patches​:  


@​INC for perl 5.17.11​:   lib   /usr/local/lib/perl5/site_perl/5.17.11/darwin-thread-multi-2level   /usr/local/lib/perl5/site_perl/5.17.11   /usr/local/lib/perl5/5.17.11/darwin-thread-multi-2level   /usr/local/lib/perl5/5.17.11   /usr/local/lib/perl5/site_perl   .


Environment for perl 5.17.11​:   DYLD_LIBRARY_PATH (unset)   HOME=/Users/sprout   LANG=en_US.UTF-8   LANGUAGE (unset)   LD_LIBRARY_PATH (unset)   LOGDIR (unset)   PATH=/usr/bin​:/bin​:/usr/sbin​:/sbin​:/usr/local/bin​:/usr/X11/bin​:/usr/local/bin   PERL_BADLANG (unset)   SHELL=/bin/bash

p5pRT commented 11 years ago

From @cpansprout

From 7e59de351d949f027280cc6ded47ee16935dca8b Mon Sep 17 00​:00​:00 2001 From​: Father Chrysostomos \sprout@&#8203;cpan\.org Date​: Sun\, 24 Mar 2013 11​:40​:17 -0700 Subject​: [PATCH] Revert "Remove unreachable duplicate (?#...) parsing code from S_reg()" MIME-Version​: 1.0 Content-Type​: text/plain; charset=UTF-8 Content-Transfer-Encoding​: 8bit

This reverts commit 504858073fe16afb61d66a8b6748851780e51432.

This commit reverts a piece of *reachable* code\, which is necessary for consistency.

A little-known fact is that /x allows (?...) constructs to have a space in front of the question mark​:

$ perl -le 'print "ab" =~ /a( ?​:)b/'

$ perl -le 'print "ab" =~ /a( ?​:)b/x' 1 $ perl -le 'print "ab" =~ /a( ?#foo)b/'

$ perl -le 'print "ab" =~ /a( ?#foo)b/x' 1

There is no reason (?#foo) should be treated differently from the others\, so we must retain its treatment along with all the other con- structs in the code reached by ā€˜( ?ā€™.

This commit also adds a test.

Inline Patch ```diff diff --git a/regcomp.c b/regcomp.c index 6686d8b..7b45c88 100644 --- a/regcomp.c +++ b/regcomp.c @@ -8832,6 +8832,14 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth) case '@': /* (?@...) */ vFAIL2("Sequence (?%c...) not implemented", (int)paren); break; + case '#': /* (?#...) */ + while (*RExC_parse && *RExC_parse != ')') + RExC_parse++; + if (*RExC_parse != ')') + FAIL("Sequence (?#... not terminated"); + nextchar(pRExC_state); + *flagp = TRYAGAIN; + return NULL; case '0' : /* (?0) */ case 'R' : /* (?R) */ if (*RExC_parse != ')') diff --git a/t/re/re_tests b/t/re/re_tests index 7e7fc85..927c394 100644 --- a/t/re/re_tests +++ b/t/re/re_tests @@ -546,6 +546,7 @@ a(?{$::bl="\{"}).b caxbd y $::bl { x(~~)*(?:(?:F)?)? x~~ y - - ^a(?#xxx){3}c aaac y $& aaac '^a (?#xxx) (?#yyy) {3}c'x aaac y $& aaac +'a( ?#foo)b'x ab y $& ab (?
p5pRT commented 11 years ago

From @demerphq

On 24 March 2013 22​:01\, Father Chrysostomos \perlbug\-followup@&#8203;perl\.org wrote​:

# New Ticket Created by Father Chrysostomos # Please include the string​: [perl #117327] # in the subject line of all future correspondence about this issue. # \<URL​: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=117327 >

$ perl5.16.0 -le 'print "ab" =~ /a( ?#foo)b/x' 1 $ ./perl -le 'print "ab" =~ /a( ?#foo)b/x' Sequence (?#...) not recognized in regex; marked by \<-- HERE in m/a( ?# \<-- HERE foo)b/ at -e line 1.

This is the commit that changed it​:

commit 504858073fe16afb61d66a8b6748851780e51432 Author​: Nicholas Clark \nick@&#8203;ccl4\.org Date​: Mon Jan 14 09​:46​:48 2013 +0100

Remove unreachable duplicate \(?\#\.\.\.\) parsing code from S\_reg\(\)

I believe that this code was rendered unreachable when perl 5\.001 added
code to S\_nextchar\(\) to skip over embedded comments\. Adrian Enache noted
this in March 2003\, and proposed a patch which removed it\. See
http&#8203;://www\.xray\.mpe\.mpg\.de/mailing\-lists/perl5\-porters/2003\-03/msg00840\.html

The patch wasn't applied at that time\, and when he sent it again August\,
he omitted that hunk\. See
http&#8203;://www\.xray\.mpe\.mpg\.de/mailing\-lists/perl5\-porters/2003\-08/msg01820\.html

That version was applied as commit e994fd663a4d8acc\.

I suggest reverting it (for consistency with other (?...) constructs) and adding a test. The attached patch does just that.

We decided that this interpretation was inconsistent and undocumented. Sometimes it would allow a space sometimes not. We decided it should never allow a space.

So no\, this should not be reverted. What we will and I thought had already done is remove any support for spaces inside of regex meta-sequences.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 11 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 11 years ago

From @cpansprout

On Sun Mar 24 23​:44​:35 2013\, demerphq wrote​:

On 24 March 2013 22​:01\, Father Chrysostomos \<perlbug- followup@​perl.org> wrote​:

# New Ticket Created by Father Chrysostomos # Please include the string​: [perl #117327] # in the subject line of all future correspondence about this issue. # \<URL​: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=117327 >

$ perl5.16.0 -le 'print "ab" =~ /a( ?#foo)b/x' 1 $ ./perl -le 'print "ab" =~ /a( ?#foo)b/x' Sequence (?#...) not recognized in regex; marked by \<-- HERE in m/a( ?# \<-- HERE foo)b/ at -e line 1.

This is the commit that changed it​:

commit 504858073fe16afb61d66a8b6748851780e51432 Author​: Nicholas Clark \nick@&#8203;ccl4\.org Date​: Mon Jan 14 09​:46​:48 2013 +0100

Remove unreachable duplicate \(?\#\.\.\.\) parsing code from S\_reg\(\)

I believe that this code was rendered unreachable when perl

5.001 added code to S_nextchar() to skip over embedded comments. Adrian Enache noted this in March 2003\, and proposed a patch which removed it. See http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003- 03/msg00840.html

The patch wasn't applied at that time\, and when he sent it again

August\, he omitted that hunk. See http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003- 08/msg01820.html

That version was applied as commit e994fd663a4d8acc\.

I suggest reverting it (for consistency with other (?...) constructs) and adding a test. The attached patch does just that.

We decided that this interpretation was inconsistent and undocumented. Sometimes it would allow a space sometimes not. We decided it should never allow a space.

So no\, this should not be reverted. What we will and I thought had already done is remove any support for spaces inside of regex meta-sequences.

Thatā€™s fine by me (the last sentence). But 504858073 alone is not good\, as it leaves things in a less consistent and more buggy state than before.

Currently in bleadperl we can do this​:

$ ./perl -Ilib -le 'print "ab" =~ /a( ?​:)b/x' 1

But this fails​:

$ ./perl -Ilib -le 'print "ab" =~ /a( ?#)b/x' Sequence (?#...) not recognized in regex; marked by \<-- HERE in m/a( ?# \<-- HERE )b/ at -e line 1.

*Before* we stop accepting a space there\, we need to decide what will happen​:

a) the space is interpreted literally b) we emit an error explaining it is not permitted

Currently with /( ?#)/x we have neither\, but a fallacious error message claiming that (?#...) is not recognized.

--

Father Chrysostomos

p5pRT commented 11 years ago

From [Unknown Contact. See original ticket]

On Sun Mar 24 23​:44​:35 2013\, demerphq wrote​:

On 24 March 2013 22​:01\, Father Chrysostomos \<perlbug- followup@​perl.org> wrote​:

# New Ticket Created by Father Chrysostomos # Please include the string​: [perl #117327] # in the subject line of all future correspondence about this issue. # \<URL​: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=117327 >

$ perl5.16.0 -le 'print "ab" =~ /a( ?#foo)b/x' 1 $ ./perl -le 'print "ab" =~ /a( ?#foo)b/x' Sequence (?#...) not recognized in regex; marked by \<-- HERE in m/a( ?# \<-- HERE foo)b/ at -e line 1.

This is the commit that changed it​:

commit 504858073fe16afb61d66a8b6748851780e51432 Author​: Nicholas Clark \nick@&#8203;ccl4\.org Date​: Mon Jan 14 09​:46​:48 2013 +0100

Remove unreachable duplicate \(?\#\.\.\.\) parsing code from S\_reg\(\)

I believe that this code was rendered unreachable when perl

5.001 added code to S_nextchar() to skip over embedded comments. Adrian Enache noted this in March 2003\, and proposed a patch which removed it. See http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003- 03/msg00840.html

The patch wasn't applied at that time\, and when he sent it again

August\, he omitted that hunk. See http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003- 08/msg01820.html

That version was applied as commit e994fd663a4d8acc\.

I suggest reverting it (for consistency with other (?...) constructs) and adding a test. The attached patch does just that.

We decided that this interpretation was inconsistent and undocumented. Sometimes it would allow a space sometimes not. We decided it should never allow a space.

So no\, this should not be reverted. What we will and I thought had already done is remove any support for spaces inside of regex meta-sequences.

Thatā€™s fine by me (the last sentence). But 504858073 alone is not good\, as it leaves things in a less consistent and more buggy state than before.

Currently in bleadperl we can do this​:

$ ./perl -Ilib -le 'print "ab" =~ /a( ?​:)b/x' 1

But this fails​:

$ ./perl -Ilib -le 'print "ab" =~ /a( ?#)b/x' Sequence (?#...) not recognized in regex; marked by \<-- HERE in m/a( ?# \<-- HERE )b/ at -e line 1.

*Before* we stop accepting a space there\, we need to decide what will happen​:

a) the space is interpreted literally b) we emit an error explaining it is not permitted

Currently with /( ?#)/x we have neither\, but a fallacious error message claiming that (?#...) is not recognized.

--

Father Chrysostomos

p5pRT commented 11 years ago

From @demerphq

On 26 March 2013 14​:16\, Father Chrysostomos via RT \perlbug\-comment@&#8203;perl\.org wrote​:

On Sun Mar 24 23​:44​:35 2013\, demerphq wrote​:

On 24 March 2013 22​:01\, Father Chrysostomos \<perlbug- followup@​perl.org> wrote​:

# New Ticket Created by Father Chrysostomos # Please include the string​: [perl #117327] # in the subject line of all future correspondence about this issue. # \<URL​: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=117327 >

$ perl5.16.0 -le 'print "ab" =~ /a( ?#foo)b/x' 1 $ ./perl -le 'print "ab" =~ /a( ?#foo)b/x' Sequence (?#...) not recognized in regex; marked by \<-- HERE in m/a( ?# \<-- HERE foo)b/ at -e line 1.

This is the commit that changed it​:

commit 504858073fe16afb61d66a8b6748851780e51432 Author​: Nicholas Clark \nick@&#8203;ccl4\.org Date​: Mon Jan 14 09​:46​:48 2013 +0100

Remove unreachable duplicate \(?\#\.\.\.\) parsing code from S\_reg\(\)

I believe that this code was rendered unreachable when perl

5.001 added code to S_nextchar() to skip over embedded comments. Adrian Enache noted this in March 2003\, and proposed a patch which removed it. See http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003- 03/msg00840.html

The patch wasn't applied at that time\, and when he sent it again

August\, he omitted that hunk. See http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003- 08/msg01820.html

That version was applied as commit e994fd663a4d8acc\.

I suggest reverting it (for consistency with other (?...) constructs) and adding a test. The attached patch does just that.

We decided that this interpretation was inconsistent and undocumented. Sometimes it would allow a space sometimes not. We decided it should never allow a space.

So no\, this should not be reverted. What we will and I thought had already done is remove any support for spaces inside of regex meta-sequences.

Thatā€™s fine by me (the last sentence). But 504858073 alone is not good\, as it leaves things in a less consistent and more buggy state than before.

Currently in bleadperl we can do this​:

$ ./perl -Ilib -le 'print "ab" =~ /a( ?​:)b/x' 1

But this fails​:

$ ./perl -Ilib -le 'print "ab" =~ /a( ?#)b/x' Sequence (?#...) not recognized in regex; marked by \<-- HERE in m/a( ?# \<-- HERE )b/ at -e line 1.

*Before* we stop accepting a space there\, we need to decide what will happen​:

a) the space is interpreted literally b) we emit an error explaining it is not permitted

Currently with /( ?#)/x we have neither\, but a fallacious error message claiming that (?#...) is not recognized.

HRM\, I *think* that the plan was to fix this and other related issues in 5.20.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 11 years ago

From @nwc10

On Tue\, Mar 26\, 2013 at 06​:16​:57AM -0700\, Father Chrysostomos via RT wrote​:

On Sun Mar 24 23​:44​:35 2013\, demerphq wrote​:

On 24 March 2013 22​:01\, Father Chrysostomos \<perlbug- followup@​perl.org> wrote​:

This is the commit that changed it​:

commit 504858073fe16afb61d66a8b6748851780e51432 Author​: Nicholas Clark \nick@&#8203;ccl4\.org Date​: Mon Jan 14 09​:46​:48 2013 +0100

Remove unreachable duplicate \(?\#\.\.\.\) parsing code from S\_reg\(\)

I believe that this code was rendered unreachable when perl

5.001 added code to S_nextchar() to skip over embedded comments. Adrian Enache noted this in March 2003\, and proposed a patch which removed it. See http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003- 03/msg00840.html

The patch wasn't applied at that time\, and when he sent it again

August\, he omitted that hunk. See http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003- 08/msg01820.html

That version was applied as commit e994fd663a4d8acc\.

I suggest reverting it (for consistency with other (?...) constructs) and adding a test. The attached patch does just that.

We decided that this interpretation was inconsistent and undocumented. Sometimes it would allow a space sometimes not. We decided it should never allow a space.

So no\, this should not be reverted. What we will and I thought had already done is remove any support for spaces inside of regex meta-sequences.

That's fine by me (the last sentence). But 504858073 alone is not good\, as it leaves things in a less consistent and more buggy state than before.

Yes\, I agree that we should revert it for now. It wasn't my intent to change behaviour. I was strictly trying to refactor the existing code. The behaviour in question has no tests. Not just no tests for a space in (?#...)\, but no tests for a space in any of the (? constructions.

The behaviour is\, however\, crazily inconsistent\, which is understandable in terms of the implementation. The no space case works as expected

$ perl5.16.0 -le '$_ = "AB"; print /A(?#comment)B/ ? "T" : "f"' T

Adding a space (understandably) requires /x to parse as a comment​:

$ ~/Sandpit/5160/bin/perl5.16.0 -le '$_ = "AB"; print /A( ?#comment)B/ ? "T" : "f"' f $ ~/Sandpit/5160/bin/perl5.16.0 -le '$_ = "AB"; print /A( ?#comment)B/x ? "T" : "f"' T

You can put a comment in your comment​:

perl5.16.0 -le '$_ = "AB"; print /A( (?#nested comment) ?#comment)B/x ? "T" : "f"' T

But you can't put any spaces or comments in the nested comment​:

$ perl5.16.0 -le '$_ = "AB"; print /A( ( ?#nested comment) ?#comment)B/x ? "T" : "f"' Quantifier follows nothing in regex; marked by \<-- HERE in m/A( ( ?#nested comment) ? \<-- HERE #comment)B/ at -e line 1.

$ perl5.16.0 -le '$_ = "AB"; print /A( ((?#)?#nested comment) ?#comment)B/x ? "T" : "f"'

because (I infer) that inner comment is being parsed by different code from the outer comment

So\, I feel that the entire use of whitespace and comments between ( and ? is something that was never intended to work. It wasn't documented\, and it wasn't tested for. It's emergent behaviour\, rather than an intended feature\, and as I think that it's confusing and inconsistent\, I think that it should go.

But not until 5.19.something\, and consistently. So for now\, I think it needs to go back.

Nicholas Clark

p5pRT commented 11 years ago

From @demerphq

On 10 April 2013 16​:48\, Nicholas Clark \nick@&#8203;ccl4\.org wrote​:

On Tue\, Mar 26\, 2013 at 06​:16​:57AM -0700\, Father Chrysostomos via RT wrote​:

On Sun Mar 24 23​:44​:35 2013\, demerphq wrote​:

On 24 March 2013 22​:01\, Father Chrysostomos \<perlbug- followup@​perl.org> wrote​:

This is the commit that changed it​:

commit 504858073fe16afb61d66a8b6748851780e51432 Author​: Nicholas Clark \nick@&#8203;ccl4\.org Date​: Mon Jan 14 09​:46​:48 2013 +0100

Remove unreachable duplicate \(?\#\.\.\.\) parsing code from S\_reg\(\)

I believe that this code was rendered unreachable when perl

5.001 added code to S_nextchar() to skip over embedded comments. Adrian Enache noted this in March 2003\, and proposed a patch which removed it. See http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003- 03/msg00840.html

The patch wasn't applied at that time\, and when he sent it again

August\, he omitted that hunk. See http​://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003- 08/msg01820.html

That version was applied as commit e994fd663a4d8acc\.

I suggest reverting it (for consistency with other (?...) constructs) and adding a test. The attached patch does just that.

We decided that this interpretation was inconsistent and undocumented. Sometimes it would allow a space sometimes not. We decided it should never allow a space.

So no\, this should not be reverted. What we will and I thought had already done is remove any support for spaces inside of regex meta-sequences.

That's fine by me (the last sentence). But 504858073 alone is not good\, as it leaves things in a less consistent and more buggy state than before.

Yes\, I agree that we should revert it for now. It wasn't my intent to change behaviour. I was strictly trying to refactor the existing code. The behaviour in question has no tests. Not just no tests for a space in (?#...)\, but no tests for a space in any of the (? constructions.

That is almost certainly because they aren't supposed to be supported. That they worked at all after a paren is totally an implementation fluke.

The behaviour is\, however\, crazily inconsistent\, which is understandable in terms of the implementation. The no space case works as expected

$ perl5.16.0 -le '$_ = "AB"; print /A(?#comment)B/ ? "T" : "f"' T

Adding a space (understandably) requires /x to parse as a comment​:

$ ~/Sandpit/5160/bin/perl5.16.0 -le '$_ = "AB"; print /A( ?#comment)B/ ? "T" : "f"' f $ ~/Sandpit/5160/bin/perl5.16.0 -le '$_ = "AB"; print /A( ?#comment)B/x ? "T" : "f"' T

You can put a comment in your comment​:

perl5.16.0 -le '$_ = "AB"; print /A( (?#nested comment) ?#comment)B/x ? "T" : "f"' T

But you can't put any spaces or comments in the nested comment​:

$ perl5.16.0 -le '$_ = "AB"; print /A( ( ?#nested comment) ?#comment)B/x ? "T" : "f"' Quantifier follows nothing in regex; marked by \<-- HERE in m/A( ( ?#nested comment) ? \<-- HERE #comment)B/ at -e line 1.

$ perl5.16.0 -le '$_ = "AB"; print /A( ((?#)?#nested comment) ?#comment)B/x ? "T" : "f"'

because (I infer) that inner comment is being parsed by different code from the outer comment

So\, I feel that the entire use of whitespace and comments between ( and ? is something that was never intended to work. It wasn't documented\, and it wasn't tested for. It's emergent behaviour\, rather than an intended feature\, and as I think that it's confusing and inconsistent\, I think that it should go.

But not until 5.19.something\, and consistently. So for now\, I think it needs to go back.

IMO we should just fix the error message.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 11 years ago

From @rjbs

On Wed Apr 10 07​:56​:32 2013\, demerphq wrote​:

IMO we should just fix the error message.

I am in favor of fixing the erroneous error\, whether that's by reverting the change or by fixing the error. All things being equal\, I'd see the error message fixed.

-- rjbs

p5pRT commented 11 years ago

From @khwilliamson

Attached is a patch for review that fails with a more appropriate error message. As you can see in the commit message\, the scope of the problem is somewhat larger than previously thought.

-- Karl Williamson

p5pRT commented 11 years ago

From @khwilliamson

0001-XXX-finish-up-PATCH-perl-117327-Sequence-.-not-recog.patch ```diff From 196db6182f601e789ae4d8c65219d7d25326b283 Mon Sep 17 00:00:00 2001 From: Karl Williamson Date: Tue, 23 Apr 2013 13:39:35 -0600 Subject: [PATCH] XXX finish up: PATCH [perl #117327] Sequence (?#...) not recognized This commit requires regular expression forms '(*VERB:ARG)' and '(?...)' to not have space/comments between the '(' and the next character. Previously, things like: qr/((?# This is a comment in the middle of a token)?:foo)/ were accepted. And it led to confusing error messages, as in ./perl -Ilib -le 'print "ab" =~ /a( ?#)b/x' Sequence (?#...) not recognized in regex; marked by <-- HERE in m/a( ?# <-- HERE )b/ at -e line 1. The problem is that during regex parsing, the '(' is seen, and the input pointer advanced, skipping comments and, under /x, white space, without regard to whether the left parenthesis is part of a bigger token or not. S_reg() handles the parsing of what comes next in the input, and it just assumes that the first character it sees is the character immediately following the parenthesis. Since the parenthesis may or may not be a part of a bigger token, I don't see an easy clean way to fix this. What I did is flag the single call to S_reg() where this could be an issue, and have it check for for adjacency if the parenthesis is part of a bigger token, and if so, fail if not-adjacent. XXX This patch for review is lacking perldiag.pod and tests. I'm open to better phrasing of the error messages. --- pod/perlre.pod | 4 ++-- regcomp.c | 25 ++++++++++++++++++++++--- 2 files changed, 24 insertions(+), 5 deletions(-) diff --git a/pod/perlre.pod b/pod/perlre.pod index e4a0b11..ceba169 100644 --- a/pod/perlre.pod +++ b/pod/perlre.pod @@ -138,8 +138,8 @@ a C<\Q...\E> stays unaffected by C. And note that C doesn't affect space interpretation within a single multi-character construct. For example in C<\x{...}>, regardless of the C modifier, there can be no spaces. Same for a L such as C<{3}> or -C<{5,}>. Similarly, C<(?:...)> can't have a space between the C and C<:>, -but can between the C<(> and C. Within any delimiters for such a +C<{5,}>. Similarly, C<(?:...)> can't have a space between the C<(>, +C, and C<:>. Within any delimiters for such a construct, allowed spaces are not affected by C, and depend on the construct. For example, C<\x{...}> can't have spaces because hexadecimal numbers don't have spaces in them. But, Unicode properties can have spaces, so diff --git a/regcomp.c b/regcomp.c index 0840778..a354f6f 100644 --- a/regcomp.c +++ b/regcomp.c @@ -8601,7 +8601,10 @@ S_parse_lparen_question_flags(pTHX_ struct RExC_state_t *pRExC_state) cannot happen. */ STATIC regnode * S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth) - /* paren: Parenthesized? 0=top, 1=(, inside: changed to letter. */ + /* paren: Parenthesized? 0=top; 1,2=inside '(': changed to letter. + * 2 is like 1, but indicates that nextchar() has been called to advance + * RExC_parse beyond the '('. Things like '(?' are indivisible tokens, and + * this flag alerts us to the need to check for that */ { dVAR; regnode *ret; /* Will be the head of the group. */ @@ -8629,6 +8632,13 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth) /* Make an OPEN node, if parenthesized. */ if (paren) { + + /* Under /x, space and comments can be gobbled up between the '(' and + * here (if paren ==2). The forms '(*VERB' and '(?...' disallow such + * intervening space, as the sequence is a token, and a token should be + * indivisible */ + bool intervening_patws = paren == 2 && *(RExC_parse - 1) != '('; + if ( *RExC_parse == '*') { /* (*VERB:ARG) */ char *start_verb = RExC_parse; STRLEN verb_len = 0; @@ -8636,6 +8646,11 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth) unsigned char op = 0; int argok = 1; int internal_argval = 0; /* internal_argval is only useful if !argok */ + + if (intervening_patws) { + RExC_parse++; + vFAIL("In '(*VERB:ARG)', the initial '(*' characters must be adjacent"); + } while ( *RExC_parse && *RExC_parse != ')' ) { if ( *RExC_parse == ':' ) { start_arg = RExC_parse + 1; @@ -8737,6 +8752,10 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth) if (*RExC_parse == '?') { /* (?...) */ bool is_logical = 0; const char * const seqstart = RExC_parse; + if (intervening_patws) { + RExC_parse++; + vFAIL("In '(?...)', the initial '(?' characters must be adjacent"); + } RExC_parse++; paren = *RExC_parse++; @@ -9310,7 +9329,7 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth) case ':': ender = reg_node(pRExC_state, TAIL); break; - case 1: + case 1: case 2: ender = reganode(pRExC_state, CLOSE, parno); if (!SIZE_ONLY && RExC_seen & REG_SEEN_RECURSE) { DEBUG_OPTIMISE_MORE_r(PerlIO_printf(Perl_debug_log, @@ -10300,7 +10319,7 @@ tryagain: } case '(': nextchar(pRExC_state); - ret = reg(pRExC_state, 1, &flags,depth+1); + ret = reg(pRExC_state, 2, &flags,depth+1); if (ret == NULL) { if (flags & TRYAGAIN) { if (RExC_parse == RExC_end) { -- 1.8.1.3 ```
p5pRT commented 11 years ago

From @cpansprout

On Tue Apr 23 13​:09​:41 2013\, khw wrote​:

Attached is a patch for review that fails with a more appropriate error message. As you can see in the commit message\, the scope of the problem is somewhat larger than previously thought.

Thank you.

If you are fixing all the (?...) constructs at the same time (which I think you are)\, then I am all for this patch.

What makes me uncomfortable is leaving Perl in an inconsistent state in which *some* (?...) constructs allow the space and some do not.

--

Father Chrysostomos

p5pRT commented 11 years ago

From [Unknown Contact. See original ticket]

On Tue Apr 23 13​:09​:41 2013\, khw wrote​:

Attached is a patch for review that fails with a more appropriate error message. As you can see in the commit message\, the scope of the problem is somewhat larger than previously thought.

Thank you.

If you are fixing all the (?...) constructs at the same time (which I think you are)\, then I am all for this patch.

What makes me uncomfortable is leaving Perl in an inconsistent state in which *some* (?...) constructs allow the space and some do not.

--

Father Chrysostomos

p5pRT commented 11 years ago

From @nwc10

On Sun\, Apr 28\, 2013 at 02​:39​:52PM -0700\, Father Chrysostomos via RT wrote​:

On Tue Apr 23 13​:09​:41 2013\, khw wrote​:

Attached is a patch for review that fails with a more appropriate error message. As you can see in the commit message\, the scope of the problem is somewhat larger than previously thought.

Thank you.

If you are fixing all the (?...) constructs at the same time (which I think you are)\, then I am all for this patch.

What makes me uncomfortable is leaving Perl in an inconsistent state in which *some* (?...) constructs allow the space and some do not.

I agree - it's better to be consistent\, even if it's a consistently ugly state. (and the same ugly as the previous releases.)

I'm not at all familiar with that code\, so I can't say with any confidence whether this change gets all of them\, or leaves some uncovered. I hope that someone else is confident to assess this.

Nicholas Clark

p5pRT commented 11 years ago

From @khwilliamson

On 04/29/2013 08​:49 AM\, Nicholas Clark wrote​:

On Sun\, Apr 28\, 2013 at 02​:39​:52PM -0700\, Father Chrysostomos via RT wrote​:

On Tue Apr 23 13​:09​:41 2013\, khw wrote​:

Attached is a patch for review that fails with a more appropriate error message. As you can see in the commit message\, the scope of the problem is somewhat larger than previously thought.

Thank you.

If you are fixing all the (?...) constructs at the same time (which I think you are)\, then I am all for this patch.

What makes me uncomfortable is leaving Perl in an inconsistent state in which *some* (?...) constructs allow the space and some do not.

I agree - it's better to be consistent\, even if it's a consistently ugly state. (and the same ugly as the previous releases.)

I'm not at all familiar with that code\, so I can't say with any confidence whether this change gets all of them\, or leaves some uncovered. I hope that someone else is confident to assess this.

Nicholas Clark

I am confident that it catches all of them\, and also (*VERB...) constructs\, which it turns out have the same issue; otherwise I wouldn't have submitted the patch :)

I'm also confident that this patch only affects the intended things. And I can make a detailed case for that if necessary.

And I am confident that something like this patch is the right thing to do going forward. That a (?#...) comment currently can split the (? and (* tokens without having to have /x makes this more clear.

What I am unsure of is if we should\, so close to 5.18\, be making something a syntax error that previously wasn't. There is one place in our code\, utf8_heavy.pl\, which has '( ?'\, and must be corrected before this patch is applied. I wrote that offending code\, so I know that the intervening space was just an unintended typo that never got noticed.

Even making this a deprecation warning instead of a syntax error can cause code to fail.

If we don't do anything\, 5.18 will have a regression misleading error message. If we take this patch\, but make it a warning instead of an error\, 5.18 will have the misleading error\, plus this warning ahead of it. I suppose we could tweak the patch so that it gives an error in precisely the same places as the misleading error would\, and lets all the other problematic constructs go by\, unwarned on\, and then change it in 5.19 to warn. I haven't investigated how hard this would be.

The other alternative I can think of is to revert the patch that caused this regression. I haven't investigated enough to understand the implications of that.

p5pRT commented 11 years ago

From @rjbs

On Mon Apr 29 11​:13​:31 2013\, public@​khwilliamson.com wrote​:

I am confident ā€¦

Your confidence fills me with contentment!

What I am unsure of is if we should\, so close to 5.18\, be making something a syntax error that previously wasn't.

I have thought about this.

Right now\, we made an inconsistent behavior stay inconsistent\, but different and\, I think\, worse.

I considered these primary options​:

(a) the "just for-Heaven's-sake do it" option

Here\, we apply Karl's patch. This eliminates inconsistency now\, but adds more fatal cases. I doubt anyone is strongly opposed to this changeā€¦ except that it would be getting applied during the big scary 5.18.0 code freeze\, and we have to consider that if "real code" is relying on the behavior of /( ?​: abc)/ it has a greater chance of being untested and fixed "in time" if the new error is first found (other than blead) in 5.18.0-RC1.

On the other hand\, it closes the case on this bug\, and it is quite tempting to believe that if any code is relying on this very sketchy area of the regex system\, it's already a grotesque mess\, and such code probably doesn't even *exist* anyway. Unfortunately\, such code clearly does exist​:

http​://grep.cpan.me/?q=%5C%28%5Cs%2B%5C%3F%5B%21%3A%3E%5D

As someone who loathes "just put more whitespace anywhere you like" grammars\, I was sad to see my name in that list. ;-) It looks to me like "compressed regex nonsense" is likely to be a factor here\, which means "stuff breaking that nobody knows why or how or what to fix."

(b) the "don't break backcompat" option

Here\, we revert the original change\, with a firm plan to fix it thoroughly in the future. We need to issue a notice that the current behavior is going to be corrected\, so we do that now. Then in 5.19.0\, we strictly enforce the error\, going to option (a). We retain the "old weird\," but add a warning\, which has some chance of breaking code. (Most likely\, the cynic in me supposes\, by breaking "no warnings" tests\, while the code that would've been effected is itself untested.)

(c) the "all things in moderation" option

We leave things exactly as they are in blead\, post-50485807\, but alter the emitted error to say "you can't put space where you put it." This puts any likely breakage's starting point back in January (before the user-visible change freeze)\, and we move to option (a) in 5.19.0. There are two down sides\, here​: first\, we're adding a new syntax error where there wasn't one without warning; secondly\, we're introducing inconsistency. The inconsistency doesn't bother me *as long as we fix it ASAP*. This is like "we fixed a bunch of parts of a bug\, and will continue to fix its other parts." This kind of half-fix is something I am expressly okay with\, as we're doing timeboxed releases.

That said\, what's the win of (c) over (b)? I'm not sure. I don't think we really lose any momentum on the problem\, nor do we return to some terrible state.

SO​:

My original "all things equal\, just fix the message" had in mind something like (c)​: Just change the text we emit so that we can release 5.18.0 and then fix things better. Now it seems clear to me that we need a deprecation period for the bogus-space problem. (Specifically\, I'm thinking of code like that in the cpan grep\, above.)

I think option (a) is right out for 5.18.0.

Karl also wrote​:

I suppose we could tweak the patch so that it gives an error in precisely the same places as the misleading error would\, and lets all the other problematic constructs go by\, unwarned on\, and then change it in 5.19 to warn. I haven't investigated how hard this would be.

Karl (or anybody) what is the difficulty in basically keeping the old behavior but issuing a warning on the problematic constructs *right away*\, to be made fatal in 5.19.0-ish? That is​:
can we enact option (b) in a timely way?

-- rjbs

p5pRT commented 11 years ago

From @cpansprout

On Tue Apr 30 16​:09​:35 2013\, rjbs wrote​:

(c) the "all things in moderation" option

We leave things exactly as they are in blead\, post-50485807\, but alter the emitted error to say "you can't put space where you put it." This puts any likely breakage's starting point back in January (before the user-visible change freeze)\, and we move to option (a) in 5.19.0. There are two down sides\, here​: first\, we're adding a new syntax error where there wasn't one without warning; secondly\, we're introducing inconsistency. The inconsistency doesn't bother me *as long as we fix it ASAP*. This is like "we fixed a bunch of parts of a bug\, and will continue to fix its other parts." This kind of half-fix is something I am expressly okay with\, as we're doing timeboxed releases.

This is the only option I really donā€™t like. I have a module that converts between flavours of regular expressions\, and tests to make sure that Perl-specific extensions are passed through unchanged. (These tests are apparently more exhaustive than Perlā€™s own test suite. :-) I am not opposed to tweaking the tests to expect different results if $]

= 5.017 or 5.019. But having to have different version checks for different constructs in the same category (instead of a single if block) is rather annoying\, donā€™t you think?

--

Father Chrysostomos

p5pRT commented 11 years ago

From @khwilliamson

On 04/30/2013 05​:09 PM\, Ricardo SIGNES via RT wrote​:

On Mon Apr 29 11​:13​:31 2013\, public@​khwilliamson.com wrote​:

I am confident ā€¦

Your confidence fills me with contentment!

What I am unsure of is if we should\, so close to 5.18\, be making something a syntax error that previously wasn't.

I have thought about this.

Right now\, we made an inconsistent behavior stay inconsistent\, but different and\, I think\, worse.

I considered these primary options​:

(a) the "just for-Heaven's-sake do it" option

Here\, we apply Karl's patch. This eliminates inconsistency now\, but adds more fatal cases. I doubt anyone is strongly opposed to this changeā€¦ except that it would be getting applied during the big scary 5.18.0 code freeze\, and we have to consider that if "real code" is relying on the behavior of /( ?​: abc)/ it has a greater chance of being untested and fixed "in time" if the new error is first found (other than blead) in 5.18.0-RC1.

On the other hand\, it closes the case on this bug\, and it is quite tempting to believe that if any code is relying on this very sketchy area of the regex system\, it's already a grotesque mess\, and such code probably doesn't even *exist* anyway. Unfortunately\, such code clearly does exist​:

http​://grep.cpan.me/?q=%5C%28%5Cs%2B%5C%3F%5B%21%3A%3E%5D

As someone who loathes "just put more whitespace anywhere you like" grammars\, I was sad to see my name in that list. ;-) It looks to me like "compressed regex nonsense" is likely to be a factor here\, which means "stuff breaking that nobody knows why or how or what to fix."

(b) the "don't break backcompat" option

Here\, we revert the original change\, with a firm plan to fix it thoroughly in the future. We need to issue a notice that the current behavior is going to be corrected\, so we do that now. Then in 5.19.0\, we strictly enforce the error\, going to option (a). We retain the "old weird\," but add a warning\, which has some chance of breaking code. (Most likely\, the cynic in me supposes\, by breaking "no warnings" tests\, while the code that would've been effected is itself untested.)

(c) the "all things in moderation" option

We leave things exactly as they are in blead\, post-50485807\, but alter the emitted error to say "you can't put space where you put it." This puts any likely breakage's starting point back in January (before the user-visible change freeze)\, and we move to option (a) in 5.19.0. There are two down sides\, here​: first\, we're adding a new syntax error where there wasn't one without warning; secondly\, we're introducing inconsistency. The inconsistency doesn't bother me *as long as we fix it ASAP*. This is like "we fixed a bunch of parts of a bug\, and will continue to fix its other parts." This kind of half-fix is something I am expressly okay with\, as we're doing timeboxed releases.

That said\, what's the win of (c) over (b)? I'm not sure. I don't think we really lose any momentum on the problem\, nor do we return to some terrible state.

SO​:

My original "all things equal\, just fix the message" had in mind something like (c)​: Just change the text we emit so that we can release 5.18.0 and then fix things better. Now it seems clear to me that we need a deprecation period for the bogus-space problem. (Specifically\, I'm thinking of code like that in the cpan grep\, above.)

I think option (a) is right out for 5.18.0.

I think you meant s/right/ruled/

Karl also wrote​:

I suppose we could tweak the patch so that it gives an error in precisely the same places as the misleading error would\, and lets all the other problematic constructs go by\, unwarned on\, and then change it in 5.19 to warn. I haven't investigated how hard this would be.

Karl (or anybody) what is the difficulty in basically keeping the old behavior but issuing a warning on the problematic constructs *right away*\, to be made fatal in 5.19.0-ish? That is​: can we enact option (b) in a timely way?

My original intent was to change the error message\, but what I ended up with was the simplest thing to implement. I discovered that you could legally have​:

qr/((?#   Thousand-   line   comment   ...   ...   )?​:foo)/

That seemed to me to indicate a real issue in the need of fixing.

I don't fully understand your final suggestion. So what I think we should do may be what you are saying. I think we should not apply my patch\, but revert 504858073fe16afb61d66a8b6748851780e51432\, Nicholas's patch that this bug bisects to\, where he mistakenly thought the code was dead\, but it wasn't.

If it is deemed ok to add a warning this late for 5.18\, then my patch can trivially be modified to warn and not fail. We can make that warning either default on or default off.

p5pRT commented 11 years ago

From @rjbs

* Karl Williamson \public@&#8203;khwilliamson\.com [2013-04-30T21​:30​:13]

On 04/30/2013 05​:09 PM\, Ricardo SIGNES via RT wrote​:

I think option (a) is right out for 5.18.0.

I think you meant s/right/ruled/

Perhaps that is a colloquialism. It's "right out" like "we'll chuck this one right out the door." Either way\, though\, the meaning is the same.

* Karl Williamson \public@&#8203;khwilliamson\.com [2013-04-30T21​:30​:13]

My original intent was to change the error message\, but what I ended up with was the simplest thing to implement. I discovered that you could legally have​: [...] That seemed to me to indicate a real issue in the need of fixing.

I agree that there's a lot of real problem here to be fixed\, and I think your fix is a good fix for the future.

I don't fully understand your final suggestion. So what I think we should do may be what you are saying. I think we should not apply my patch\, but revert 504858073fe16afb61d66a8b6748851780e51432\, Nicholas's patch that this bug bisects to\, where he mistakenly thought the code was dead\, but it wasn't.

If it is deemed ok to add a warning this late for 5.18\, then my patch can trivially be modified to warn and not fail. We can make that warning either default on or default off.

I think we should do both of these things\, and that the warning should be on by default\, as it prefigures a coming fatal error. This is what I meant to suggest

-- rjbs

p5pRT commented 11 years ago

From @tamias

On Tue\, Apr 30\, 2013 at 04​:09​:36PM -0700\, Ricardo SIGNES via RT wrote​:

On the other hand\, it closes the case on this bug\, and it is quite tempting to believe that if any code is relying on this very sketchy area of the regex system\, it's already a grotesque mess\, and such code probably doesn't even *exist* anyway. Unfortunately\, such code clearly does exist​: http​://grep.cpan.me/?q=%5C%28%5Cs%2B%5C%3F%5B%21%3A%3E%5D

As someone who loathes "just put more whitespace anywhere you like" grammars\, I was sad to see my name in that list. ;-) It looks to me like "compressed regex nonsense" is likely to be a factor here\, which means "stuff breaking that nobody knows why or how or what to fix."

FWIW\, Net​::Whois​::Parser\, Net​::Domain​::Info​::Whois\, and Email​::Valid all matched your search due to the following code​:

$RFC822PAT = \<\<'EOF'; [\040\t]*(?​:\([^\\\x80-\xff\n\015()]*(?​:(?​:\\[^\x80-\xff]|\([^\\\x80-\ xff\n\015()]*(?​:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xf ... ff\n\015()]*)*\)[\040\t]*)*(?​:\.[\040\t]*(?​:\([^\\\x80-\xff\n\015()]*( ?​:(?​:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?​:\\[^\x80-\xff][^\\\x80 ... EOF

$RFC822PAT =~ s/\n//g;

The whitespace is removed from the pattern before use\, so these modules would not actually be affected by this change.

Several of the other modules you found would be affected\, however.

Ronald

p5pRT commented 11 years ago

From @demerphq

On 1 May 2013 02​:27\, Father Chrysostomos via RT \perlbug\-followup@&#8203;perl\.org wrote​:

On Tue Apr 30 16​:09​:35 2013\, rjbs wrote​:

(c) the "all things in moderation" option

We leave things exactly as they are in blead\, post-50485807\, but alter the emitted error to say "you can't put space where you put it." This puts any likely breakage's starting point back in January (before the user-visible change freeze)\, and we move to option (a) in 5.19.0. There are two down sides\, here​: first\, we're adding a new syntax error where there wasn't one without warning; secondly\, we're introducing inconsistency. The inconsistency doesn't bother me *as long as we fix it ASAP*. This is like "we fixed a bunch of parts of a bug\, and will continue to fix its other parts." This kind of half-fix is something I am expressly okay with\, as we're doing timeboxed releases.

This is the only option I really donā€™t like. I have a module that converts between flavours of regular expressions\, and tests to make sure that Perl-specific extensions are passed through unchanged. (These tests are apparently more exhaustive than Perlā€™s own test suite. :-)

Maybe you should contribute some test files to core.

But do consider that you might be testing for behavior that I would consider to be a bug - IOW\, behavior I would not write a test for because I don't consider it to be correct in the first place. For instance I dont consider it a bug that we have no tests to make sure /((?#wtf)?​:foo)/ works as I don't think it should work\, so for me if there is a missing test it is one that tests that /((?#wtf)?​:foo)/ throws an error.

Yves

p5pRT commented 11 years ago

From @khwilliamson

On 04/30/2013 07​:47 PM\, Ricardo Signes wrote​:

So what I think we

should do may be what you are saying. I think we should not apply my patch\, but revert 504858073fe16afb61d66a8b6748851780e51432\, Nicholas's patch that this bug bisects to\, where he mistakenly thought the code was dead\, but it wasn't.

If it is deemed ok to add a warning this late for 5.18\, then my patch can trivially be modified to warn and not fail. We can make that warning either default on or default off. I think we should do both of these things\, and that the warning should be on by default\, as it prefigures a coming fatal error. This is what I meant to suggest

Testing this in smoke-me/khw-5.18

p5pRT commented 11 years ago

From @khwilliamson

Fixed as commit 182b0c3f66c6c5a93e9dab9bf0162091942bb5f9 which reverts the patch that caused the problem\, adding a test and comments.

Commit 000947ada5b027f394ee63b5166df8c06b64a74e deprecates splitting appart the (? and (* tokens.

-- Karl Williamson

p5pRT commented 11 years ago

From [Unknown Contact. See original ticket]

Fixed as commit 182b0c3f66c6c5a93e9dab9bf0162091942bb5f9 which reverts the patch that caused the problem\, adding a test and comments.

Commit 000947ada5b027f394ee63b5166df8c06b64a74e deprecates splitting appart the (? and (* tokens.

-- Karl Williamson

p5pRT commented 11 years ago

@khwilliamson - Status changed from 'open' to 'resolved'

p5pRT commented 11 years ago

From @khwilliamson

On 05/02/2013 01​:44 PM\, Karl Williamson via RT wrote​:

Fixed as commit 182b0c3f66c6c5a93e9dab9bf0162091942bb5f9 which reverts the patch that caused the problem\, adding a test and comments.

Commit 000947ada5b027f394ee63b5166df8c06b64a74e deprecates splitting appart the (? and (* tokens.

I forgot to mention that this needs a perldelta entry. I did not add one\, since the blead perldelta is not current.

p5pRT commented 11 years ago

From @rjbs

* Karl Williamson \public@&#8203;khwilliamson\.com [2013-05-02T15​:53​:39]

I forgot to mention that this needs a perldelta entry. I did not add one\, since the blead perldelta is not current.

Thanks very much\, Karl. I'll be merging my perldelta branch today or tomorrow. (I've just gotten back from out of town.)

-- rjbs

p5pRT commented 11 years ago

From @cpansprout

On Wed May 01 01​:05​:51 2013\, demerphq wrote​:

On 1 May 2013 02​:27\, Father Chrysostomos via RT \perlbug\-followup@&#8203;perl\.org wrote​:

On Tue Apr 30 16​:09​:35 2013\, rjbs wrote​:

(c) the "all things in moderation" option

We leave things exactly as they are in blead\, post-50485807\, but alter the emitted error to say "you can't put space where you put it." This puts any likely breakage's starting point back in January (before the user-visible change freeze)\, and we move to option (a) in 5.19.0. There are two down sides\, here​: first\, we're adding a new syntax error where there wasn't one without warning; secondly\, we're introducing inconsistency. The inconsistency doesn't bother me *as long as we fix it ASAP*. This is like "we fixed a bunch of parts of a bug\, and will continue to fix its other parts." This kind of half-fix is something I am expressly okay with\, as we're doing timeboxed releases.

This is the only option I really donā€™t like. I have a module that converts between flavours of regular expressions\, and tests to make sure that Perl-specific extensions are passed through unchanged. (These tests are apparently more exhaustive than Perlā€™s own test suite. :-)

Maybe you should contribute some test files to core.

I included the smiley because my tests are not that exhaustive\, just two tests for each (?...) construct to make sure my module doesnā€™t screw them up.

But do consider that you might be testing for behavior that I would consider to be a bug - IOW\, behavior I would not write a test for because I don't consider it to be correct in the first place. For instance I dont consider it a bug that we have no tests to make sure /((?#wtf)?​:foo)/ works as I don't think it should work\, so for me if there is a missing test it is one that tests that /((?#wtf)?​:foo)/ throws an error.

I donā€™t disagree. It just didnā€™t occur to me at the time I wrote the tests that allowing ( ?​:foo) was a bug. (I had never thought of ((?#foo)?​:foo).)

--

Father Chrysostomos