Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

Variable length lookbehind is not variable #13435

Closed p5pRT closed 5 years ago

p5pRT commented 10 years ago

Migrated from rt.perl.org#120600 (status was 'resolved')

Searchable as RT120600$

p5pRT commented 10 years ago

From adrianh.bsc@gmail.com

The following code works:

    #!/usr/bin/perl
    ($_) = "abcdef" =~
      /
        ((?&BB).*)
      | (?!)
        (?<W>[abc])
        (?<NW>[^abc])
        (?<BB>
          (?<=[abc])(?=[^abc])
        | (?<=[^abc])(?=[abc])
        )
      /x;
    print;

The following equivalent code does not:

    #!/usr/bin/perl
    ($_) = "abcdef" =~
      /
        ((?&BB).*)
      |
        (?!)
        (?<W>[abc])
        (?<NW>[^abc])
        (?<BB>
          (?<=(?&W))(?=(?&NW))
        | (?<=(?&NW))(?=(?&W))
        )
      /x;
    print;

Why is the named pattern being treated as a variable length pattern?

Thanks,

Adrian Hawryluk

p5pRT commented 10 years ago

From @iabyn

On Wed, Nov 20, 2013 at 11:12:15AM -0800, Adrian wrote:

> Why is the named pattern being treated as a variable length pattern?

I can reduce the failing code to this:

    qr/
        (?<W>a)
        (?<BB>
          (?=(?&W))(?<=(?&W))
        )
        (?&BB)
    /x;

which gives

    Variable length lookbehind not implemented

Changing the line with the two &W's to

    (?=a)(?<=a)

makes the error go away.

I've had a quick look at study_chunk(), but it's way beyond my understanding. Does anyone who understands that area want to have a go?

-- Overhead, without any fuss, the stars were going out.   -- Arthur C Clarke
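For anyone who wants to try the reduction on their own perl, here is a minimal sketch. It is not from the ticket: `compile_error` is a hypothetical helper, and the outcome for the `named` pattern depends on the perl version used (the thread reports the failure on perls of the 5.18 era).

```perl
use strict;
use warnings;

# Hypothetical helper: try to compile a pattern under /x.
# Returns undef on success, or the error text on failure.
sub compile_error {
    my ($source) = @_;
    my $ok = eval { my $re = qr/$source/x; 1 };
    return $ok ? undef : $@;
}

my %patterns = (
    # The reduced failing case: lookaround via named subpattern calls.
    named   => '(?<W>a) (?<BB> (?=(?&W))(?<=(?&W)) ) (?&BB)',
    # The same pattern with the two (?&W) calls replaced by a literal.
    literal => '(?<W>a) (?<BB> (?=a)(?<=a) ) (?&BB)',
);

for my $name (sort keys %patterns) {
    my $err = compile_error($patterns{$name});
    print "$name: ", (defined $err ? "failed: $err" : "compiled\n");
}
```

Wrapping the compilation in `eval` keeps a fatal regex error from aborting the script, so both variants can be compared in one run.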

p5pRT commented 10 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 10 years ago

From @khwilliamson

On 11/21/2013 08:02 AM, Dave Mitchell wrote:

> I've had a quick look at study_chunk(), but it's way beyond my understanding. Does anyone who understands that area want to have a go?

It's my understanding that nobody understands that area.

p5pRT commented 10 years ago

From @demerphq

On 21 November 2013 16:02, Dave Mitchell <davem@iabyn.com> wrote:

> I've had a quick look at study_chunk(), but it's way beyond my understanding. Does anyone who understands that area want to have a go?

All I can say is that as implemented it's not a bug; if anything, it's a misfeature. (?&W) is a hair's breadth away from (??{ ... }) and is treated accordingly right now. To fix this we would have to change that.

We would have to analyse the pattern and determine whether the thing named (&W) (and there could be more than one) is variable width or not. So we punt and assume it must be variable width.

IMO unless we can very efficiently determine whether the subpattern is fixed width, this ticket will end up as a "won't fix".

It definitely isn't a priority for me to investigate this edge case, although I might at some point.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 10 years ago

From @ikegami

On Thu, Nov 21, 2013 at 1:40 PM, demerphq <demerphq@gmail.com> wrote:

> All I can say is that as implemented it's not a bug; if anything, it's a misfeature. (?&W) is a hair's breadth away from (??{ ... }) and is treated accordingly right now. To fix this we would have to change that.

That was my guess too, but that doesn't explain why only one of the following fails:

    $ perl -E'qr/(?<W>a)(?<BB>(?=(?&W))(?<=(?&W)))(?&BB)/x; say "ok"'
    Variable length lookbehind not implemented in regex m/(?<W>a)(?<BB>(?=(?&W))(?<=(?&W)))(?&BB)/ at -e line 1.

    $ perl -E'qr/(?<W>a)(?<BB> (?<=(?&W)))(?&BB)/x; say "ok"'
    ok

(I thought we used -E in error messages instead of -e when -E was used. Do I remember incorrectly, or did that change?)

p5pRT commented 10 years ago

From @iabyn

On Thu, Nov 21, 2013 at 07:40:31PM +0100, demerphq wrote:

> We would have to analyse the pattern and determine whether the thing named (&W) (and there could be more than one) is variable width or not. So we punt and assume it must be variable width.

Except we often don't punt. If you remove *anything* from the reduced test case above, it stops warning. I.e. negative has to be preceded by a positive lookbehind, and wrapped within the <BB> group. Remove either, and it works. If it works sometimes, that seems to lend more weight to it being a bug.

-- "Procrastination grows to fill the available time"   -- Mitchell's corollary to Parkinson's Law

p5pRT commented 10 years ago

From @demerphq

On 21 November 2013 20:30, Dave Mitchell <davem@iabyn.com> wrote:

> Except we often don't punt. If you remove *anything* from the reduced test case above, it stops warning. [...] If it works sometimes, that seems to lend more weight to it being a bug.

Gah. OK. I'll investigate a bit and report my findings.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 10 years ago

From zefram@fysh.org

demerphq wrote:

> IMO unless we can very efficiently determine whether the subpattern is fixed width, this ticket will end up as a "won't fix".

Runtime efficiency of determining fixedness shouldn't be a concern, because the tricky case only comes up when explicitly invoked. A slow answer is far better than giving up.

-zefram

p5pRT commented 10 years ago

From @demerphq

On 21 November 2013 20:30, Dave Mitchell <davem@iabyn.com> wrote:

> Except we often don't punt. If you remove *anything* from the reduced test case above, it stops warning. [...] If it works sometimes, that seems to lend more weight to it being a bug.

    commit 099ec7dcf9e085a650e6d9010c12ad9649209bf4
    Author: Yves Orton <demerphq@gmail.com>
    Date:   Fri Nov 22 01:08:39 2013 +0100

        Fix RT #120600: Variable length lookbehind is not variable

        Inside of study_chunk() we have to guard against infinite
        recursion with recursive subpatterns. The existing logic
        sort of worked, but didn't address all cases properly.

            qr/
                (?<W>a)
                (?<BB>
                  (?=(?&W))(?<=(?&W))
                )
                (?&BB)
            /x;

        The pattern in the test would fail when the optimizer
        was expanding (&BB). When it recursed, it creates a bitmap
        for the recursion it performs, it then jumps back to
        the BB node and then eventually does the first (&W) call.
        At this point the bit for (&W) would be set in the bitmask.
        When the recursion for the (&W) exited (fake exit through
        the study frame logic) the bit was not /unset/. When the parser
        then entered the (&W) again it was treated as a nested and
        potentially infinite length pattern.

        The fake-recursion in study-chunk made it little less obvious
        what was going on in the debug output.

        By reorganizing the code and adding logic to unset the bitmap
        when exiting this bug was fixed. Unfortunately this also revealed
        another little issue with patterns like this:

            qr/x|(?0)/
            qr/(x|(?1))/

        which forced the creation of a new bitmask for each branch.
        Effectively study_chunk treats each branch as an independent
        pattern, so when we are expanding (?1) via the 'x' branch
        we dont want that to prevent us from detecting the infinite recursion
        in the (?1) branch. If you were to think of trips through study_chunk
        as paths, and [] as recursive processing you would get something like:

            BRANCH 'x' END
            BRANCH (?0) [ 'x' END ]
            BRANCH (?0) [ (?0) [ 'x' END ] ]
            ...

        When we want something like:

            BRANCH 'x' END
            BRANCH (?0) [ 'x' END ]
            BRANCH (?0) [ (?0) INFINITE_RECURSION ]

        So when we deal with a branch we need to make a new recursion bitmask.

-- perl -Mre=debug -e "/just|another|perl|hacker/"
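The set-on-entry, clear-on-exit idea in the commit message can be illustrated with a toy model. This is only a sketch under loud assumptions: it is plain Perl, not the C code of study_chunk(), and `%groups`, `width`, `$active` and `$clear_on_exit` are hypothetical names invented here. Each named group is modelled as a flat list of literals and `&NAME` calls, and lookaround semantics are ignored.

```perl
use strict;
use warnings;

# Toy model, NOT the real study_chunk(): a group is a list of items,
# each either a literal string or a call to another group ("&NAME").
my %groups = (
    W  => ['a'],
    BB => ['&W', '&W'],    # two calls to W, as in the reduced test case
);

# Returns the summed width of a group, or undef if the group looks recursive.
# $active plays the role of the "recursed" bitmap.
# $clear_on_exit selects the fixed behaviour (1) or the buggy one (0).
sub width {
    my ($name, $active, $clear_on_exit) = @_;
    return undef if $active->{$name};    # already being expanded
    $active->{$name} = 1;                # set the bit on entry
    my $total = 0;
    for my $item (@{ $groups{$name} }) {
        my $w = $item =~ /^&(\w+)$/
              ? width($1, $active, $clear_on_exit)
              : length $item;
        return undef unless defined $w;
        $total += $w;
    }
    delete $active->{$name} if $clear_on_exit;    # the step the fix adds
    return $total;
}

for my $mode (0, 1) {
    my $w = width('BB', {}, $mode);
    printf "clear_on_exit=%d: %s\n", $mode,
        defined $w ? "fixed width $w" : "treated as recursive";
}
```

With `clear_on_exit=0` the second call to W finds its bit still set and is misread as recursion, which mirrors the failure described above; with `clear_on_exit=1` both calls are expanded normally.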

p5pRT commented 10 years ago

From @demerphq

On 22 November 2013 01:32, demerphq <demerphq@gmail.com> wrote:

> commit 099ec7dcf9e085a650e6d9010c12ad9649209bf4
> Author: Yves Orton <demerphq@gmail.com>
> Date: Fri Nov 22 01:08:39 2013 +0100

FWIW, I am not super happy with this implementation. We should have a flag for all forms of recursion, and use that to decide if we need to allocate the "recursed" bitmap. As is, we create it for every branch, which is bad. I expected RExC_seen_recursed to be useful for this, but it stubbornly wasn't, and I didn't have time to dig further.

It fixes the bug, however, and if someone doesn't get to it first I will try to improve it over the weekend.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 10 years ago

From @iabyn

On Fri, Nov 22, 2013 at 01:37:14AM +0100, demerphq wrote:

> It fixes the bug, however, and if someone doesn't get to it first I will try to improve it over the weekend.

Thanks for this. (And no, I'm not volunteering to improve it ;-)

-- The Enterprise successfully ferries an alien VIP from one place to another without serious incident.   -- Things That Never Happen in "Star Trek" #7

p5pRT commented 10 years ago

From kbrannen@pwhome.com

I think this is the same bug, but I can give you a much simpler example of failure.

    $ perl -v

    This is perl 5, version 18, subversion 2 (v5.18.2) built for x86_64-linux-gnu-thread-multi
    (with 41 registered patches, see perl -V for more detail)

    $ cat try3.pl
    #!/usr/bin/perl

    my $code = 'SELECT';
    $code =~ s/(?<!CLASSSY0B )\bALTER\b/xyz/igs;

    $ perl try3.pl
    Variable length lookbehind not implemented in regex m/(?<!CLASSSY0B )\bALTER\b/ at try3.pl line 4.

There are obviously no variable parts anywhere in that regex, yet we get a failure. Interestingly, if you remove the "i" qualifier on the end, then the error goes away. Yes, I do need that "i" qualifier. :)

p5pRT commented 10 years ago

From ebhanssen@cpan.org

On Tue, May 13, 2014 at 12:10 AM, Kevin Brannen via RT <perlbug-followup@perl.org> wrote:

> I think this is the same bug, but I can give you a much simpler example of failure.

I think it's not.

>     $code =~ s/(?<!CLASSSY0B )\bALTER\b/xyz/igs;
>
> There are obviously no variable parts anywhere in that regex, yet we get a failure. Interestingly, if you remove the "i" qualifier on the end, then the error goes away. Yes, I do need that "i" qualifier. :)

There are no obviously variable parts, but there are variable parts: qr/SS/i is variable length, since it can match 'ß'.

The fix would seem to be adding the "/a" modifier, since you seem to be working on ASCII data, and explicitly don't want Unicode case insensitivity, with its variable-length implications.

Eirik
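The point about qr/SS/i can be checked directly. The following is a small sketch rather than anything from the ticket; it uses the `/u` modifier explicitly so that Unicode rules apply regardless of how the string is stored, and the variable names are made up for the example.

```perl
use strict;
use warnings;

my $sharp_s = "\x{DF}";    # LATIN SMALL LETTER SHARP S, one character long

# Under Unicode rules the two-character pattern "ss" matches this single
# character case-insensitively, which is what makes the pattern variable width.
my $unicode = $sharp_s =~ /ss/iu ? 1 : 0;

# /aa forbids ASCII characters from matching non-ASCII ones under /i.
my $ascii = $sharp_s =~ /ss/iaa ? 1 : 0;

print "length: ", length($sharp_s), ", /iu: $unicode, /iaa: $ascii\n";
```

So a run of two pattern characters may consume either one or two characters of input, and a lookbehind containing it no longer has a single fixed length.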

p5pRT commented 10 years ago

From @demerphq

On 13 May 2014 10:30, Eirik Berg Hanssen <ebhanssen@cpan.org> wrote:

> There are no obviously variable parts, but there are variable parts: qr/SS/i is variable length, since it can match 'ß'.
>
> The fix would seem to be adding the "/a" modifier, since you seem to be working on ASCII data, and explicitly don't want Unicode case insensitivity, with its variable-length implications.

Nice. I didn't catch that personally. The error message should explain the problem better, I reckon. Not sure how easy that is; I'll take a look one day, maybe.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 10 years ago

From ebhanssen@cpan.org

On Tue, May 13, 2014 at 10:30 AM, Eirik Berg Hanssen <ebhanssen@cpan.org> wrote:

> The fix would seem to be adding the "/a" modifier, since you seem to be working on ASCII data, and explicitly don't want Unicode case insensitivity, with its variable-length implications.

Err, make that the "/aa" modifier. I'd thought "/a" would suffice, but no:

    eirik@greencat[11:11:07]~$ perl -e '/(?<!SS)/i;'
    Variable length lookbehind not implemented in regex m/(?<!SS)/ at -e line 1.
    eirik@greencat[11:11:11]~$ perl -e '/(?<!SS)/ai;'
    Variable length lookbehind not implemented in regex m/(?<!SS)/ at -e line 1.
    eirik@greencat[11:11:12]~$ perl -e '/(?<!SS)/aai;'
    eirik@greencat[11:11:14]~$

Guess I'll need to reread those docs ...

Eirik

p5pRT commented 10 years ago

From kbrannen@pwhome.com

On Tue May 13 02:13:29 2014, ebhanssen@cpan.org wrote:

> Err, make that the "/aa" modifier. I'd thought "/a" would suffice,

Thanks Eirik! A new way for Unicode to bite me that I wasn't aware of, as if there aren't enough other ways. :) Now that I know what to look for, I see this in the 5.16.0 change notes and I can go educate myself.

p5pRT commented 10 years ago

From @khwilliamson

On 05/13/2014 03:13 AM, Eirik Berg Hanssen wrote:

> Err, make that the "/aa" modifier. I'd thought "/a" would suffice, but no:
>
> [...]
>
> Guess I'll need to reread those docs ...

Having /a and /aa was a group decision. I wash my hands of it.

/a works on things like \w and [:punct:]

/aa is /a plus case folding /i
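The split between /a and /aa described here can be seen with two characters taken from the perlre documentation. This is an illustrative sketch only; the variable names are arbitrary.

```perl
use strict;
use warnings;

my $digit  = "\x{663}";     # ARABIC-INDIC DIGIT THREE
my $kelvin = "\x{212A}";    # KELVIN SIGN, which case-folds to "k"

# /a restricts \d, \s, \w and the POSIX classes to ASCII ...
printf "%-8s %s\n", '/\d/',   ($digit  =~ /\d/   ? 'match' : 'no match');
printf "%-8s %s\n", '/\d/a',  ($digit  =~ /\d/a  ? 'match' : 'no match');

# ... but case folding under /i still follows Unicode rules until /aa is used.
printf "%-8s %s\n", '/k/ia',  ($kelvin =~ /k/ia  ? 'match' : 'no match');
printf "%-8s %s\n", '/k/iaa', ($kelvin =~ /k/iaa ? 'match' : 'no match');
```

This is why a single /a did not silence the lookbehind error in Eirik's session: the "ss" fold is a case-folding rule, and only /aa switches those off for ASCII characters.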

p5pRT commented 10 years ago

From @ap

* Eirik Berg Hanssen <ebhanssen@cpan.org> [2014-05-13 10:35]:

> qr/SS/i is variable length, since it can match 'ß'.

Cf. also this talk from GPW 2014: https://youtu.be/8FIGDgNa_CU

p5pRT commented 10 years ago

From @khwilliamson

On 05/14/2014 07:18 AM, Aristotle Pagaltzis wrote:

> Cf. also this talk from GPW 2014: https://youtu.be/8FIGDgNa_CU

Is there some summary or alternative version of this in English?

p5pRT commented 10 years ago

From @khwilliamson

On 05/13/2014 09:08 AM, Kevin Brannen via RT wrote:

> Thanks Eirik! A new way for Unicode to bite me that I wasn't aware of, as if there aren't enough other ways. :) Now that I know what to look for, I see this in the 5.16.0 change notes and I can go educate myself.

You can avoid Unicode issues in regexes by doing a

    use re '/aa';

in an outer scope of your code. This causes Perl to behave pretty much like it did before Unicode was introduced.
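As a rough sketch of the pragma applied to Kevin's pattern (not taken from the ticket): string `eval` is used so that a compile-time regex error is caught rather than aborting the script. The result of the attempt without the pragma depends on the perl in use, since Perl 5.30 and later accept some variable-length lookbehinds.

```perl
use strict;
use warnings;

# With the pragma in scope, /aa is the default for every regex compiled there,
# so the "SS" inside the lookbehind can only match ASCII letters.
my $with_aa = eval q{
    use re '/aa';
    my $re = qr/(?<!CLASSSY0B )\bALTER\b/i;
    1;
};
print "with use re '/aa': ", ($with_aa ? "compiled" : "failed: $@"), "\n";

# Without it, older perls reject the pattern; the outcome is version dependent.
my $without = eval q{
    my $re = qr/(?<!CLASSSY0B )\bALTER\b/i;
    1;
};
print "without the pragma: ", ($without ? "compiled" : "failed: $@"), "\n";
```

The pragma is lexically scoped, so placing it at the top of a file covers that file only, not modules loaded from it.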

p5pRT commented 10 years ago

From @ap

* Karl Williamson <public@khwilliamson.com> [2014-05-14 18:05]:

> On 05/14/2014 07:18 AM, Aristotle Pagaltzis wrote:
>
> > Cf. also this talk from GPW 2014: https://youtu.be/8FIGDgNa_CU
>
> Is there some summary or alternative version of this in English?

Oh. Several German speakers gave their talks in English and I seemed to remember this as one of them; I guess the title misled me. There is no transcript or notes that I know of… I'm very sorry for the noise.

Summary: daxim took the Unicode Consortium's request for comments as an opportunity to ask for the removal of this case folding rule, whereupon they referred him to http://www.unicode.org/faq/casemap_charprop.html#11 and said the Consortium does not create such rules but follows official orthography. Unfortunately the bodies connected to that in Germany (the Duden editors, the Council for German Orthography, etc.) are essentially prescriptivist, so it's not realistic to expect change from there, meaning that this screwiness will be part of Unicode for the foreseeable future. So he went to implement it himself, and that turned out to take just a screenful of thin wrapping around Unicode::Casing and s///, which he called Lingua::DEU::Casing::Sharp_s (I don't see it on CPAN though).

Regards,
-- Aristotle Pagaltzis // <http://plasmasturm.org/>

p5pRT commented 10 years ago

From @hvds

"Kevin Brannen via RT" <perlbug-followup@perl.org> wrote:

>     $code =~ s/(?<!CLASSSY0B )\bALTER\b/xyz/igs;
>
> There are obviously no variable parts anywhere in that regex, yet we get a failure. Interestingly, if you remove the "i" qualifier on the end, then the error goes away. Yes, I do need that "i" qualifier. :)

I can reduce this to:

    % ./perl -ce '/(?<!SS)/i'
    Variable length lookbehind not implemented in regex m/(?<!SS)/ at -e line 1.
    % ./perl -ce '/(?<!SS)/iaa'
    -e syntax OK
    %

on blead.

This looks like an intentional change: except in the "superAscii" mode given by /aa, the /ss/i should also match the case-folded Eszett character (http://en.wikipedia.org/wiki/Eszett).

Karl, can you confirm? Are there other useful workarounds?

Hugo

p5pRT commented 10 years ago

From @khwilliamson

On 05/17/2014 03:59 PM, hv@crypt.org wrote:

> This looks like an intentional change: except in the "superAscii" mode given by /aa, the /ss/i should also match the case-folded Eszett character (http://en.wikipedia.org/wiki/Eszett).
>
> Karl, can you confirm? Are there other useful workarounds?

Yes, it follows the Unicode standard, for better or worse. I don't know of any other workarounds, other than what I already mentioned on this thread:

    use re '/aa';

p5pRT commented 10 years ago

From @hvds

Karl Williamson <public@khwilliamson.com> wrote:

> Yes, it follows the Unicode standard, for better or worse. I don't know of any other workarounds, other than what I already mentioned on this thread:
>
>     use re '/aa';

Ah sorry, I'd missed the earlier followups on this ticket due to a full disk.

Hugo

p5pRT commented 10 years ago

From @abigail

On Sat, May 17, 2014 at 10:59:19PM +0100, hv@crypt.org wrote:

> Karl, can you confirm? Are there other useful workarounds?

I guess that /(?<!(?-i:SS|ss))/ is a workaround too, but I'm not going to assess its usefulness.

Abigail
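A quick sketch of how that workaround behaves, using `ALTER` from Kevin's example as the word that follows (the test strings are invented here). One thing it makes visible is that spelling out only `SS|ss` leaves mixed-case prefixes such as `Ss` uncovered.

```perl
use strict;
use warnings;

# (?-i:...) switches case-insensitivity off inside the lookbehind only,
# so its alternatives are plain fixed-width literals.
my $re = qr/(?<!(?-i:SS|ss))ALTER/i;

for my $text ('alter', 'SSALTER', 'ssalter', 'SsALTER') {
    printf "%-8s %s\n", $text, ($text =~ $re ? 'match' : 'no match');
}
```

If mixed case matters, the alternation would need the remaining spellings as well, which may be part of why its usefulness was left unassessed.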

p5pRT commented 10 years ago

From kbrannen@pwhome.com

On Sat May 17 18:37:30 2014, public@khwilliamson.com wrote:

> Yes, it follows the Unicode standard, for better or worse. I don't know of any other workarounds, other than what I already mentioned on this thread:
>
>     use re '/aa';

To help others searching for this later, the final solution I came up with after lots of reading was:

    use if $] >= 5.016, re => '/aa';

The program that uses this can be run on a number of machines, so the versions of perl cannot be totally controlled. I'm sure some of the really old ones wouldn't have that "if" pragma, but it's there even on our old 5.8 servers, which should be good enough for most of us, and you have to draw the line somewhere.

Kevin

p5pRT commented 10 years ago

From @khwilliamson

On 05/19/2014 04:17 PM, Kevin Brannen via RT wrote:

> To help others searching for this later, the final solution I came up with after lots of reading was:
>
>     use if $] >= 5.016, re => '/aa';

This gave me the idea to add text to perldiag to clue people in about this issue. Also, \K can be used to work around the problem for positive lookbehind assertions. This has now been pushed to blead as d0a29c363d313dc91fc5bfe71f7a5c525acfed03. If you don't like my wording, patches welcome.
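A short sketch of the \K workaround mentioned here, with a made-up string: a positive lookbehind such as `(?<=foo\s+)` has no fixed length, whereas \K matches the prefix normally and then excludes it from the text that gets replaced.

```perl
use strict;
use warnings;

my $text = "foo    bar";

# Everything matched before \K is kept as-is; only "bar" is substituted.
(my $changed = $text) =~ s/foo\s+\Kbar/baz/;

print "$changed\n";
```

This covers positive lookbehind only; as the thread notes, negative lookbehind needs one of the other approaches.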

p5pRT commented 10 years ago

From @khwilliamson

0005-perldiag-Add-details-about-variable-length-lookbehin.patch

```diff
From 4f0b164c3591304605402252bb9ffe796fe13a8f Mon Sep 17 00:00:00 2001
From: Karl Williamson
Date: Mon, 16 Jun 2014 20:00:39 -0600
Subject: [PATCH 5/5] perldiag: Add details about variable length lookbehind

See http://nntp.perl.org/group/perl.perl5.porters/215685
---
 pod/perldiag.pod | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index 7dd0547..19e44e0 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -6524,7 +6524,20 @@ front of your variable.
 
 =item Variable length lookbehind not implemented in regex m/%s/
 
 (F) Lookbehind is allowed only for subexpressions whose length is fixed and
-known at compile time. See L.
+known at compile time. For positive lookbehind, you can use the C<\K>
+regex construct as a way to get the equivalent functionality. See
+L.
+
+There are non-obvious Unicode rules under C</i> that can match variably,
+but which you might not think could. For example, the substring C<"ss">
+can match the single character LATIN SMALL LETTER SHARP S. There are
+other sequences of ASCII characters that can match single ligature
+characters, such as LATIN SMALL LIGATURE FFI matching C.
+Starting in Perl v5.16, if you only care about ASCII matches, adding the
+C</aa> modifier to the regex will exclude all these non-obvious matches,
+thus getting rid of this message. You can also say C>
+to apply C</aa> to all regular expressions compiled within its scope.
+See L.
 
 =item "%s" variable %s masks earlier declaration in same %s
 
-- 
1.9.1
```

p5pRT commented 8 years ago

From @mauke

On Thu Nov 21 16:37:48 2013, demerphq wrote:

> FWIW, I am not super happy with this implementation. We should have a flag for all forms of recursion, and use that to decide if we need to allocate the "recursed" bitmap. As is, we create it for every branch, which is bad. I expected RExC_seen_recursed to be useful for this, but it stubbornly wasn't, and I didn't have time to dig further.
>
> It fixes the bug, however, and if someone doesn't get to it first I will try to improve it over the weekend.

This ticket is listed in perl5200delta. Is there still work to be done or can it be closed?

p5pRT commented 5 years ago

From @khwilliamson

Yes, this ticket can be closed, and I am so doing.

The second part involved the German U+DF that folds to 'ss'. That is covered by [perl #132367].

-- Karl Williamson

p5pRT commented 5 years ago

@khwilliamson - Status changed from 'open' to 'resolved'