Perl / perl5

šŸŖ The Perl programming language
https://dev.perl.org/perl5/
Other
1.94k stars 554 forks source link

qr// reuses last successful pattern (like m//) #11549

Closed p5pRT closed 12 years ago

p5pRT commented 13 years ago

Migrated from rt.perl.org#96230 (status was 'resolved')

Searchable as RT96230$

p5pRT commented 13 years ago

From @mauke

Created by @mauke

% perl -wle '"a" =~ /[abc]/; my $re = qr//; $re eq "(?^​:)" or die; $_ = "crab race"; s/$re/./g; print' .r.. r..e

What's going on here is that qr// takes the last successful regex and reuses that pattern (/[abc]/)\, but stringification of the regex object still returns "(?^​:)". I'd say that's pretty bad because now we have qr objects whose behavior is completely disconnected from their stringification. It also works with interpolation - this bug bit me with qr/$foo/ where $foo happened to be empty.

Yeah\, so apart from this being bad I also think it's a bug because it doesn't match the documentation in perlop.

The section

| The empty pattern // | If the PATTERN evaluates to the empty string\, the last successfully | matched regular expression is used instead.

that documents this behavior is part of the documentation for m//\, nothing else.

Under qr// it simply says "This operator quotes (and possibly compiles) its STRING as a regular expression." That's not what happens here.

Perl Info ``` Flags: category=core severity=low This perlbug was built using Perl 5.12.1 - Thu Jun 3 20:09:15 CEST 2010 It is being executed now by Perl 5.14.1 - Wed Jun 22 01:57:37 CEST 2011. Site configuration information for perl 5.14.1: Configured by mauke at Wed Jun 22 01:57:37 CEST 2011. Summary of my perl5 (revision 5 version 14 subversion 1) configuration: Platform: osname=linux, osvers=2.6.37-gentoo-r4, archname=i686-linux uname='linux nora 2.6.37-gentoo-r4 #3 preempt thu may 5 20:36:24 cest 2011 i686 amd athlon(tm) 64 processor 3200+ authenticamd gnulinux ' config_args='' hint=recommended, useposix=true, d_sigaction=define useithreads=undef, usemultiplicity=undef useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef use64bitint=undef, use64bitall=undef, uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='gcc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64', optimize='-O2', cppflags='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include' ccversion='', gccversion='4.6.0', gccosandvers='' intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12 ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=4, prototype=define Linker and Libraries: ld='gcc', ldflags =' -fstack-protector -L/usr/local/lib' libpth=/usr/local/lib /lib /usr/lib libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc libc=/lib/libc-2.12.2.so, so=so, useshrplib=false, libperl=libperl.a gnulibc_version='2.12.2' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E' cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector' Locally applied patches: SAVEARGV0 - disable magic open in @INC for perl 5.14.1: /home/mauke/usr/local/lib/perl5/site_perl/5.14.1/i686-linux /home/mauke/usr/local/lib/perl5/site_perl/5.14.1 /home/mauke/usr/local/lib/perl5/5.14.1/i686-linux /home/mauke/usr/local/lib/perl5/5.14.1 /home/mauke/usr/local/lib/perl5/site_perl/5.14.0/i686-linux /home/mauke/usr/local/lib/perl5/site_perl/5.14.0 /home/mauke/usr/local/lib/perl5/site_perl . Environment for perl 5.14.1: HOME=/home/mauke LANG=en_US.UTF-8 LANGUAGE (unset) LC_COLLATE=POSIX LD_LIBRARY_PATH=/home/mauke/usr/local/lib LOGDIR (unset) PATH=/home/mauke/usr/local/bin:/usr/local/bin:/usr/bin:/bin:/opt/bin:/usr/i686-pc-linux-gnu/gcc-bin/4.4.5:/opt/sun-jdk-1.4.2.13/bin:/opt/sun-jdk-1.4.2.13/jre/bin:/opt/sun-jdk-1.4.2.13/jre/javaws:/opt/dmd/bin:/usr/games/bin PERL_BADLANG (unset) PERL_UNICODE=SAL SHELL=/bin/bash ```
p5pRT commented 13 years ago

From @rgarcia

On 3 August 2011 23​:54\, l.mai@​web.de \perlbug\-followup@​perl\.org wrote​:

% perl -wle '"a" =~ /[abc]/; my $re = qr//; $re eq "(?^​:)" or die; $_ = "crab race"; s/$re/./g; print' .r.. r..e

What's going on here is that qr// takes the last successful regex and reuses that pattern (/[abc]/)\, but stringification of the regex object still returns "(?^​:)". I'd say that's pretty bad because now we have qr objects whose behavior is completely disconnected from their stringification. It also works with interpolation - this bug bit me with qr/$foo/ where $foo happened to be empty.

Yeah\, so apart from this being bad I also think it's a bug because it doesn't match the documentation in perlop.

My gut feeling is that this should be deprecated and then removed (that is\, make qr// compile to an empty match.)

p5pRT commented 13 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 13 years ago

From @davidnicol

After deprecation\, qr// should throw an exception\, instead of compiling to an empty match.

Scenarios where there a bunch of regexes and you want to save which one matched are easy to imagine\, though.

if someone somewhere is using it as a feature and the result starts matching everything\, instead of caching the last match\, that could be bad.

It is also easy to imagine rewriting such a scenario using an external index key.

Yeah\, so apart from this being bad I also think it's a bug because it doesn't

match the documentation in perlop.

My gut feeling is that this should be deprecated and then removed (that is\, make qr// compile to an empty match.)

-- "The tools expect that they have full\, unlimited control of the hardware." -- Intel Corporation

p5pRT commented 13 years ago

From @cpansprout

On Thu Aug 04 00​:04​:04 2011\, rgs@​consttype.org wrote​:

On 3 August 2011 23​:54\, l.mai@​web.de \perlbug\-followup@​perl\.org wrote​:

% perl -wle '"a" =~ /[abc]/; my $re = qr//; $re eq "(?^​:)" or die; $_ = "crab race"; s/$re/./g; print' .r.. r..e

What's going on here is that qr// takes the last successful regex and reuses that pattern (/[abc]/)\, but stringification of the regex object still returns "(?^​:)". I'd say that's pretty bad because now we have qr objects whose behavior is completely disconnected from their stringification. It also works with interpolation - this bug bit me with qr/$foo/ where $foo happened to be empty.

That surprised me\, too. One of my modules was buggy as a result\, until I changed qr)) to qr)(?​:\)) :-)

Yeah\, so apart from this being bad I also think it's a bug because it doesn't match the documentation in perlop.

My gut feeling is that this should be deprecated and then removed (that is\, make qr// compile to an empty match.)

Jesseā€™s new 5.16 policy does not specify whether there should be a deprecation warning if we change the behaviour. Jesse\, are you listening?

In a case like this\, which of the following happens? ā€¢ use 5.014 gives no warning; use 5.016 gives new behaviour (like unicode_strings) ā€¢ use 5.014 gives a deprecation warning; use 5.016 gives new behaviour ā€¢ use 5.014 gives no warning; use 5.016 gives deprecation warning and old behaviour; use 5.018/20 gives new behaviour ā€¢ use 5.014/16 gives deprecation warning; use 5.018/20 gives new behaviour

I think Iā€™m in favour of the first item.

p5pRT commented 12 years ago

From @khwilliamson

On Mon Sep 19 23​:55​:26 2011\, sprout wrote​:

On Thu Aug 04 00​:04​:04 2011\, rgs@​consttype.org wrote​:

On 3 August 2011 23​:54\, l.mai@​web.de \perlbug\-followup@​perl\.org wrote​:

% perl -wle '"a" =~ /[abc]/; my $re = qr//; $re eq "(?^​:)" or die; $_ = "crab race"; s/$re/./g; print' .r.. r..e

What's going on here is that qr// takes the last successful regex and reuses that pattern (/[abc]/)\, but stringification of the regex object still returns "(?^​:)". I'd say that's pretty bad because now we have qr objects whose behavior is completely disconnected from their stringification. It also works with interpolation - this bug bit me with qr/$foo/ where $foo happened to be empty.

That surprised me\, too. One of my modules was buggy as a result\, until I changed qr)) to qr)(?​:\)) :-)

Yeah\, so apart from this being bad I also think it's a bug because it doesn't match the documentation in perlop.

My gut feeling is that this should be deprecated and then removed (that is\, make qr// compile to an empty match.)

Jesseā€™s new 5.16 policy does not specify whether there should be a deprecation warning if we change the behaviour. Jesse\, are you listening?

In a case like this\, which of the following happens? ā€¢ use 5.014 gives no warning; use 5.016 gives new behaviour (like unicode_strings) ā€¢ use 5.014 gives a deprecation warning; use 5.016 gives new behaviour ā€¢ use 5.014 gives no warning; use 5.016 gives deprecation warning and old behaviour; use 5.018/20 gives new behaviour ā€¢ use 5.014/16 gives deprecation warning; use 5.018/20 gives new behaviour

I think Iā€™m in favour of the first item.

I lthink this is a candidate for pre-deprecation in 5.16\, deprecation in 5.18\, and removal in 5.18

-- Karl Williamson

p5pRT commented 12 years ago

From @demerphq

On 13 February 2012 19​:18\, Karl Williamson via RT \perlbug\-followup@​perl\.org wrote​:

On Mon Sep 19 23​:55​:26 2011\, sprout wrote​:

On Thu Aug 04 00​:04​:04 2011\, rgs@​consttype.org wrote​:

On 3 August 2011 23​:54\, l.mai@​web.de \perlbug\-followup@​perl\.org wrote​:

% perl -wle '"a" =~ /[abc]/; my $re = qr//; $re eq "(?^​:)" or die; $_ = "crab race"; s/$re/./g; print' .r.. r..e

What's going on here is that qr// takes the last successful regex and reuses that pattern (/[abc]/)\, but stringification of the regex object still returns "(?^​:)". I'd say that's pretty bad because now we have qr objects whose behavior is completely disconnected from their stringification. It also works with interpolation - this bug bit me with qr/$foo/ where $foo happened to be empty.

That surprised me\, too. Ā One of my modules was buggy as a result\, until I changed qr)) to qr)(?​:\)) :-)

Yeah\, so apart from this being bad I also think it's a bug because it doesn't match the documentation in perlop.

My gut feeling is that this should be deprecated and then removed (that is\, make qr// compile to an empty match.)

Jesseā€™s new 5.16 policy does not specify whether there should be a deprecation warning if we change the behaviour. Ā Jesse\, are you listening?

In a case like this\, which of the following happens? ā€¢ use 5.014 gives no warning; use 5.016 gives new behaviour (like unicode_strings) ā€¢ use 5.014 gives a deprecation warning; use 5.016 gives new behaviour ā€¢ use 5.014 gives no warning; use 5.016 gives deprecation warning and old behaviour; use 5.018/20 gives new behaviour ā€¢ use 5.014/16 gives deprecation warning; use 5.018/20 gives new behaviour

I think Iā€™m in favour of the first item.

I lthink this is a candidate for pre-deprecation in 5.16\, deprecation in 5.18\, and removal in 5.18

Do we really have to use a deprecation cycle to fix a bug? If its not documented\, and its not tested for\, and it is obviously a bug (which IMO this is)\, it seems to me like we should just fix it.

cheers\, Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 12 years ago

From @ikegami

Regarding C\<\< $re=qr//; s/$re// >> triggering the special empty-pattern behaviour of C\<\< s/// >>​:

On Mon\, Feb 13\, 2012 at 3​:19 PM\, demerphq \demerphq@&#8203;gmail\.com wrote​:

Do we really have to use a deprecation cycle to fix a bug? If its not documented\, and its not tested for\, and it is obviously a bug (which IMO this is)\,

and it's hard to imagine intentionally using the behaviour to be fixed

it seems to me like we should just fix it.

cheers\, Yves

p5pRT commented 12 years ago

From @demerphq

On 14 February 2012 18​:47\, Eric Brine \ikegami@&#8203;adaelis\.com wrote​:

Regarding C\<\< $re=qr//; s/$re// >> triggering the special empty-pattern behaviour of C\<\< s/// >>​:

On Mon\, Feb 13\, 2012 at 3​:19 PM\, demerphq \demerphq@&#8203;gmail\.com wrote​:

Do we really have to use a deprecation cycle to fix a bug? If its not documented\, and its not tested for\, and it is obviously a bug (which IMO this is)\,

and it's hard to imagine intentionally using the behaviour to be fixed

Agreed. However\, I had an idea. We can add a new regex magic symbol as something like​: (*LAST_SUCCESSFUL_MATCH)\, which could be given this special behaviour. Then we can "cover" the back compat requirement\, this bug would go away (as it would stringify to (?​:(*LAST_SUCCESSFUL_MATCH))\, and we would have a new tool to play with.

Not saying we SHOULD do this\, but rather that we COULD if we thought it was useful. In fact I'm inclined to say we shouldn't\, but I am honor bound to point out we can.

cheers\, Yves -- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 12 years ago

From @nwc10

On Wed\, Feb 15\, 2012 at 11​:22​:01AM +0100\, demerphq wrote​:

On 14 February 2012 18​:47\, Eric Brine \ikegami@&#8203;adaelis\.com wrote​:

Regarding C\<\< $re=qr//; s/$re// >> triggering the special empty-pattern behaviour of C\<\< s/// >>​:

On Mon\, Feb 13\, 2012 at 3​:19 PM\, demerphq \demerphq@&#8203;gmail\.com wrote​:

Do we really have to use a deprecation cycle to fix a bug? If its not documented\, and its not tested for\, and it is obviously a bug (which IMO this is)\,

and it's hard to imagine intentionally using the behaviour to be fixed

Agreed. However\, I had an idea. We can add a new regex magic symbol as something like​: (*LAST_SUCCESSFUL_MATCH)\, which could be given this special behaviour. Then we can "cover" the back compat requirement\, this bug would go away (as it would stringify to (?​:(*LAST_SUCCESSFUL_MATCH))\, and we would have a new tool to play with.

Not saying we SHOULD do this\, but rather that we COULD if we thought it was useful. In fact I'm inclined to say we shouldn't\, but I am honor bound to point out we can.

To me it feels like a bug\, and hence it should be fixed. Without adding complexity of warnings or a work around variable or other technical debt.

However\, I'd be uncomfortable about doing it this close to 5.16.0 because I'm not *confident* that it's harmless\, and it takes a while for the BBC to sweep CPAN. Possibly longer than we have between now and 5.16.0 code freeze.

I'd be a lot happier with fixing it in 5.17.0 and having six months or more for everything to shake out and be sure.

Nicholas Clark

p5pRT commented 12 years ago

From @demerphq

On 15 February 2012 11​:33\, Nicholas Clark \nick@&#8203;ccl4\.org wrote​:

On Wed\, Feb 15\, 2012 at 11​:22​:01AM +0100\, demerphq wrote​:

On 14 February 2012 18​:47\, Eric Brine \ikegami@&#8203;adaelis\.com wrote​:

Regarding C\<\< $re=qr//; s/$re// >> triggering the special empty-pattern behaviour of C\<\< s/// >>​:

On Mon\, Feb 13\, 2012 at 3​:19 PM\, demerphq \demerphq@&#8203;gmail\.com wrote​:

Do we really have to use a deprecation cycle to fix a bug? If its not documented\, and its not tested for\, and it is obviously a bug (which IMO this is)\,

and it's hard to imagine intentionally using the behaviour to be fixed

Agreed. However\, I had an idea. We can add a new regex magic symbol as something like​: (*LAST_SUCCESSFUL_MATCH)\, which could be given this special behaviour. Then we can "cover" the back compat requirement\, this bug would go away (as it would stringify to (?​:(*LAST_SUCCESSFUL_MATCH))\, and we would have a new tool to play with.

Not saying we SHOULD do this\, but rather that we COULD if we thought it was useful. In fact I'm inclined to say we shouldn't\, but I am honor bound to point out we can.

To me it feels like a bug\, and hence it should be fixed. Without adding complexity of warnings or a work around variable or other technical debt.

Yay. Hearing you say this makes me happy. :-)

However\, I'd be uncomfortable about doing it this close to 5.16.0 because I'm not *confident* that it's harmless\, and it takes a while for the BBC to sweep CPAN. Possibly longer than we have between now and 5.16.0 code freeze.

I'd be a lot happier with fixing it in 5.17.0 and having six months or more for everything to shake out and be sure.

I thinking there is no rush to fix this wart\, so I think your plan sounds good.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 12 years ago

From @tsee

On 02/15/2012 11​:33 AM\, Nicholas Clark wrote​:

However\, I'd be uncomfortable about doing it this close to 5.16.0 because I'm not *confident* that it's harmless\, and it takes a while for the BBC to sweep CPAN. Possibly longer than we have between now and 5.16.0 code freeze.

My CPAN smoke is much faster (full CPAN in a couple of days)\, but also lower-quality in that Andreas does much of the detective work.

In theory\, we could add a second phase to the CPAN smoke to run a bisect for the differing grades\, but that probably warrants a person at least making a boolean call on whether to include a given dist in this.

I think that would be useful\, but I don't have to time to implement it. I'm willing to work with a volunteer[1].

I'd be a lot happier with fixing it in 5.17.0 and having six months or more for everything to shake out and be sure.

*nod*

--Steffen

[1] Not that I've ever found volunteers on this mailing list.

p5pRT commented 12 years ago

From @davidnicol

On Wed\, Feb 15\, 2012 at 4​:39 PM\, Eric Brine \ikegami@&#8203;adaelis\.com wrote​:

On Wed\, Feb 15\, 2012 at 3​:36 PM\, David Nicol \davidnicol@&#8203;gmail\.com wrote​:

There is no memory of what the last regex was at qr time. The empty regex matches the last successful regex at use time\, at least it did at v5.10.1​:

$ perl -wle '"a" =~ /[abc]/; my $re = qr//; warn "$re"; "a" =~ /[car]/; $_ = "crab race"; s/$re/./g; print' (?-xism​:) at -e line 1. ...b ...e

So​: the statement\, "now we have qr objects whose behavior is completely disconnected from their stringification" is false.

It behaves just like an empty regex\,

I did not make that statement\, but it's easy to demonstrate that it's true.

$ perl -E'"b"=~/b/; $re=Ā Ā  qr//; say "a"=~/$re/ ?1​:0' 0

$ perl -E'"b"=~/b/; $re="".qr//; say "a"=~/$re/ ?1​:0' 1

While qr// behaves just like a empty regex\, its stringification does not.

There is definitely a bug somewhere. [the question is\, where.]

Ah! You're right! My armchair proposal is to make qr// remember and return the last regex that matched\, at the time it is encountered\, which I imagine would simplify remembering which of a set of regexes matched.

p5pRT commented 12 years ago

From @rjbs

* demerphq \demerphq@&#8203;gmail\.com [2012-02-13T15​:19​:08]

I lthink this is a candidate for pre-deprecation in 5.16\, deprecation in 5.18\, and removal in 5.18

Do we really have to use a deprecation cycle to fix a bug? If its not documented\, and its not tested for\, and it is obviously a bug (which IMO this is)\, it seems to me like we should just fix it.

I think fixing this directly in 5.17.0 makes sense.

-- rjbs

p5pRT commented 12 years ago

From @cpansprout

Fixed in 7e31363.

p5pRT commented 12 years ago

@cpansprout - Status changed from 'open' to 'resolved'

p5pRT commented 12 years ago

From @cpansprout

On Sat Sep 22 11​:56​:10 2012\, sprout wrote​:

Fixed in 7e31363.

And s/// fixed in 6a97c51d.

--

Father Chrysostomos