Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

Empty regular expression does not match in some cases #13141

Closed p5pRT closed 11 years ago

p5pRT commented 11 years ago

Migrated from rt.perl.org#119095 (status was 'resolved')

Searchable as RT119095$

p5pRT commented 11 years ago

From @ppisar

Hello,

this code:

  q{"} =~ m/"/;

  if (q{a} =~ m//) {
    print "TRUE\n";
  } else {
    print "FALSE\n";
  }

should print TRUE, but it prints FALSE.

In other words, the empty regular expression does not match. There is a side effect, because the result depends on the previous regular-expression match (the first line). If I change the first line in any way, for example from m/"/ to m/./, the code starts working correctly.

I observe this behaviour with a somewhat patched 5.16.3, vanilla 5.18.0 and current blead.

You can use this one-liner instead:

$ perl -e 'q{"} =~ m/"/; if (q{a} =~ m//) { print qq{TRUE\n} }'

-- Petr

p5pRT commented 11 years ago

From @pjcj

On Wed, Jul 31, 2013 at 07:34:07AM -0700, Petr Pisar wrote:

> Hello,
>
> this code:
>
>   q{"} =~ m/"/;
>
>   if (q{a} =~ m//) { print "TRUE\n"; } else { print "FALSE\n"; }
>
> should print TRUE, but it prints FALSE.
>
> In other words, empty regular expression does not match. There is some side effect because it depends on previous regular match (the first line). If I change the first line anyhow, like m/"/ changing to m/./, the code starts working correctly.

I think this is one of those "it's a feature, not a bug" moments. Though I'll admit that in over 20 years of using Perl, it's a feature I've never made use of.

From perlop:

  The empty pattern //
    If the PATTERN evaluates to the empty string, the last
    successfully matched regular expression is used instead. In this
    case, only the "g" and "c" flags on the empty pattern are honored;
    the other flags are taken from the original pattern. If no match
    has previously succeeded, this will (silently) act instead as a
    genuine empty pattern (which will always match).
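
The perlop paragraph above can be checked with a short script. This is only a minimal sketch, not code from the ticket; the variable names ($before, $seed, $after) are illustrative assumptions. It runs the empty pattern once before any match has succeeded and once after a successful match against a double quote.

```perl
use strict;
use warnings;

# No match has succeeded yet at this point, so // should act as a
# genuine empty pattern and match any string.
my $before = ("a" =~ //) ? "TRUE" : "FALSE";

# This successful match becomes the "last successfully matched
# regular expression".
my $seed = (q{"} =~ /"/) ? "TRUE" : "FALSE";

# The empty pattern now stands for /"/, which does not match "a".
my $after = ("a" =~ //) ? "TRUE" : "FALSE";

print "before: $before, seed: $seed, after: $after\n";
```

On the perls mentioned in the report this should print "before: TRUE, seed: TRUE, after: FALSE", which is the same behaviour as the original one-liner.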

-- Paul Johnson - paul@pjcj.net http://www.pjcj.net

p5pRT commented 11 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 11 years ago

From @tux

On Wed, 31 Jul 2013 07:34:07 -0700, Petr Pisar (via RT) <perlbug-followup@perl.org> wrote:

> # New Ticket Created by Petr Pisar
> # Please include the string: [perl #119095]
> # in the subject line of all future correspondence about this issue.
> # <URL: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=119095 >
>
> Hello,
>
> this code:
>
>   q{"} =~ m/"/;
>
>   if (q{a} =~ m//) { print "TRUE\n"; } else { print "FALSE\n"; }
>
> should print TRUE, but it prints FALSE.

Nope.

$ perldoc perlreref

  If 'pattern' is an empty string, the last successfully matched regex is
  used. Delimiters other than '/' may be used for both this operator and the
  following ones. The leading "m" can be omitted if the delimiter is '/'.

> In other words, empty regular expression does not match. There is some side effect because it depends on previous regular match (the first line). If I change the first line anyhow, like m/"/ changing to m/./, the code starts working correctly.
>
> I observe this behaviour with somewhat patched 5.16.3, vanilla 5.18.0 and current blead.
>
> You can use this one-liner instead:
>
> $ perl -e 'q{"} =~ m/"/; if (q{a} =~ m//) { print qq{TRUE\n} }'

-- H.Merijn Brand http://tux.nl Perl Monger http://amsterdam.pm.org/
using perl5.00307 .. 5.19, porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/ http://www.test-smoke.org/
http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/

p5pRT commented 11 years ago

@tux - Status changed from 'open' to 'resolved'

p5pRT commented 11 years ago

From zefram@fysh.org

Paul Johnson wrote:

>        If the PATTERN evaluates to the empty string, the last
>        successfully matched regular expression is used instead.

Addendum, which should probably go in the doc: you can use /(?:)/ to get an effective empty pattern that will not invoke this magic.
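
To make the difference concrete, here is a small sketch (the variable names $magic and $literal are assumptions for illustration, not from the thread) that compares // with /(?:)/ after a successful match has already taken place.

```perl
use strict;
use warnings;

# Seed the "last successfully matched regular expression" with /"/.
q{"} =~ /"/;

# The empty pattern is replaced by /"/ and so fails against "a".
my $magic = ("a" =~ //) ? "TRUE" : "FALSE";

# The pattern text "(?:)" is not the empty string, so it is compiled
# as written and matches any string.
my $literal = ("a" =~ /(?:)/) ? "TRUE" : "FALSE";

print "//: $magic, /(?:)/: $literal\n";
```

This should print "//: FALSE, /(?:)/: TRUE".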

-zefram

p5pRT commented 11 years ago

From @ppisar

On 2013-07-31, Paul Johnson <paul@pjcj.net> wrote:

> I think this is one of those "it's a feature, not a bug" moments. Though I'll admit that in over 20 years of using Perl, it's a feature I've never made use of.
>
> From perlop:
>
> The empty pattern //
>         If the PATTERN evaluates to the empty string, the last
>         successfully matched regular expression is used instead. In this
>         case, only the "g" and "c" flags on the empty pattern are honored;
>         the other flags are taken from the original pattern. If no match
>         has previously succeeded, this will (silently) act instead as a
>         genuine empty pattern (which will always match).

I see. Then it's a feature. Never mind.

In case you want to know my use case: the second match uses a regular expression specified by a user, and a user could reasonably assume that an empty expression matches any string.

-- Petr

p5pRT commented 11 years ago

From zefram@fysh.org

Petr Pisar wrote:

> Just if you want to know my use case, the second match uses a regular expression specified by an user. And user could assume that an empty expression matches any string.

To provide consistent semantics to the user, you need to process the user-supplied regexp, by something like

  $perlre = $userre eq "" ? qr/(?:)/ : qr/$userre/;

or

  $perlre = qr/(?:$userre)/;

(Compiling the regexp early with qr// is often a good idea.)
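
As a rough sketch of the second variant in use, assuming a hypothetical helper named compile_user_pattern (not part of Perl or of this ticket), the wrapped pattern still matches any string when the user supplies an empty one, even after an unrelated match has succeeded:

```perl
use strict;
use warnings;

# compile_user_pattern is a hypothetical helper name used only for
# this illustration.
sub compile_user_pattern {
    my ($userre) = @_;
    # Wrapping in (?:...) keeps the pattern text non-empty even when
    # $userre is "", so the "last successful pattern" rule is avoided.
    return qr/(?:$userre)/;
}

# An earlier, unrelated successful match.
q{"} =~ /"/;

my $re     = compile_user_pattern("");
my $result = ("a" =~ $re) ? "TRUE" : "FALSE";

print "$result\n";
```

This should print "TRUE". The same helper passes non-empty patterns through unchanged apart from the extra non-capturing group.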

-zefram

p5pRT commented 11 years ago

From @cpansprout

On Wed Jul 31 08:34:08 2013, zefram@fysh.org wrote:

> Petr Pisar wrote:
>
> > Just if you want to know my use case, the second match uses a regular expression specified by an user. And user could assume that an empty expression matches any string.
>
> To provide consistent semantics to the user, you need to process the user-supplied regexp, by something like
>
>   $perlre = $userre eq "" ? qr/(?:)/ : qr/$userre/;
>
> or
>
>   $perlre = qr/(?:$userre)/;
>
> (Compiling the regexp early with qr// is often a good idea.)

Watch out for qr/$userre/. I fixed that in perl 5.18 (commit 6a97c51d3ccb), but in earlier perls qr// would trigger the same behaviour. In 5.18+ qr/$userre/ will work as expected (like /(?:)/) with empty patterns.

--

Father Chrysostomos

p5pRT commented 11 years ago

From zefram@fysh.org

Father Chrysostomos via RT wrote:

> Watch out for qr/$userre/. I fixed that in perl 5.18 (commit 6a97c51d3ccb), but in earlier perls qr// would trigger the same behaviour.

That's what the conditional in my example is avoiding.

-zefram

p5pRT commented 11 years ago

From @epa

Out of interest, is there a performance boost from reapplying the last successfully matched regexp using //, or is it just a golfing shortcut?

-- Ed Avis <eda@waniasset.com>

p5pRT commented 11 years ago

From @demerphq

On 1 August 2013 12:38, Ed Avis <eda@waniasset.com> wrote:

> Out of interest is there a performance boost from reapplying the last successfully matched regexp using // or is it just a golfing shortcut?

Hypothetically a tiny boost.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 11 years ago

From @demerphq

On 1 August 2013 13:03, demerphq <demerphq@gmail.com> wrote:

> On 1 August 2013 12:38, Ed Avis <eda@waniasset.com> wrote:
>
> > Out of interest is there a performance boost from reapplying the last successfully matched regexp using // or is it just a golfing shortcut?
>
> Hypothetically a tiny boost.

I should add that it's a feature that has extremely limited utility.

As far as I can tell it is only useful in a case like this:

  if (/pat1/ || /pat2/ || /pat3/) {
    s//$something/; # change whatever we matched
  }

Or similar constructs. It actually makes no sense that it applies to m//; to the extent it should exist at all, it should apply only to s///.
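
A runnable version of that construct might look like the following sketch. The sample data, the concrete patterns and the replacement text are assumptions made for illustration; only the if/s// shape comes from the message above.

```perl
use strict;
use warnings;

# Illustrative data; not from the thread.
my @lines     = ("red apple", "green pear", "blue plum");
my $something = "FRUIT";

for (@lines) {
    # Whichever alternative succeeds becomes the last successful
    # pattern, and the empty pattern in s// reuses it.
    if (/apple/ || /pear/) {
        s//$something/;    # change whatever we matched
    }
}

print join(", ", @lines), "\n";
```

This should print "red FRUIT, green FRUIT, blue plum": the substitution replaces exactly the text that the successful alternative matched, without repeating the pattern.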

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 11 years ago

From @demerphq

On 1 August 2013 13:08, demerphq <demerphq@gmail.com> wrote:

> On 1 August 2013 13:03, demerphq <demerphq@gmail.com> wrote:
>
> > On 1 August 2013 12:38, Ed Avis <eda@waniasset.com> wrote:
> >
> > > Out of interest is there a performance boost from reapplying the last successfully matched regexp using // or is it just a golfing shortcut?
> >
> > Hypothetically a tiny boost.
>
> I should add that it's a feature that has extremely limited utility.
>
> As far as I can tell it is only useful in a case like this:
>
>   if (/pat1/ || /pat2/ || /pat3/) {
>     s//$something/; # change whatever we matched
>   }
>
> Or similar constructs. It actually makes no sense that it applies to m//, to the extent it should exist at all it should apply only to s///.

IMO we should nuke it and replace it with a (*LASTMATCH) metapattern.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 11 years ago

From @ap

* demerphq <demerphq@gmail.com> [2013-08-01 13:10]:

> I should add that it's a feature that has extremely limited utility.
>
> As far as I can tell it is only useful in a case like this:
>
>   if (/pat1/ || /pat2/ || /pat3/) {
>     s//$something/; # change whatever we matched
>   }
>
> Or similar constructs. It actually makes no sense that it applies to m//, to the extent it should exist at all it should apply only to s///.

It makes perfect sense where this shortcut came from – namely ed, the ancient Unix text editor. (Think about it: the only way to interact with the editor is a command line. You have just performed a search. In the next command you want to specify the same search again. What is the natural syntax to say that? Also: the user will want to search for nothing… how often?)

From there it was inherited by sed, and that is how it ended up in Perl.

A lot of the syntax and idioms lore that we think of as “regexps”, at least in a Unix-y tradition, is really the regexp vernacular of ed. The entire grep utility is an extraction of an ed idiom as a stand-alone program.

And even when I say all this, I am almost certainly being ahistorical – I do not know in detail the lineage and history of ed and all its next of kin (ex/vi, grep, sed, patch etc) and would actually be surprised if the story weren’t more intertwined and complex than my portrayal, even WRT just this one aspect.

(I expect Aaron to come up behind me and embarrass me now. :-) )

* demerphq <demerphq@gmail.com> [2013-08-01 13:10]:

> IMO we should nuke it and replace it with a (*LASTMATCH) metapattern.

Yes, probably. It made great sense in a text editor and may still make sense in high-whipuptitude, low-manipulexity code (Perl as a glorified sed, basically), but that is very little of the Perl that gets written nowadays. It is effectively a pure liability in high-manipulexity code (any code that has the CPAN nature, essentially).

But boy would we need a long deprecation cycle for this one. (It predates Perl itself!)

Regards,
-- Aristotle Pagaltzis // <http://plasmasturm.org/>

p5pRT commented 11 years ago

From @arc

Aristotle Pagaltzis <pagaltzis@gmx.de> wrote:

> A lot of the syntax and idioms lore that we think of as “regexps”, at least in a Unix-y tradition, is really the regexp vernacular of ed. The entire grep utility is an extraction of an ed idiom as a stand-alone program.
>
> And even when I say all this, I am almost certainly being ahistorical – I do not know in detail the lineage and history of ed and all its next of kin (ex/vi, grep, sed, patch etc) and would actually be surprised if the story weren’t more intertwined and complex than my portrayal, even WRT just this one aspect.
>
> (I expect Aaron to come up behind me and embarrass me now. :-) )

Nope, your summary pretty much covers it. :-)

ed(1) already exists in the First Edition manual (so before November 1971), but neither sed(1) nor grep(1) do:

http://cm.bell-labs.com/cm/cs/who/dmr/man12.pdf
http://cm.bell-labs.com/cm/cs/who/dmr/man13.pdf

grep(1) came next, in Fourth Edition (so between February and November 1973):

http://www.tuhs.org/Archive/PDP-11/Distributions/research/Dennis_v4/v4man.tar.gz

In 1975, George Coulouris at Queen Mary College (in London; subsequently renamed Queen Mary and Westfield, and then Queen Mary, University of London) wrote em ("editor for mortals"), an interactive ed(1)-like editor for cursor-addressed displays. When he visited Berkeley in 1976, he took it with him, and a certain Bill Joy took it and morphed it into ex(1), which shipped in 1BSD (March 1978):

http://www.eecs.qmul.ac.uk/~gc/history/

vi(1) was originally (in 2BSD, May 1979) a hard link to ex(1); when it was launched under that name, it would start in visual mode rather than normal mode, but ex(1) had all the same abilities.

sed(1) didn't appear till Seventh Edition, in January 1979:

http://plan9.bell-labs.com/7thEdMan/v7vol1.pdf

The original diff(1) appeared in Fifth Edition (June 1974), and originally generated only "edit scripts" (à la modern `diff -e`) that could be passed to ed(1):

http://www.tuhs.org/Archive/PDP-11/Distributions/research/Dennis_v5/v5man.pdf

As for patch(1), Larry first wrote it in 1984, and published it in 1985; it already handled context and unified diffs at that point, as well as the traditional edit scripts:

https://groups.google.com/forum/#!topic/mod.sources/xSQM63e39YY

Now, Ken Thompson wrote the Unix ed(1) in PDP-11 assembler:

https://code.google.com/p/unix-jun72/source/browse/trunk/src/cmd/ed2.s
https://code.google.com/p/unix-jun72/source/browse/trunk/src/cmd/ed3.s

This means it can be dated to some time in 1971, according to Dennis Ritchie:

http://cm.bell-labs.com/who/dmr/hist.html

But it turns out we can rewind a little further. A team at UCB (including L. Peter Deutsch) wrote an editor called qed in 1968:

http://web.archive.org/web/20120219114658/http://www.computer-refuge.org/bitsavers/pdf/sds/ucbProjectGenie/mcjones/R-15_QED.pdf

It's still possible to see the core of the ed(1) design in that, even though the details differ quite a lot; for example, the 1968 qed doesn't have regexes at all.

Ken Thompson ported qed to CTSS circa 1970, and therefore shortly *before* he wrote ed(1); the manual for his port can be found here:

http://cm.bell-labs.com/cm/cs/who/dmr/qedman.pdf

This is much more similar to the ed(1) we know and (presumably) love, including regexes strictly more powerful than those in traditional ed(1), and slashes to delimit them (where the 1968 qed used square brackets for its search strings). And we find that the manual says "The null regular expression standing alone is equivalent to the last regular expression encountered."

So this aspect of Perl can be dated back to code written no later than 1970, for a text editor running on an operating system that I suspect no one subscribed to this list has ever used.

Enjoy!

-- Aaron Crane ** http://aaroncrane.co.uk/

p5pRT commented 11 years ago

From @khwilliamson

On 08/05/2013 12:09 PM, Aaron Crane wrote:

> So this aspect of Perl can be dated back to code written no later than 1970, for a text editor running on an operating system that I suspect no one subscribed to this list has ever used.

For the record, I come close. I used to use the qed text editor on a Bell Labs operating system called TSS. I presume this is related to the CTSS mentioned.

p5pRT commented 11 years ago

From @arc

Karl Williamson <public@khwilliamson.com> wrote:

> On 08/05/2013 12:09 PM, Aaron Crane wrote:
>
> > So this aspect of Perl can be dated back to code written no later than 1970, for a text editor running on an operating system that I suspect no one subscribed to this list has ever used.
>
> For the record, I come close. I used to use the qed text editor on a Bell Labs operating system called TSS. I presume this is related to the CTSS mentioned.

Thank you!

AFAICT from Googling, CTSS came first, and Bell Labs had both "Nike TSS" (a copy of CTSS) and the IBM TSS/360 (which apparently isn't closely related to it). This piece says that the Nike TSS ran at the Bell Labs Whippany facility, and TSS/360 at Indian Hill:

http://manpages.bsd.lv/history/canaday_24_10_2011.txt

It does seem to rely on the participants' memory, though, so perhaps it isn't entirely accurate.

-- Aaron Crane ** http://aaroncrane.co.uk/

p5pRT commented 11 years ago

From @khwilliamson

On 08/05/2013 01:10 PM, Aaron Crane wrote:

> Karl Williamson <public@khwilliamson.com> wrote:
>
> > On 08/05/2013 12:09 PM, Aaron Crane wrote:
> >
> > > So this aspect of Perl can be dated back to code written no later than 1970, for a text editor running on an operating system that I suspect no one subscribed to this list has ever used.
> >
> > For the record, I come close. I used to use the qed text editor on a Bell Labs operating system called TSS. I presume this is related to the CTSS mentioned.
>
> Thank you!
>
> AFAICT from Googling, CTSS came first, and Bell Labs had both "Nike TSS" (a copy of CTSS) and the IBM TSS/360 (which apparently isn't closely related to it). This piece says that the Nike TSS ran at the Bell Labs Whippany facility, and TSS/360 at Indian Hill:
>
> http://manpages.bsd.lv/history/canaday_24_10_2011.txt
>
> It does seem to rely on the participants' memory, though, so perhaps it isn't entirely accurate.

I remotely used the one from Indian Hill (IH for short; located in a Chicago suburb), so a different OS, but I suspect that it's the same QED that had been ported to it.

p5pRT commented 11 years ago

From @ap

* Aaron Crane <arc@cpan.org> [2013-08-05 20:10]:

> Aristotle Pagaltzis <pagaltzis@gmx.de> wrote:
>
> > And even when I say all this, I am almost certainly being ahistorical – I do not know in detail the lineage and history of ed and all its next of kin (ex/vi, grep, sed, patch etc) and would actually be surprised if the story weren’t more intertwined and complex than my portrayal, even WRT just this one aspect.
> >
> > (I expect Aaron to come up behind me and embarrass me now. :-) )
>
> Nope, your summary pretty much covers it. :-)

Wow, there’s an actual straightforward corner within Unix history. :-)

> So this aspect of Perl can be dated back to code written no later than 1970, for a text editor running on an operating system that I suspect no one subscribed to this list has ever used.

Which, to be explicit, means it predates Perl (1.0 in 1987) by nearly two decades. Not bad…

> Enjoy!

I did, thank you. :-) I hadn’t heard of em! Nor qed, of course, but I wouldn’t have expected to anyway. And d’oh, it was diff that I meant to mention, not patch (though patch too).

Regards,
-- Aristotle Pagaltzis // <http://plasmasturm.org/>

p5pRT commented 11 years ago

From @epa

Aaron Crane <arc@cpan.org> writes:

> A team at UCB (including L. Peter Deutsch) wrote an editor called qed in 1968

I see that Perl's $. variable for the current line number can also be traced back to qed, if not earlier.

-- Ed Avis <eda@waniasset.com>