Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

re modifier "h" - return named captures as hash expression #12893

Open p5pRT opened 11 years ago

p5pRT commented 11 years ago

Migrated from rt.perl.org#117447 (status was 'open')

Searchable as RT117447$

p5pRT commented 11 years ago

From @daxim

Created by @daxim

Example code adapted from perlretut.

Make the C<h> (mnemonic I<hash>) flag work (that line with the C<=~> operator):

    my $fmt1 = '(?<y>\d\d\d\d)-(?<m>\d\d)-(?<d>\d\d)';
    my $fmt2 = '(?<m>\d\d)/(?<d>\d\d)/(?<y>\d\d\d\d)';
    my $fmt3 = '(?<d>\d\d)\.(?<m>\d\d)\.(?<y>\d\d\d\d)';

    for my $d (qw(2006-10-21 15.01.2007 10/31/2005)) {
        if (my (%date) = $d =~ m{$fmt1|$fmt2|$fmt3}h) {
            while (my ($k,$v) = each %date) {
                print "$k = $v\n";
            }
        }
    }

Works the same as:

    if ($d =~ m{$fmt1|$fmt2|$fmt3}) {
        my %date = %+;

Rationale: side effects are a weird-ass way to program in a language that actually has operators/expressions/functions which are able to return values. I'd like eventually to get rid of side effects, but first there actually must be a way to do something without involving action at a distance. If you can't see what's wrong with the code just above, imagine you had to do this to get the length of something:

    length($something);
    # according to perlvar, $ˑ is set to the last successful length
    # measuring
    print "something is $ˑ long";
    # take care not to use an outdated $ˑ or accidentally overwrite
    # it! :-o
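For comparison, the requested behaviour can be approximated in current Perl without a new modifier. The sketch below is only an illustration: `named_captures` is a hypothetical helper (not a core function), and it copies `%+` into a lexical hash before returning so that the caller never reads the global.

```perl
use strict;
use warnings;

my $fmt1 = '(?<y>\d\d\d\d)-(?<m>\d\d)-(?<d>\d\d)';
my $fmt2 = '(?<m>\d\d)/(?<d>\d\d)/(?<y>\d\d\d\d)';
my $fmt3 = '(?<d>\d\d)\.(?<m>\d\d)\.(?<y>\d\d\d\d)';

# Hypothetical helper: returns the named captures as a key/value list,
# or an empty list when the match fails.
sub named_captures {
    my ($string, $regex) = @_;
    return unless $string =~ $regex;
    my %captures = %+;    # copy while the match is still in scope
    return %captures;
}

my $date_re = qr{$fmt1|$fmt2|$fmt3};

for my $d (qw(2006-10-21 15.01.2007 10/31/2005)) {
    if (my %date = named_captures($d, $date_re)) {
        print join(', ', map { "$_ = $date{$_}" } sort keys %date), "\n";
    }
}
```

The helper still relies on `%+` internally; it only confines the side effect to one place, which is the property the proposed flag would provide at the operator level.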

Perl Info ``` Flags: category=core severity=low Site configuration information for perl 5.16.3: Configured by daxim at Fri Mar 29 15:53:58 CET 2013. Summary of my perl5 (revision 5 version 16 subversion 3) configuration: Platform: osname=linux, osvers=3.4.28-2.20-desktop, archname=x86_64-linux-thread-multi-ld uname='linux champion 3.4.28-2.20-desktop #1 smp preempt tue jan 29 16:51:37 utc 2013 (143156b) x86_64 x86_64 x86_64 gnulinux ' config_args='-de -Dprefix=/home/daxim/local/share/perlbrew/perls/perl-5.16.3 -DDEBUGGING -Dusemorebits -Dusethreads -Dcf_email=daxim@cpan.org -Dperladmin=daxim@cpan.org -Accflags=-fPIC -Aeval:scriptdir=/home/daxim/local/share/perlbrew/perls/perl-5.16.3/bin' hint=recommended, useposix=true, d_sigaction=define useithreads=define, usemultiplicity=define useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef use64bitint=define, use64bitall=define, uselongdouble=define usemymalloc=n, bincompat5005=undef Compiler: cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fPIC -DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64', optimize='-O2 -g', cppflags='-D_REENTRANT -D_GNU_SOURCE -fPIC -DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include' ccversion='', gccversion='4.7.2 20130108 [gcc-4_7-branch revision 195012]', gccosandvers='' intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16 ivtype='long', ivsize=8, nvtype='long double', nvsize=16, Off_t='off_t', lseeksize=8 alignbytes=16, prototype=define Linker and Libraries: ld='cc', ldflags =' -fstack-protector -L/usr/local/lib' libpth=/usr/local/lib /lib/../lib64 /usr/lib/../lib64 /lib /usr/lib /lib64 /usr/lib64 /usr/local/lib64 libs=-lnsl -lndbm -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc -lgdbm_compat perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc libc=/lib/libc-2.17.so, so=so, useshrplib=false, 
libperl=libperl.a gnulibc_version='2.17' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E' cccdlflags='-fPIC', lddlflags='-shared -O2 -g -L/usr/local/lib -fstack-protector' Locally applied patches: @INC for perl 5.16.3: /home/daxim/local/share/perlbrew/perls/perl-5.16.3/lib/site_perl/5.16.3/x86_64-linux-thread-multi-ld /home/daxim/local/share/perlbrew/perls/perl-5.16.3/lib/site_perl/5.16.3 /home/daxim/local/share/perlbrew/perls/perl-5.16.3/lib/5.16.3/x86_64-linux-thread-multi-ld /home/daxim/local/share/perlbrew/perls/perl-5.16.3/lib/5.16.3 . Environment for perl 5.16.3: HOME=/home/daxim LANG=de_DE.UTF-8 LANGUAGE (unset) LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/lib:/usr/lib64/mpi/gcc/openmpi/lib64 LOGDIR (unset) PATH=/home/daxim/local/share/perlbrew/bin:/home/daxim/local/share/perlbrew/perls/perl-5.16.3/bin:/home/daxim/local/bin:/usr/local/cuda/bin:/opt/kde3/sbin:/sbin:/usr/sbin:/usr/lib64/mpi/gcc/openmpi/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/usr/lib/mit/bin:/usr/lib/mit/sbin PERLBREW_BASHRC_VERSION=0.61 PERLBREW_HOME=/home/daxim/.perlbrew PERLBREW_MANPATH=/home/daxim/local/share/perlbrew/perls/perl-5.16.3/man PERLBREW_PATH=/home/daxim/local/share/perlbrew/bin:/home/daxim/local/share/perlbrew/perls/perl-5.16.3/bin PERLBREW_PERL=perl-5.16.3 PERLBREW_ROOT=/home/daxim/local/share/perlbrew PERLBREW_VERSION=0.61 PERL_BADLANG (unset) SHELL=/bin/bash ```
p5pRT commented 11 years ago

From @jkeenan

This RT is a request for a new feature: a new regex modifier '/h'.

Is there any support for development of this new feature? (I ask, in part, because it hasn't received a "second the motion" in the three months since the request was originally filed.)

Is there anyone who wants to try to write an implementation for this new feature?

Thank you very much. Jim Keenan

p5pRT commented 11 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 11 years ago

From tchrist@perl.com

"James E Keenan via RT" <perlbug-followup@perl.org> wrote on Sat, 29 Jun 2013 07:12:06 PDT:

> Make the C<h> (mnemonic I<hash>) flag work (that line with the C<=~> operator):
>
>     my $fmt1 = '(?<y>\d\d\d\d)-(?<m>\d\d)-(?<d>\d\d)';
>     my $fmt2 = '(?<m>\d\d)/(?<d>\d\d)/(?<y>\d\d\d\d)';
>     my $fmt3 = '(?<d>\d\d)\.(?<m>\d\d)\.(?<y>\d\d\d\d)';
>
>     for my $d (qw(2006-10-21 15.01.2007 10/31/2005)) {
>         if (my (%date) = $d =~ m{$fmt1|$fmt2|$fmt3}h) {
>             while (my ($k,$v) = each %date) {
>                 print "$k = $v\n";
>             }
>         }
>     }
>
> Works the same as:
>
>     if ($d =~ m{$fmt1|$fmt2|$fmt3}) {
>         my %date = %+;

I am opposed. If it "works the same as"\, we don't need another way.

It increases the cognitive load unnecessarily for no real gain.

And I don’t want us to keep adding /mods. We have to think of another way, something that embeds them and isn't stuck at mysterious one-letter identifiers.

--tom

p5pRT commented 11 years ago

From gottreu@gmail.com

>     my $fmt1 = '(?<y>\d\d\d\d)-(?<m>\d\d)-(?<d>\d\d)';
>     my $fmt2 = '(?<m>\d\d)/(?<d>\d\d)/(?<y>\d\d\d\d)';
>     my $fmt3 = '(?<d>\d\d)\.(?<m>\d\d)\.(?<y>\d\d\d\d)';
>
>     for my $d (qw(2006-10-21 15.01.2007 10/31/2005)) {
>         if (my (%date) = $d =~ m{$fmt1|$fmt2|$fmt3}h) {
>             while (my ($k,$v) = each %date) {
>                 print "$k = $v\n";
>             }
>         }
>     }

I kinda like the idea.

On 06/29/2013 02:36 PM, Tom Christiansen wrote:

> And I don’t want us to keep adding /mods. We have to think of another way, something that embeds them and isn't stuck at mysterious one-letter identifiers.

Could you clarify what you mean by embed and what the antecedent of 'them' is?

Since =~ binds a scalar expression to a pattern match, essentially replacing $_ with the expression, we could extend what =~ accepts on the left side.

    ($d, %date) =~ m{...} # %date = %+

and it would still return the same value (depending on context of course).

Following down that path, one could imagine

    ($d, $lvalue) =~ s{...}{...} # set /r implicitly

    ($d, @matches) =~ m{...} # @matches = ($1,$2,$3,...)

Or a functional interface could be used:

    match(qr/.../, $d, named_captures => \%date)

I'm not necessarily advocating for any of these, it's just what I thought of.
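The functional interface mentioned above could be sketched in plain Perl roughly as follows. The name `match` and the `named_captures` option are taken from the suggestion; everything else (argument order, return value on failure) is an assumption made for illustration.

```perl
use strict;
use warnings;

# Hypothetical match(): returns the positional captures and, when asked,
# copies the named captures into the caller's hash.
sub match {
    my ($regex, $string, %opt) = @_;
    my @captures = $string =~ $regex
        or return;                          # empty list on failure
    %{ $opt{named_captures} } = %+ if $opt{named_captures};
    return @captures;
}

my %date;
my @parts = match(
    qr{(?<m>\d\d)/(?<d>\d\d)/(?<y>\d\d\d\d)},
    '10/31/2005',
    named_captures => \%date,
);
print "$date{y}-$date{m}-$date{d}\n" if @parts;
```

This still needs a variable to receive the named captures, which is the objection raised later in the thread against the `($d, %date) =~ ...` form.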

Brian Gottreu

p5pRT commented 11 years ago

From tchrist@perl.com

Brian Gottreu <gottreu@gmail.com> wrote on Sat, 29 Jun 2013 18:03:43 CDT:

> On 06/29/2013 02:36 PM, Tom Christiansen wrote:
>
> > And I don’t want us to keep adding /mods. We have to think of another way, something that embeds them and isn't stuck at mysterious one-letter identifiers.
>
> Could you clarify what you mean by embed and what the antecedent of 'them' is?

The antecedent of “them” is /mods, like /acdgilmopsux, stressed on the last syllable. Embedding them is necessary for pattern flags albeit not for match flags. You know, the (?six-m:...) thing.

I don’t like the idea of single-character signifiers carrying so much meaning with no more readable way of expressing them. And I certainly don’t think we should go adding more of those without coming up with a way to somehow write something more meaningful, like a real word for each of them.
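For readers unfamiliar with the embedded form referred to above, here is a small sketch using only existing syntax. The sample strings and variable names are made up for the example.

```perl
use strict;
use warnings;

# (?i:...) applies /i to one group only.
my $scoped = qr/(?i:perl) rocks/;
my $hit  = 'PERL rocks' =~ $scoped ? 1 : 0;    # the group ignores case
my $miss = 'perl ROCKS' =~ $scoped ? 1 : 0;    # the rest of the pattern does not

# (?six-m:...) turns on /s, /i and /x and turns off /m inside the group.
my $flags_on  = "FOO\nBAR"    =~ /(?six-m: ^ foo . bar )/  ? 1 : 0;
# The outer /m is cancelled inside the group, so ^ only matches at the
# very start of the string and this match fails.
my $m_removed = "x\nFOO\nBAR" =~ /(?six-m: ^ foo . bar )/m ? 1 : 0;

print "scoped: $hit $miss, group flags: $flags_on $m_removed\n";
```

Only pattern flags can be embedded this way; match flags such as /g have no embedded equivalent, which is the distinction drawn above.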

--tom

p5pRT commented 11 years ago

From @iabyn

On Sat, Jun 29, 2013 at 01:36:52PM -0600, Tom Christiansen wrote:

> > Works the same as:
> >
> >     if ($d =~ m{$fmt1|$fmt2|$fmt3}) {
> >         my %date = %+;
>
> I am opposed. If it "works the same as", we don't need another way.

If there were such a modifier (and I agree that it should be "better" than another 1-letter flag), I would prefer to see some new semantics, such as returning a structured (i.e. nested) match object from a nested search pattern (although I have no idea how the details would pan out).

-- Spock (or Data) is fired from his high-ranking position for not being able to understand the most basic nuances of about one in three sentences that anyone says to him.   -- Things That Never Happen in "Star Trek" #19

p5pRT commented 11 years ago

From @daxim

tchrist:

> If it "works the same as", we don't need another way.

Then please go ahead and remove the code responsible for the return value of the match operator, as in C<@captures = $val =~ /…(…)…(…)…(…)…/>.

L<Perl 1 capture variables|http://perldoc.perl.org/perlvar.html#Variables-related-to-regular-expressions> are good enough for everyone! After all, Perl's motto is "there must be only one way to do it".

> It increases the cognitive load unnecessarily for no real gain.

The real gains are:

* The match operator restores feature parity. When named captures were added, proper return values for them were left out inadvertently, I believe.

      match op/capturing | unnamed | named
      -------------------+---------+--------------
      with side effects  | $1 etc. | %+
      return values      | yes     | unnamed only!

* In the future, C<no re 'side-effects'> becomes possible, which eliminates a source of bugs. (It used to be mentioned in perltrap that $1 etc. are not reset after a match fails, but in 5.18 it's gone for some weird reason.) Implementing that pragma unimport blocks on having a side-effect free way to return named captured values in the first place.
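The source of bugs mentioned in the second point can be shown in a few lines. This is a minimal sketch with made-up input strings; it contrasts reading `$1` after a failed match with taking the captures from the return value.

```perl
use strict;
use warnings;

'42' =~ /^(\d+)$/;          # succeeds and sets $1
'banana' =~ /^(\d+)$/;      # fails, so $1 is left untouched
my $stale = $1;             # still holds the capture from the first match

# Taking the captures from the return value avoids the stale read:
my ($fresh) = 'banana' =~ /^(\d+)$/;

printf "side effect: %s, return value: %s\n",
    $stale, defined $fresh ? $fresh : 'undef';
```

For unnamed captures the return-value form already exists; the request in this ticket is for an equivalent for named captures.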

On the other side of the scale is​:

* One more flag where there are already twelve.

But this is hardly a straw that will break the camel's back. It's up to you to back up your claim that there is a downside with concrete evidence/examples, not just vague allusions.

Brian:

>     ($d, %date) =~ m{...} # %date = %+

I don't like that because it requires a variable. This is hardly an advantage over %+.

A simple list of return values can flow freely between chained/nested functions, which is very perlish.
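As an illustration of that point with today's unnamed captures, the list returned by a match can be passed straight into other list operators without an intermediate variable. The input strings here are arbitrary examples.

```perl
use strict;
use warnings;

# The captures come back as a plain list, so they can feed join, map, etc.
my $compact = join '', '2006-10-21' =~ /^(\d{4})-(\d\d)-(\d\d)$/;

my @fields = map { $_ + 0 } '10/31/2005' =~ m{^(\d\d)/(\d\d)/(\d{4})$};

print "$compact @fields\n";
```

A hash-returning variant would allow the same style of chaining with named captures.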

davem:

> I would prefer to see some new semantics … no idea how the details would pan out

Don't let that get in the way of the feature request under discussion. Worse is better, and the like. I imagine this topic's feature request is very easy to implement because the pieces are already there and it needs no further specification, whereas a different return value, with nested structures as you said, or perhaps an L<object a la Perl6|http://doc.perl6.org/type/Match>, would be the topic of another bug.

PS: I'm answering via RT web interface, no idea where in the p5p thread this message ends up.

p5pRT commented 11 years ago

From @Hugmeir

How would it work with code that mixes traditional & named captures? For example, what would this do?

    my %matches = "ab" =~ /(.)(?<foo>.)/h;

p5pRT commented 11 years ago

From @daxim

> How would it work with code that mixes traditional & named captures?

Same as %+.

> For example, what would this do?
>
>     my %matches = "ab" =~ /(.)(?<foo>.)/h;

Expression returns (foo => 'b')
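The existing behaviour that this answer refers to can be checked with current Perl. The sketch below uses the same pattern without the proposed flag and compares the list-context return value with the contents of `%+`.

```perl
use strict;
use warnings;

# List context returns every capture group, named or not, in order.
my @positional = 'ab' =~ /(.)(?<foo>.)/;

# %+ only holds the named groups.
my %named = 'ab' =~ /(.)(?<foo>.)/ ? %+ : ();

print "positional: @positional\n";
print "named: ", join(', ', map { "$_ => $named{$_}" } sort keys %named), "\n";
```

Under the proposal, `/h` would make the match expression itself return what `%named` receives here.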

p5pRT commented 11 years ago

From @cpansprout

WARNING: Bikes shedding their paint ahead.

On Mon Jul 01 03:38:27 2013, davem wrote:

> On Sat, Jun 29, 2013 at 01:36:52PM -0600, Tom Christiansen wrote:
>
> > > Works the same as:
> > >
> > >     if ($d =~ m{$fmt1|$fmt2|$fmt3}) {
> > >         my %date = %+;
> >
> > I am opposed. If it "works the same as", we don't need another way.
>
> If there were such a modifier (and I agree that it should be "better" than another 1-letter flag)

    use v5.20;
    qr :ignorecase :multiline :hash /...../; # same as /imh

Or:

    /(?+ignorecase multiline hash)...../

with these variations as well:

    (?+named flags:pattern)
    (?+turn these on - turn these off)
    (?+turn on-turn off:pattern)
    (?+-turn off)
    (?+-turn off:pat)

--

Father Chrysostomos

p5pRT commented 11 years ago

From @khwilliamson

On 07/01/2013 06:51 PM, Father Chrysostomos via RT wrote:

> WARNING: Bikes shedding their paint ahead.
>
>     use v5.20;
>     qr :ignorecase :multiline :hash /...../; # same as /imh
>
> Or:
>
>     /(?+ignorecase multiline hash)...../
>
> with these variations as well:
>
>     (?+named flags:pattern)
>     (?+turn these on - turn these off)
>     (?+turn on-turn off:pattern)
>     (?+-turn off)
>     (?+-turn off:pat)

An idea I had quite some time ago was like this:

    m/(?^u{multiline, -ignorecase, ...}:foo)/

in which long modifier names would come enclosed in braces anywhere between the (? and the colon. Modifiers outside the braces would be single character ones. Pluses and minuses could be used. Any number of braced sets would be acceptable.

p5pRT commented 11 years ago

From @demerphq

On 2 July 2013 02:51, Father Chrysostomos via RT <perlbug-followup@perl.org> wrote:

> WARNING: Bikes shedding their paint ahead.
>
>     use v5.20;
>     qr :ignorecase :multiline :hash /...../; # same as /imh
>
> Or:
>
>     /(?+ignorecase multiline hash)...../
>
> with these variations as well:
>
>     (?+named flags:pattern)
>     (?+turn these on - turn these off)
>     (?+turn on-turn off:pattern)
>     (?+-turn off)
>     (?+-turn off:pat)

FWIW, I hate it, and I would be against new modifiers you can only put inside of a (?+ ... ).

My view is Perl has regex modifiers and it's too late to argue about it anymore.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 11 years ago

From @davidnicol

>     if ($d =~ m{$fmt1|$fmt2|$fmt3}) {
>         my %date = %+;

the AAAD can be minimized with

    my %date = do { $d =~ m{$fmt1|$fmt2|$fmt3} ? %+ : () };

can it not?

p5pRT commented 11 years ago

From gottreu@gmail.com

On 07/02/2013 01:00 AM, David Nicol wrote:

> >     if ($d =~ m{$fmt1|$fmt2|$fmt3}) {
> >         my %date = %+;
>
> the AAAD can be minimized with
>
>     my %date = do { $d =~ m{$fmt1|$fmt2|$fmt3} ? %+ : () };
>
> can it not?

Actually no it seems.

    $ perl -MData::Dumper -we \
    '$_="a";print Dumper do{/(?<x>.)/;%+},do{/(?<x>.)/;my%h=%+};'
    $VAR1 = 'x';
    $VAR2 = undef;
    $VAR3 = 'x';
    $VAR4 = 'a';

It looks like it's been that way since at least 5.10.1.

Is this a bug?

Brian Gottreu

p5pRT commented 11 years ago

From @ikegami

On Tue, Jul 2, 2013 at 2:00 AM, David Nicol <davidnicol@gmail.com> wrote:

> >     if ($d =~ m{$fmt1|$fmt2|$fmt3}) {
> >         my %date = %+;
>
> the AAAD can be minimized with
>
>     my %date = do { $d =~ m{$fmt1|$fmt2|$fmt3} ? %+ : () };
>
> can it not?

or

    my %date = ( $d =~ m{$fmt1|$fmt2|$fmt3} ? %+ : () );

or

    my %date = $d =~ m{$fmt1|$fmt2|$fmt3} ? %+ : ();

p5pRT commented 11 years ago

From @cpansprout

On Mon Jul 01 22:49:20 2013, demerphq wrote:

> My view is Perl has regex modifiers and it's too late to argue about it anymore.

I actually agree with you on that last point. I had just resigned myself to the fact that ‘everybody’ wants longer flag names.

--

Father Chrysostomos