Open p5pRT opened 11 years ago
perldoc split says: "when assigning to a list\, if LIMIT is omitted (or zero)\, then LIMIT is treated as though it were one larger than the number of variables in the list".
Apparently\, this is not the case within a do{} or eval{} block:
use Data::Dumper;
# implicit limit = 3 in accordance with the docs # -> empty trailing field preserved ($a\, $b) = split /\,/\, 'a\,'; print Dumper($b); # empty string
# implicit limit = 0 contrary to the docs # -> empty trailing field stripped ($a\, $b) = do { split /\,/\, 'a\,' }; print Dumper($b); # undef
($a\, $b) = eval { split /\,/\, 'a\,' }; print Dumper($b); # undef
__END__
Flags: category=core severity=low
This perlbug was built using Perl 5.14.4 in the Fedora build system. It is being executed now by Perl 5.14.4 - Thu Apr 11 12:56:04 UTC 2013.
Site configuration information for perl 5.14.4:
Configured by Red Hat\, Inc. at Thu Apr 11 12:56:04 UTC 2013.
Summary of my perl5 (revision 5 version 14 subversion 4) configuration:
Platform: osname=linux\, osvers=2.6.32-358.2.1.el6.x86_64\, archname=x86_64-linux-thread-multi uname='linux buildvm-11.phx2.fedoraproject.org 2.6.32-358.2.1.el6.x86_64 #1 smp wed feb 20 12:17:37 est 2013 x86_64 x86_64 x86_64 gnulinux ' config_args='-des -Doptimize=-O2 -g -pipe -Wall -Wp\,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -Dccdlflags=-Wl\,--enable-new-dtags -Dlddlflags=-shared -O2 -g -pipe -Wall -Wp\,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -Wl\,-z\,relro -DDEBUGGING=-g -Dversion=5.14.4 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat\, Inc. -Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl5 -Dsitearch=/usr/local/lib64/perl5 -Dprivlib=/usr/share/perl5 -Dvendorlib=/usr/share/perl5/vendor_perl -Darchlib=/usr/lib64/perl5 -Dvendorarch=/usr/lib64/perl5/vendor_perl -Darchname=x86_64-linux-thread-multi -Dlibpth=/usr/local/lib64 /lib64 /usr/lib64 -Duseshrplib -Dusethreads -Duseithreads -Dusedtrace=/usr/bin/dtrace -Duselargefiles -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl=n -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dd_gethostent_r_proto -Ud_endhostent_r_proto -Ud_sethostent_r_proto -Ud_endprotoent_r_proto -Ud_setprotoent_r_proto -Ud_endservent_r_proto -Ud_setservent_r_proto -Dscriptdir=/usr/bin' hint=recommended\, useposix=true\, d_sigaction=define useithreads=define\, usemultiplicity=define useperlio=define\, d_sfio=undef\, uselargefiles=define\, usesocks=undef use64bitint=define\, use64bitall=define\, uselongdouble=undef usemymalloc=n\, bincompat5005=undef Compiler: cc='gcc'\, ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'\, optimize='-O2 -g -pipe -Wall -Wp\,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic'\, cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include' ccversion=''\, gccversion='4.7.2 20120921 (Red Hat 4.7.2-2)'\, gccosandvers='' intsize=4\, longsize=8\, ptrsize=8\, doublesize=8\, byteorder=12345678 d_longlong=define\, longlongsize=8\, d_longdbl=define\, longdblsize=16 ivtype='long'\, ivsize=8\, nvtype='double'\, nvsize=8\, Off_t='off_t'\, lseeksize=8 alignbytes=8\, prototype=define Linker and Libraries: ld='gcc'\, ldflags =' -fstack-protector' libpth=/usr/local/lib64 /lib64 /usr/lib64 libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc -lgdbm_compat perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc libc=\, so=so\, useshrplib=true\, libperl=libperl.so gnulibc_version='2.15' Dynamic Linking: dlsrc=dl_dlopen.xs\, dlext=so\, d_dlsymun=undef\, ccdlflags='-Wl\,--enable-new-dtags -Wl\,-rpath\,/usr/lib64/perl5/CORE' cccdlflags='-fPIC'\, lddlflags='-shared -O2 -g -pipe -Wall -Wp\,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -Wl\,-z\,relro '
Locally applied patches:
@INC for perl 5.14.4: /home/xxx/src/perl /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .
Environment for perl 5.14.4: HOME=/home/xxx LANG=en_US.UTF-8 LANGUAGE (unset) LD_LIBRARY_PATH (unset) LOGDIR (unset)
PATH=/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/xxx/.local/bin:/home/xxx/bin PERL5LIB=/home/xxx/src/perl PERL_BADLANG (unset) SHELL=/bin/bash
On Thu Jul 11 11:35:33 2013\, galama wrote:
perldoc split says: "when assigning to a list\, if LIMIT is omitted (or zero)\, then LIMIT is treated as though it were one larger than the number of variables in the list".
Apparently\, this is not the case within a do{} or eval{} block:
use Data::Dumper;
# implicit limit = 3 in accordance with the docs # -> empty trailing field preserved ($a\, $b) = split /\,/\, 'a\,'; print Dumper($b); # empty string
# implicit limit = 0 contrary to the docs # -> empty trailing field stripped ($a\, $b) = do { split /\,/\, 'a\,' }; print Dumper($b); # undef
($a\, $b) = eval { split /\,/\, 'a\,' }; print Dumper($b); # undef
__END__
Confirmatory code:
$ perl -MData::Dumper -E '$Data::Dumper::Indent=0;($a\, $b) = split /\,/\, "a\,"; say Dumper ([$a\,$b]);' $VAR1 = ['a'\,''];
$ perl -MData::Dumper -E '$Data::Dumper::Indent=0;($a\, $b) = do { split /\,/\, "a\," }; say Dumper ([$a\,$b]);' $VAR1 = ['a'\,undef];
$ perl -MData::Dumper -E '$Data::Dumper::Indent=0;($a\, $b) = eval { split /\,/\, "a\," }; say Dumper ([$a\,$b]);' $VAR1 = ['a'\,undef];
The RT System itself - Status changed from 'new' to 'open'
* Anand Galama \perlbug\-followup@​perl\.org [2013-07-11T14:35:33]
perldoc split says: "when assigning to a list\, if LIMIT is omitted (or zero)\, then LIMIT is treated as though it were one larger than the number of variables in the list".
Apparently\, this is not the case within a do{} or eval{} block:
I imagine the easiest and more appropriate fix is to say "when immediately assigning" rather thant o try to address every form of indirection.
-- rjbs
On Sun Jul 14 05:44:46 2013\, perl.p5p@rjbs.manxome.org wrote:
* Anand Galama \perlbug\-followup@​perl\.org [2013-07-11T14:35:33]
perldoc split says: "when assigning to a list\, if LIMIT is omitted (or zero)\, then LIMIT is treated as though it were one larger than the number of variables in the list".
Apparently\, this is not the case within a do{} or eval{} block:
I imagine the easiest and more appropriate fix is to say "when immediately assigning" rather thant o try to address every form of indirection.
I always thought this was just an optimisation and it was only documented so that one could intentionally trigger it.
But as the OP points out\, it can change undef into an empty string.
Do we want to make it so that this optimisation is purely an optimisation?
--
Father Chrysostomos
On Sun Jul 14 05:44:46 2013\, perl.p5p@rjbs.manxome.org wrote:
I imagine the easiest and more appropriate fix is to say "when immediately assigning" rather thant o try to address every form of indirection.
Thanks\, this just lead me to another even simpler example. :-)
($n\, $a\, $b) = (1\, split /\,/\, 'a\,');
This also makes $b undef just like in the do{} and eval{} case but I would still call it an "immediate" assignment.
* Father Chrysostomos via RT \perlbug\-followup@​perl\.org [2013-07-14 16:00]:
Do we want to make it so that this optimisation is purely an optimisation?
I think it should be.
But how to go about it? Do you want to expose this capability in some way so the user can invoke it explicitly? If not\, would you remove the paragraph about optimisation from perlfunc?
This is also a very wide-spread usage\, undoubtedly relied on unwittingly by a *lot* of code. How to detect that? Regular deprecation does not seem the answer\, since this deprecation warning would pour forth from perl like floodwaters.
Off the cuff the only thing I can think of is for `split` to return a special magicalised empty string during the deprecation cycle\, which would trigger the deprecation warning when used in⊠certain contexts. (Non-boolean?)
(No doubt this would also clean up code that is inadvertently buggy though.)
* Anand Galama via RT \perlbug\-followup@​perl\.org [2013-07-15 05:55]:
($n\, $a\, $b) = (1\, split /\,/\, 'a\,');
This also makes $b undef just like in the do{} and eval{} case but I would still call it an "immediate" assignment.
Does this do it for you?
* Aristotle Pagaltzis \pagaltzis@​gmx\.de [2013-07-16 03:00]:
(No doubt this would also clean up code that is inadvertently buggy though.)
(By which I mean code thatâs currently written to expect it will get undef for missing trailing fields\, but triggers the optimisation and therefore doesnât.)
On Mon Jul 15 17:58:37 2013\, aristotle wrote:
* Father Chrysostomos via RT \perlbug\-followup@​perl\.org [2013-07-14 16:00]:
Do we want to make it so that this optimisation is purely an optimisation?
I think it should be.
But how to go about it? Do you want to expose this capability in some way so the user can invoke it explicitly?
An explicit third argument does that\, does it not?
If not\, would you remove the paragraph about optimisation from perlfunc?
Instead of explaining how the optimisation works\, just state the circumstance in which it occurs.
Then we can fix #103212 as well.
This is also a very wide-spread usage\, undoubtedly relied on unwittingly by a *lot* of code. How to detect that? Regular deprecation does not seem the answer\, since this deprecation warning would pour forth from perl like floodwaters.
Off the cuff the only thing I can think of is for `split` to return a special magicalised empty string during the deprecation cycle\, which would trigger the deprecation warning when used in⊠certain contexts. (Non-boolean?)
If the deprecation warning could be spelt âUse of uninitialized valueâ\, I think we already have such a scalar. :-)
Or do you mean one for which defined($_) returns true\, but which is treated as undef elsewhere? Yes\, that would be possible.
But how much code is really relying on that single empty string?
$ ./perl -Ilib -e '($a\,$b\,$c\,$d\,$e\,$f) = split //\, "abc"; warn "["\,$_//"undef"\,"]" for $a\,$b\,$c\,$d\,$e\,$f' [a] at -e line 1. [b] at -e line 1. [c] at -e line 1. [] at -e line 1. [undef] at -e line 1. [undef] at -e line 1.
(No doubt this would also clean up code that is inadvertently buggy though.)
* Anand Galama via RT \perlbug\-followup@​perl\.org [2013-07-15 05:55]:
($n\, $a\, $b) = (1\, split /\,/\, 'a\,');
This also makes $b undef just like in the do{} and eval{} case but I would still call it an "immediate" assignment.
Does this do it for you?
diff --git i/pod/perlfunc.pod w/pod/perlfunc.pod index 18ecd40..c108733 100644 --- i/pod/perlfunc.pod +++ w/pod/perlfunc.pod @@ -6959\,8 +6959\,9 @@ produces the output 'a:b:c'\, but the following: produces the output 'a:b:c:::'.
In time-critical applications\, it is worthwhile to avoid splitting -into more fields than necessary. Thus\, when assigning to a list\, -if LIMIT is omitted (or zero)\, then LIMIT is treated as though it +into more fields than necessary. Thus\, when the right-hand side of +a list assignment consists only of a C\
\, and LIMIT is omitted +(or zero)\, then LIMIT is treated as though it were one larger than the number of variables in the list; for the following\, LIMIT is implicitly 3:
--
Father Chrysostomos
* Father Chrysostomos via RT \perlbug\-followup@​perl\.org [2013-07-16 03:15]:
On Mon Jul 15 17:58:37 2013\, aristotle wrote:
* Father Chrysostomos via RT \perlbug\-followup@​perl\.org [2013-07-14 16:00]:
Do we want to make it so that this optimisation is purely an optimisation?
I think it should be.
But how to go about it? Do you want to expose this capability in some way so the user can invoke it explicitly?
An explicit third argument does that\, does it not?
No?
Now Iâm not sure weâre even talking about the same thing.
An explicit third argument tells `split` to stop splitting at the given number of fields\, *and* return a trailing empty string field if any.
The user can omit the third argument (and trip up the optimisation) to make `split` *not* stop splitting *and* *not* return any trailing empty string field.
With the current interface there is no way for the user to ask for one of these (stop splitting at a certain field) *but not* the other (return a trailing empty string field).
What I understood you to be asking is whether
perl -E '($n\, $a\, $b) = (1\, split /\,/\, "a\,"); say 1 if defined $b' perl -E ' ($a\, $b) = (split /\,/\, "a\,"); say 1 if defined $b'
should both be silent (where currently the latter isnât) while at the same time the `split` on the second line should know to stop splitting at the third field\, unlike the one on the first line (as is currently true).
In other words\, should `split` be able to do something (stop splitting *but not* return a trailing empty field) that the user canât currently explicitly ask it for?
Now when you deparse `($a\,$b) = split//`\, it actually says this:
($a\, $b) = split(//\, $_\, 3);
If the optimisation were made transparent\, then deparsing the `split` like this would be a lie\, because if the user wrote that\, she would get a subtly different behaviour than if the optimiser makes it like that.
Either the optimisation would have to trigger in some other way (within `pp_split`?) and become invisible at the op tree level â i.e. deparsing that statement would yield this:
($a\, $b) = split(//\, $_);
If that is not the case\, and the optimisation is reflected in the op tree\, then the interface to `split` would have to change such that when the user writes the same code as Deparse outputs\, she gets the exact same behaviour (i.e. it stops splitting *but doesnât* return a trailing empty string).
This is also a very wide-spread usage\, undoubtedly relied on unwittingly by a *lot* of code. How to detect that? Regular deprecation does not seem the answer\, since this deprecation warning would pour forth from perl like floodwaters.
Off the cuff the only thing I can think of is for `split` to return a special magicalised empty string during the deprecation cycle\, which would trigger the deprecation warning when used in⊠certain contexts. (Non-boolean?)
If the deprecation warning could be spelt âUse of uninitialized valueâ\, I think we already have such a scalar. :-)
I am *not* proposing to deprecate the absence of a warning by warning that in the future an undef warning will be emitted. :-P
Or do you mean one for which defined($_) returns true\, but which is treated as undef elsewhere? Yes\, that would be possible.
Yes\, that is what I was thinking of.
Very specifically the case of `defined $trailing_field` is what would need to deprecation-warn\, even if nothing else does\, because that is unambiguously a case that would change behaviour after patching `split`.
Currently that expression returns true\, after the change it would return false â so any conditional that tests definedness would take the other branch in the future. This is a change in the behaviour of user code. If anything requires a deprecation warning\, that does.
Other uses of $trailing_field are murkier in whether they will cause any change in behaviour. I donât know exactly how trigger-happy the warning should be. I am still trying to reason it out methodically\, which is not my strength.
But how much code is really relying on that single empty string?
$ ./perl -Ilib -e '($a\,$b\,$c\,$d\,$e\,$f) = split //\, "abc"; warn "["\,$_//"undef"\,"]" for $a\,$b\,$c\,$d\,$e\,$f' [a] at -e line 1. [b] at -e line 1. [c] at -e line 1. [] at -e line 1. [undef] at -e line 1. [undef] at -e line 1.
I donât know. Iâm not worried about the total amount in and of itself. It is more the sense that the amount of code relying on this might not be entirely trivial\, even if it is small\, along with the sense that most reliance on this behaviour is inadvertent. That is to say\, if we change this\, a small but not asymptotically zero amount of code would change behaviour in hard-to-pinpoint circumstances. That makes it important\, to my mind\, to give users the help and time theyâll need to spot it.
It can also construed a documented and advertised feature if you read the documentation hard enough. Therefore âfixingâ it could be construed as breaking a documented interface.
But I suppose running a CPAN smoke using the transparent optimisation could tell us how much immediately obvious breakage results\, before we think further on this. My worry may be unwarranted?
PS.: it is irritating how much verbiage it takes to talk clearly about these in-and-of-themselves fairly small issues. I would have written a shorter bikeshed\, but I did not have the knowledge how.
Regards\, -- Aristotle Pagaltzis // \<http://plasmasturm.org/>
On Mon Jul 15 17:58:37 2013\, aristotle wrote:
Does this do it for you?
diff --git i/pod/perlfunc.pod w/pod/perlfunc.pod index 18ecd40..c108733 100644 --- i/pod/perlfunc.pod +++ w/pod/perlfunc.pod @@ -6959\,8 +6959\,9 @@ produces the output 'a:b:c'\, but the following: produces the output 'a:b:c:::'.
In time-critical applications\, it is worthwhile to avoid splitting -into more fields than necessary. Thus\, when assigning to a list\, -if LIMIT is omitted (or zero)\, then LIMIT is treated as though it +into more fields than necessary. Thus\, when the right-hand side of +a list assignment consists only of a C\
\, and LIMIT is omitted +(or zero)\, then LIMIT is treated as though it were one larger than the number of variables in the list; for the following\, LIMIT is implicitly 3:
Hmmm:
($a\, $b) = split /\,/\, 'a\,'; # ('a'\, '') as expected ($a\, $b) = @a = split /\,/\, 'a\,'; # also ('a'\, '') (!) although: @a = split /\,/\, 'a\,'; # just ('a')\, as expected
The second one would have to be changed if you change the docs your way.
Interestingly:
($a\, $b) = @a[0\,1] = split /\,/\, 'a\,'; # ('a'\, undef)
I am sure Schrödinger would have called his cat "Split". :-)
On Mon Jul 15 19:59:07 2013\, aristotle wrote:
The user can omit the third argument (and trip up the optimisation) to make `split` *not* stop splitting *and* *not* return any trailing empty string field.
With the current interface there is no way for the user to ask for one of these (stop splitting at a certain field) *but not* the other (return a trailing empty string field).
I see what you are saying. That would require yet some other syntax. I donât see much value in it.
What I understood you to be asking is whether
perl \-E '\($n\, $a\, $b\) = \(1\, split /\,/\, "a\,"\); say 1 if defined $b' perl \-E ' \($a\, $b\) = \(split /\,/\, "a\,"\); say 1 if defined $b'
should both be silent (where currently the latter isnât) while at the same time the `split` on the second line should know to stop splitting at the third field\, unlike the one on the first line (as is currently true).
In other words\, should `split` be able to do something (stop splitting *but not* return a trailing empty field) that the user canât currently explicitly ask it for?
If it is just an optimisation\, then the calling code canât know whether perl finished splitting to the end of the string. (âWrite your code like this\, and you will make it easier for the compiler to optimise things and make them faster.â) So does it matter?
Now when you deparse `($a\,$b) = split//`\, it actually says this:
\($a\, $b\) = split\(//\, $\_\, 3\);
If the optimisation were made transparent\, then deparsing the `split` like this would be a lie\, because if the user wrote that\, she would get a subtly different behaviour than if the optimiser makes it like that.
If we change the behaviour\, then we will obviously have to change the deparsed output. It is as simple as setting a flag on the split op and have B::Deparse check that.
--
Father Chrysostomos
Migrated from rt.perl.org#118879 (status was 'open')
Searchable as RT118879$