Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

Lexical scoping of "my" quite odd in postfix loops #7140

Open p5pRT opened 20 years ago

p5pRT commented 20 years ago

Migrated from rt.perl.org#27517 (status was 'open')

Searchable as RT27517$

p5pRT commented 20 years ago

From @jlokier

Created by @jlokier

Guess what this program prints?

  use strict;
  my $x = 0;
  my $a = "outer";
  my $a = "inner" while ($a++ < 3);
  print "\$a = :$a:\n";
  my $b = "outer";
  my $b = "inner" for (1, 2, 3);
  print "\$b = :$b:\n";
  my $c = "outer";
  my $c = "inner" if 1;
  print "\$c = :$c:\n";

Good guess!

  $a = ::
  $b = ::
  $c = :inner:

The values for $a and $b are very surprising. I'd expect them to _either_ inherit the value "inner" from the inner lexical binding, or to be outside the scope of that binding and inherit the value "outer" from the outer binding. But no, the variables end up with values which they were never assigned.

It's not that surprising to get undef in general. This must necessarily leave $n as undef when the condition is false, even though that value doesn't appear anywhere:

  my $n = 1 if some_random_condition();

What's surprising is to get undef when the loops do execute at least once.

This code maybe illustrates what is so unexpected​:

  use strict;
  my $x = $_**2 for (1..3);
  print "$x\n";

The sensible (and DWIM) behaviours I *strongly* expect are $x == 9 _or_ a compile time error. But no, the code compiles fine, and prints "\n".

That's too surprising. I suggest changing the semantic to one of the sensible ones, but if this Perl semantic cannot be changed now, I suggest a compile-time warning for when variables are bound using "my" inside a loop construct and their scope is visible outside the scope.
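A minimal sketch of a well-defined alternative, assuming the intent is for $x to keep the value from the last iteration: declare the lexical on its own line, so the C<for> modifier governs only the assignment. The sub name C<last_square> is hypothetical and used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the square computed on the final iteration.
sub last_square {
    my $x;                  # declared unconditionally in the enclosing scope
    $x = $_**2 for (1..3);  # the modifier applies to the assignment only
    return $x;
}

print last_square(), "\n";  # prints 9
```

Because the declaration is no longer modified by the loop, this form does not depend on the undefined behaviour discussed in this ticket.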

Thanks,
-- Jamie

Perl Info

```
Flags:
    category=core
    severity=medium

Site configuration information for perl v5.8.0:

Configured by bhcompile at Wed Aug 13 11:45:59 EDT 2003.

Summary of my perl5 (revision 5.0 version 8 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.4.21-1.1931.2.382.entsmp, archname=i386-linux-thread-multi
    config_args='-des -Doptimize=-O2 -g -pipe -march=i386 -mcpu=i686 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux -Dvendorprefix=/usr -Dsiteprefix=/usr -Dotherlibdirs=/usr/lib/perl5/5.8.0 -Duseshrplib -Dusethreads -Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define, use5005threads=undef, useithreads=define
    d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm'
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -I/usr/local/include -I/usr/include/gdbm'
    gccversion='3.2.2 20030222 (Red Hat Linux 3.2.2-5)'
    doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    lseeksize=8, alignbytes=4, prototype=define
  Linker and Libraries:
    ld='gcc'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lpthread -lc -lcrypt -lutil
    libc=/lib/libc-2.3.2.so, so=so, useshrplib=true
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef
    ccdlflags='-rdynamic -Wl,-rpath,/usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE'
    cccdlflags='-fPIC'

Locally applied patches:
    MAINT18379

@INC for perl v5.8.0:
    /usr/lib/perl5/5.8.0/i386-linux-thread-multi
    /usr/lib/perl5/5.8.0
    /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.0
    /usr/lib/perl5/site_perl
    /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.0
    /usr/lib/perl5/vendor_perl
    /usr/lib/perl5/5.8.0/i386-linux-thread-multi
    /usr/lib/perl5/5.8.0
    .

Environment for perl v5.8.0:
    HOME=/home/jamie
    LANG=en_GB.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/home/jamie/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash
```

p5pRT commented 20 years ago

From @rgs

Jamie Lokier wrote in perl.perl5.porters :

Guess what this program prints?

use strict;
my $x = 0;
my $a = "outer";
my $a = "inner" while ($a++ < 3);

Using the terms "outer" and "inner" is misleading. There is only one scope here; the statement modifiers don't create any.

The perlsyn manpage documents this bug, saying that "the behaviour of a C<my> statement modified with a statement modifier conditional or loop construct (e.g. C<my $x if ...>) is B<undefined>."

That's too surprising. I suggest changing the semantic to one of the sensible ones, but if this Perl semantic cannot be changed now, I suggest a compile-time warning for when variables are bound using "my" inside a loop construct and their scope is visible outside the scope.

The plan is to add a deprecation warning for the cases where this "feature" is abused, notably C<my $x if 0>, and to change the semantics of this construct in perl 5.12.

p5pRT commented 20 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 20 years ago

From @iabyn

On Tue, Mar 09, 2004 at 08:50:20AM -0000, Rafael Garcia-Suarez wrote:

Jamie Lokier wrote in perl.perl5.porters :

Guess what this program prints?

use strict;
my $x = 0;
my $a = "outer";
my $a = "inner" while ($a++ < 3);

Using the terms "outer" and "inner" is misleading. There is only one scope here; the statement modifiers don't create any.

Yes, if run with 'use warnings', you get:

  "my" variable $a masks earlier declaration in same scope at /tmp/p line 7.
  "my" variable $b masks earlier declaration in same scope at /tmp/p line 10.
  "my" variable $c masks earlier declaration in same scope at /tmp/p line 13.

-- You live and learn (although usually you just live).

p5pRT commented 20 years ago

From @jlokier

Dave Mitchell via RT wrote​:

use strict;
my $x = 0;
my $a = "outer";
my $a = "inner" while ($a++ < 3);

Using the terms "outer" and "inner" is misleading. There is only one scope here; the statement modifiers don't create any.

Yes, if run with 'use warnings', you get:

  "my" variable $a masks earlier declaration in same scope at /tmp/p line 7.
  "my" variable $b masks earlier declaration in same scope at /tmp/p line 10.
  "my" variable $c masks earlier declaration in same scope at /tmp/p line 13.

Ok, but that's not really the point of my bug report. Sorry if that example was misleading.

Please consider this program​:

  use strict;
  my $x = 0;
  my $a = 1 while ($x++ < 3);
  print $a;

The point is that $a is still visible after the loop, and is C<undef>ined. I'd expect $a to either have the value 1, or to not be visible at all, yielding an error from C<use strict> when I try to use it.

-- Jamie

p5pRT commented 20 years ago

From @jlokier

Rafael Garcia-Suarez via RT wrote​:

use strict;
my $x = 0;
my $a = "outer";
my $a = "inner" while ($a++ < 3);

Using the terms "outer" and "inner" is misleading. There is only one scope here; the statement modifiers don't create any.

Yes, I was meaning in the sense that the second $a overrides the earlier one. In effect it creates an inner binding contour - that's just a matter of computer language terminology, and I apologise for the confusion.

Please ignore that example. The important one is​:

  use strict;
  my $x = 1 for (1, 2, 3);
  $x; # Visible after the loop, and undefined.

I found this setting $x to undef counterintuitive, having learned that this sets $x to a defined value:

  my $x = 1 if 1;
  $x; # Visible after the conditional, and set to 1.

Of course there's nothing technically wrong with the loop semantic; it's my intuition which is incorrect. The important thing is that code in a program looked quite innocent: the loop was guaranteed to execute once, and the value _assigned_ to $x could legitimately be undef, so the code appeared to be working in some tests when in fact it did not.

Hence the suggestion of a warning for this, or for a scope to be created by the loop modifiers, or (shock!) for $x to retain the value it was assigned at the last iteration. I would find the latter most intuitive, and occasionally useful.

The perlsyn manpage documents this bug, saying that "the behaviour of a C<my> statement modified with a statement modifier conditional or loop construct (e.g. C<my $x if ...>) is B<undefined>."

Sneaky. That's easy to forget because it's not something one tries often, and with "if" it works as I'd expect. I don't know whether other people's expectations are similar to mine. I have used C<my $x = calculation if ...> several times in real programs.

Unlike, say, C, in Perl if you try something and it works, and C<use strict> doesn't complain, then it's usually fine, and any rules you may get a feel for about evaluation order or "my" scoping are dependable.

In a language like C it's essential to keep in mind what is defined and undefined and implementation defined, because it's too easy to write programs with non-deterministic behavior, which happen to work in the obvious way on one machine at one phase of the moon.

For example, in C evaluation order in expressions is quite flexible, function arguments aren't evaluated in order, C<x++> should never be used in the same expression twice, etc. Contrarily, in Perl you may use C<$x++> several times in an expression and know exactly what it means.

Perl is much more deterministic, and this allows the language to be learned without memorising everything from the manuals. If an expression works in Perl and depends on specific evaluation order or specific "my" binding scope, it's highly likely that behaviour is dependable in general.

For example, C<my $timeout = 30 + (my $time = time())> is well defined in Perl and makes both variables visible to later statements; C<if (defined (my $x = calculation)) { ... } else { ... }> makes the variable visible in both arms of the conditional, but not after it.

These rules aren't obvious from the manual; they are learned by trying things to discover how the language works\, and then assuming that it's consistent about it.
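The two scoping rules just described can be sketched as follows. This is only an illustration under stated assumptions: the sub names C<deadline> and C<classify> are hypothetical, and a fixed number stands in for C<time()> so the result is reproducible.

```perl
use strict;
use warnings;

# Hypothetical helper: a "my" nested inside an expression stays visible
# to the statements that follow it.
sub deadline {
    my ($now_in) = @_;
    my $timeout = 30 + (my $now = $now_in);
    return ($now, $timeout);
}

# Hypothetical helper: a "my" in an if-condition is visible in both arms
# of the conditional, but not after it.
sub classify {
    my ($n) = @_;
    if ((my $half = $n / 2) >= 1) {
        return "big:$half";
    }
    else {
        return "small:$half";
    }
}

print join(",", deadline(100)), "\n";  # prints 100,130
print classify(4), "\n";               # prints big:2
print classify(1), "\n";               # prints small:0.5
```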

Point being that a mention in the manual isn't half as good as a decent warning, or C<use strict> complaining, or semantics which aren't surprising.

End of rant. :)

The plan is to add a deprecation warning for the cases where this "feature" is abused, notably C<my $x if 0>, and to change the semantics of this construct in perl 5.12.

What will the new semantics be?

I find the current semantic of C<my $x = calculation if maybe_false> intuitive, it does exactly what I expect, and it's useful too. My beef is with the loop ones :)

-- Jamie

p5pRT commented 20 years ago

From @iabyn

On Tue, Mar 09, 2004 at 06:51:47PM +0000, Jamie Lokier wrote:

The plan is to add a deprecation warning for the cases where this "feature" is abused, notably C<my $x if 0>, and to change the semantics of this construct in perl 5.12.

What will the new semantics be?

  my $x = foo if bar

will be equivalent to​:

  my $x;
  $x = foo if bar

except that the scope of $x won't expand, e.g. in

  if (my $x = foo) ...

$x still won't be visible outside the loop.

I find the current semantic of C<my $x = calculation if maybe_false> intuitive, it does exactly what I expect, and it's useful too. My beef is with the loop ones :)

I doubt that it currently does what you expect​:

  $ ./perl -le 'for (1..10) { my $x=10 if $_==1;;print $x; $x=20 }'
  10

  20
  20
  20
  20
  20
  20
  20
  20

Most people find the printing of the value 20 unexpected.
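For comparison, a minimal sketch of the explicit two-statement form, which is well defined today: the lexical is declared unconditionally, so every iteration starts with a fresh, undefined $x and the value 20 never carries over. The sub name C<collect> and the shorter range 1..4 are assumptions made only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: records what $x holds at the "print" point of
# each iteration when the declaration is not conditional.
sub collect {
    my @seen;
    for my $i (1..4) {
        my $x;                  # fresh lexical on every iteration
        $x = 10 if $i == 1;     # only the assignment is conditional
        push @seen, defined $x ? $x : 'undef';
        $x = 20;                # discarded when the iteration ends
    }
    return @seen;
}

print join(",", collect()), "\n";  # prints 10,undef,undef,undef
```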

--
Lady Nancy Astor: If you were my husband, I would flavour your coffee with poison.
Churchill: Madam - if I were your husband, I would drink it.

p5pRT commented 20 years ago

From @jlokier

Dave Mitchell wrote​:

my $x = foo if bar

will be equivalent to​:

my $x;
$x = foo if bar

Oh. That's what I thought it was already!

except that the scope of $x won't expand, e.g. in

if (my $x = foo) ...

$x still won't be visible outside the loop.

You mean outside the conditional, I presume. That's nice, I like it as it is.

By the way, this is inconsistent with the binding of regex results in $1, $2, ...

Regexes matched in a loop have their result bound inside the loop\, but not outside. For example​:

  while (/(.)/) {
      # $1 contains first non-newline character of $_, if any.
  }
  # $1 contains the value it had before the while.

Yet conditionals bind regex results differently:

  if (/(.)/) {
      # $1 contains first non-newline character of $_, if any.
  }
  # $1 contains first non-newline character of $_, if any.

This binding for conditionals is very useful in practice, although it's a little counterintuitive that regex result scope is different from "my" scope.

I have occasionally changed an "if (/.../) {...}" to a "while (/.../) {...}", and thus introduced a not-so-obvious bug into some code that used the regex result just after the "if".
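One way to sidestep the difference, sketched here as an assumption rather than anything proposed in this thread, is to copy the capture into a lexical at the point of the match, so later code never reads $1 after the construct. The sub name C<first_char> is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: a match in list context returns its captures, so
# the result lands in an ordinary lexical with ordinary "my" scoping.
sub first_char {
    my ($text) = @_;
    my ($first) = $text =~ /(.)/;   # undef if nothing matched
    return $first;
}

print first_char("abc"), "\n";                            # prints a
print defined first_char("") ? "defined" : "undef", "\n"; # prints undef
```

Code written this way behaves the same whether the surrounding construct is an "if" or a "while", because it no longer depends on where the capture variables are restored.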

I find the current semantic of C<my $x = calculation if maybe_false> intuitive, it does exactly what I expect, and it's useful too. My beef is with the loop ones :)

I doubt that it currently does what you expect​:

$ ./perl -le 'for (1..10) { my $x=10 if $_==1;;print $x; $x=20 }'
10

20
20
20
[etc...]

Most people find the printing of the value 20 unexpected.

Oh. I'm surprised too, and I actually use that construct in a few programs. I didn't realise the programs were buggy. That is _really_ broken behaviour as I'm sure you agree.

It looks quite similar to the regex bug with lexicals, #26909, which by the way nobody has replied to (perhaps it's too hard :) The similarity is that a reference to a lexical variable is reading a previous value, which should be impossible as the dynamic scope has been left and reentered.

So perhaps the fix to this semantic will help with #26909?

By the way, what happens if $x is assigned an object instead of 20? Does this mean the object isn't destroyed at the end of the scope of the loop?

(Does a quick test).

Yikes, the object isn't destroyed when its lexical scope is left completely. This is more serious than I thought, captain.

  $ perl -le 'package X; sub DESTROY { print "destroy" } { my $x if 0; $x = bless [],"X"; } print "finished";'

  finished
  destroy

Get rid of the conditional and it destroys the object when expected:

  $ perl -le 'package X; sub DESTROY { print "destroy" } { my $x; $x = bless [],"X"; } print "finished";'

  destroy
  finished
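The well-behaved case can also be written as a small script that records the order of events instead of printing them. This is a sketch only: the class name C<Tracker> and its C<@log> array are hypothetical, introduced purely for illustration.

```perl
use strict;
use warnings;

package Tracker;                  # hypothetical class, for illustration
our @log;
sub new     { return bless [], shift }
sub DESTROY { push @log, "destroy" }

package main;

{
    my $obj;                      # unconditional declaration
    $obj = Tracker->new;
}                                 # $obj is released here, so DESTROY runs
push @Tracker::log, "finished";

print "@Tracker::log\n";          # prints: destroy finished
```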

It's good news that it's to be fixed! Will that change fix the regex lexical bug #26909?

-- Jamie

p5pRT commented 20 years ago

From @rgs

Jamie Lokier wrote in perl.perl5.porters :

Oh. I'm surprised too, and I actually use that construct in a few programs. I didn't realise the programs were buggy. That is _really_ broken behaviour as I'm sure you agree.

The current plan is to fix it in 5.12 and deprecate it in 5.10.

It looks quite similar to the regex bug with lexicals, #26909, which by the way nobody has replied to (perhaps it's too hard :)

Most strange; I did reply to this bug (mostly to say, "it's not a bug, and this emits a warning now"), but for some reason RT doesn't have it.

http://groups.google.com/groups?threadm=20040221225833.6d1dc31c.rgarciasuarez%40free.fr

p5pRT commented 20 years ago

From @iabyn

On Wed, Mar 10, 2004 at 04:41:39AM +0000, Jamie Lokier wrote:

By the way, this is inconsistent with the binding of regex results in $1, $2, ...

Regexes matched in a loop have their result bound inside the loop\, but not outside. For example​:

while (/(.)/) {
    # $1 contains first non-newline character of $_, if any.
}
# $1 contains the value it had before the while.

Yet conditionals bind regex results differently:

if (/(.)/) {
    # $1 contains first non-newline character of $_, if any.
}
# $1 contains first non-newline character of $_, if any.

This binding for conditionals is very useful in practice, although it's a little counterintuitive that regex result scope is different from "my" scope.

I have occasionally changed an "if (/.../) {...}" to a "while (/.../) {...}", and thus introduced a not-so-obvious bug into some code that used the regex result just after the "if".

Well, if (X){Y} deliberately has different scope semantics to while(X){Y}; the loop operators introduce a new scope, but the if() is treated the same as X && do {Y}, i.e. the X isn't in a new scope.

Except...

It's broken as regards my declarations within the conditional:

  sub X::DESTROY { print "DESTROY\n" }
  $x = "global";
  {
      if (my $x = bless [], 'X') {
          print "inner: $x\n";
      }
      print "middle: $x\n";
  }
  print "outer: $x\n";

which outputs​:

  inner: X=ARRAY(0x817e000)
  middle: global
  DESTROY
  outer: global

There is a difference between compile-time and run-time scope here. At runtime, the if conditional does not introduce a new scope, so the lexical isn't freed until the middle section is exited; but at compile-time, a new scope *is* introduced, so the middle print sees the global $x rather than the lexical one.

This inconsistency smells like a bug to me. Which behaviour is wrong, runtime or compile-time, is open to debate, but personally I think it's the compile-time that's wrong. I don't think it's worth fixing though, because the cost to backwards compatibility outweighs the benefits (I think).
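If the timing of the destructor matters, one possible workaround (an assumption on my part, not something proposed in this thread) is to wrap the conditional in an extra bare block, so that the run-time lifetime of the lexical ends where its compile-time visibility does. The class name C<Guard> and the C<@events> array are hypothetical.

```perl
use strict;
use warnings;

our @events;
sub Guard::DESTROY { push @events, "DESTROY" }   # hypothetical class name

{
    {   # extra bare block: bounds the lexical at run time as well
        if (my $g = bless [], 'Guard') {
            push @events, "inner";
        }
    }
    push @events, "middle";
}
push @events, "outer";

print "@events\n";   # prints: inner DESTROY middle outer
```

Here the object is released when the inner bare block exits, before the "middle" step runs, regardless of whether the if itself introduces a run-time scope.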

Dave.

--
A power surge on the Bridge is rapidly and correctly diagnosed as a faulty capacitor by the highly-trained and competent engineering staff.
    -- Things That Never Happen in "Star Trek" #9

swade1987 commented 4 years ago

Issues go stale after 60d of inactivity.