Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

$1 not localized when calling sub #16337

Open p5pRT opened 6 years ago

p5pRT commented 6 years ago

Migrated from rt.perl.org#132647 (status was 'open')

Searchable as RT132647$

p5pRT commented 6 years ago

From @jimav

Created by @jimav

This is a bug report for perl from jim.avera@gmail.com, generated with the help of perlbug 1.40 running under perl 5.26.0.

If $1 is passed as an arg to a function, and that function internally performs a regex match, then the argument seen from inside the func is corrupted.

I'm guessing this is because the localization of $1 does not also localize aliases to $1 such as in @_. This is a nasty trap, and it would be great if perl could at least diagnose it if it happens (the passed-in $1 is, after all, nominally read-only and a direct assignment results in a fatal "Modification of a read-only value attempted"; so one could argue that any operation which similarly could modify that argument should be flagged as well).

If not fixable or catchable, then I'd like to suggest adding an explicit mention of this trap to the docs, e.g. «perlsub».

    #!/usr/bin/perl
    use strict; use warnings;

    sub func($) {
      my $saved = $_[0];
      if ($_[0] =~ /(\d+)/) { }
      warn "\$_[0] MUTATED from '$saved' to '$_[0]'\n"
        if $_[0] ne $saved;
    }

    func "a123b";
    if ("c456d" =~ /(.*)/) { func($1) }

Perl Info ``` Flags: category=core severity=low Site configuration information for perl 5.26.0: Configured by Debian Project at Fri Sep 15 16:13:42 UTC 2017. Summary of my perl5 (revision 5 version 26 subversion 0) configuration: Platform: osname=linux osvers=4.9.0 archname=x86_64-linux-gnu-thread-multi uname='linux localhost 4.9.0 #1 smp debian 4.9.0 x86_64 gnulinux ' config_args='-Dusethreads -Duselargefiles -Dcc=x86_64-linux-gnu-gcc -Dcpp=x86_64-linux-gnu-cpp -Dld=x86_64-linux-gnu-gcc -Dccflags=-DDEBIAN -Wdate-time -D_FORTIFY_SOURCE=2 -g -O2 -fdebug-prefix-map=/build/perl-4JQEGJ/perl-5.26.0=. -fstack-protector-strong -Wformat -Werror=format-security -Dldflags= -Wl,-Bsymbolic-functions -Wl,-z,relro -Dlddlflags=-shared -Wl,-Bsymbolic-functions -Wl,-z,relro -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.26 -Darchlib=/usr/lib/x86_64-linux-gnu/perl/5.26 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/x86_64-linux-gnu/perl5/5.26 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.26.0 -Dsitearch=/usr/local/lib/x86_64-linux-gnu/perl/5.26.0 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Duse64bitint -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -Ui_libutil -Uversiononly -DDEBUGGING=-g -Doptimize=-O2 -dEs -Duseshrplib -Dlibperl=libperl.so.5.26.0' hint=recommended useposix=true d_sigaction=define useithreads=define usemultiplicity=define use64bitint=define use64bitall=define uselongdouble=undef usemymalloc=n default_inc_excludes_dot=define bincompat5005=undef Compiler: cc='x86_64-linux-gnu-gcc' ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64' optimize='-O2 -g' cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include' 
ccversion='' gccversion='7.2.0' gccosandvers='' intsize=4 longsize=8 ptrsize=8 doublesize=8 byteorder=12345678 doublekind=3 d_longlong=define longlongsize=8 d_longdbl=define longdblsize=16 longdblkind=3 ivtype='long' ivsize=8 nvtype='double' nvsize=8 Off_t='off_t' lseeksize=8 alignbytes=8 prototype=define Linker and Libraries: ld='x86_64-linux-gnu-gcc' ldflags =' -fstack-protector-strong -L/usr/local/lib' libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/7/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt perllibs=-ldl -lm -lpthread -lc -lcrypt libc=libc-2.26.so so=so useshrplib=true libperl=libperl.so.5.26 gnulibc_version='2.26' Dynamic Linking: dlsrc=dl_dlopen.xs dlext=so d_dlsymun=undef ccdlflags='-Wl,-E' cccdlflags='-fPIC' lddlflags='-shared -L/usr/local/lib -fstack-protector-strong' Locally applied patches: DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN. DEBPKG:debian/db_file_ver - https://bugs.debian.org/340047 Remove overly restrictive DB_File version check. DEBPKG:debian/doc_info - Replace generic man(1) instructions with Debian-specific information. DEBPKG:debian/enc2xs_inc - https://bugs.debian.org/290336 Tweak enc2xs to follow symlinks and ignore missing @INC directories. DEBPKG:debian/errno_ver - https://bugs.debian.org/343351 Remove Errno version check due to upgrade problems with long-running processes. 
DEBPKG:debian/libperl_embed_doc - https://bugs.debian.org/186778 Note that libperl-dev package is required for embedded linking DEBPKG:fixes/respect_umask - Respect umask during installation DEBPKG:debian/writable_site_dirs - Set umask approproately for site install directories DEBPKG:debian/extutils_set_libperl_path - EU:MM: set location of libperl.a under /usr/lib DEBPKG:debian/no_packlist_perllocal - Don't install .packlist or perllocal.pod for perl or vendor DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets. DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor. DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy. DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable. DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian DEBPKG:debian/prune_libs - https://bugs.debian.org/128355 Prune the list of libraries wanted to what we actually need. 
DEBPKG:debian/perlivp - https://bugs.debian.org/510895 Make perlivp skip include directories in /usr/local DEBPKG:debian/deprecate-with-apt - https://bugs.debian.org/747628 Point users to Debian packages of deprecated core modules DEBPKG:debian/squelch-locale-warnings - https://bugs.debian.org/508764 Squelch locale warnings in Debian package maintainer scripts DEBPKG:debian/skip-upstream-git-tests - Skip tests specific to the upstream Git repository DEBPKG:debian/patchlevel - https://bugs.debian.org/567489 List packaged patches for 5.26.0-8ubuntu1 in patchlevel.h DEBPKG:fixes/document_makemaker_ccflags - https://bugs.debian.org/628522 [rt.cpan.org #68613] Document that CCFLAGS should include $Config{ccflags} DEBPKG:debian/find_html2text - https://bugs.debian.org/640479 Configure CPAN::Distribution with correct name of html2text DEBPKG:debian/perl5db-x-terminal-emulator.patch - https://bugs.debian.org/668490 Invoke x-terminal-emulator rather than xterm in perl5db.pl DEBPKG:debian/cpan-missing-site-dirs - https://bugs.debian.org/688842 Fix CPAN::FirstTime defaults with nonexisting site dirs if a parent is writable DEBPKG:fixes/memoize_storable_nstore - [rt.cpan.org #77790] https://bugs.debian.org/587650 Memoize::Storable: respect 'nstore' option not respected DEBPKG:debian/regen-skip - Skip a regeneration check in unrelated git repositories DEBPKG:debian/makemaker-pasthru - https://bugs.debian.org/758471 Pass LD settings through to subdirectories DEBPKG:debian/makemaker-manext - https://bugs.debian.org/247370 Make EU::MakeMaker honour MANnEXT settings in generated manpage headers DEBPKG:debian/kfreebsd-softupdates - https://bugs.debian.org/796798 Work around Debian Bug#796798 DEBPKG:fixes/autodie-scope - https://bugs.debian.org/798096 Fix a scoping issue with "no autodie" and the "system" sub DEBPKG:fixes/memoize-pod - [rt.cpan.org #89441] Fix POD errors in Memoize DEBPKG:debian/hurd-softupdates - https://bugs.debian.org/822735 Fix t/op/stat.t failures on hurd 
DEBPKG:fixes/math_complex_doc_great_circle - https://bugs.debian.org/697567 [rt.cpan.org #114104] Math::Trig: clarify definition of great_circle_midpoint DEBPKG:fixes/math_complex_doc_see_also - https://bugs.debian.org/697568 [rt.cpan.org #114105] Math::Trig: add missing SEE ALSO DEBPKG:fixes/math_complex_doc_angle_units - https://bugs.debian.org/731505 [rt.cpan.org #114106] Math::Trig: document angle units DEBPKG:fixes/cpan_web_link - https://bugs.debian.org/367291 CPAN: Add link to main CPAN web site DEBPKG:fixes/time_piece_doc - https://bugs.debian.org/817925 Time::Piece: Improve documentation for add_months and add_years DEBPKG:fixes/extutils_makemaker_reproducible - https://bugs.debian.org/835815 https://bugs.debian.org/834190 Make perllocal.pod files reproducible DEBPKG:fixes/file_path_hurd_errno - File-Path: Fix test failure in Hurd due to hard-coded ENOENT DEBPKG:debian/hppa_op_optimize_workaround - https://bugs.debian.org/838613 Temporarily lower the optimization of op.c on hppa due to gcc-6 problems DEBPKG:debian/installman-utf8 - https://bugs.debian.org/840211 Generate man pages with UTF-8 characters DEBPKG:fixes/file_path_chmod_race - https://bugs.debian.org/863870 [rt.cpan.org #121951] Prevent directory chmod race attack. DEBPKG:fixes/extutils_file_path_compat - Correct the order of tests of chmod(). (#294) DEBPKG:fixes/getopt-long-2 - [rt.cpan.org #120300] Withdraw part of commit 5d9947fb445327c7299d8beb009d609bc70066c0, which tries to implement more GNU getopt_long campatibility. GNU DEBPKG:fixes/getopt-long-3 - provide a default value for optional arguments DEBPKG:fixes/getopt-long-4 - https://bugs.debian.org/864544 [rt.cpan.org #122068] Fix issue #122068. 
DEBPKG:fixes/fbm-instr-crash - [bb152a4] [perl #131575] don't call Perl_fbm_instr() with negative length DEBPKG:fixes/test-builder-reset - https://bugs.debian.org/865894 Reset inside subtest maintains parent DEBPKG:debian/CVE-2016-1238/base-pm-amends-pt2 - [a77da41] Limit dotless-INC effect on base.pm with guard: DEBPKG:debian/hppa_opmini_optimize_workaround - https://bugs.debian.org/869122 Lower the optimization level of opmini.c on hppa DEBPKG:debian/sh4_op_optimize_workaround - https://bugs.debian.org/869373 Also lower the optimization level of op.c and opmini.c on sh4 DEBPKG:fixes/json-pp-example - [rt.cpan.org #92793] https://bugs.debian.org/871837 fix RT-92793: bug in SYNOPSIS DEBPKG:debian/customized - Update customized.dat for files patched in Debian DEBPKG:fixes/CVE-2017-12837 - https://bugs.debian.org/875596 [perl #131582] [66288bb] regcomp [perl #131582] DEBPKG:fixes/CVE-2017-12883 - https://bugs.debian.org/875597 [perl #131598] [2692dda] PATCH: [perl #131598] @INC for perl 5.26.0: /home/jima/lib/perl /home/jima/perl5/lib/perl5/x86_64-linux-gnu-thread-multi /home/jima/perl5/lib/perl5/5.26.0/x86_64-linux-gnu-thread-multi /home/jima/perl5/lib/perl5/5.26.0 /home/jima/perl5/lib/perl5/x86_64-linux-gnu-thread-multi /home/jima/perl5/lib/perl5 /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.26.0 /usr/local/share/perl/5.26.0 /usr/lib/x86_64-linux-gnu/perl5/5.26 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.26 /usr/share/perl/5.26 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base Environment for perl 5.26.0: HOME=/home/jima LANG=en_US.UTF-8 LANGUAGE (unset) LC_COLLATE=C LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=/home/jima/perl5/bin:/home/jima/bin:/home/jima/jima_tools/x86_64/bin:/home/jima/jima_tools/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/bin/X11:/usr/local/bin:/usr/local/sbin:/usr/games:/usr/local/games:/usr/lib/jvm/java-8-oracle/bin:/usr/lib/jvm/java-8-oracle/db/bin:/usr/lib/jvm/java-8-oracle/jre/bin:. 
PERL5LIB=/home/jima/lib/perl:/home/jima/perl5/lib/perl5/x86_64-linux-gnu-thread-multi:/home/jima/perl5/lib/perl5 PERL_BADLANG (unset) PERL_LOCAL_LIB_ROOT=/home/jima/perl5 PERL_MB_OPT=--install_base /home/jima/perl5 PERL_MM_OPT=INSTALL_BASE=/home/jima/perl5 SHELL=/bin/bash ```
p5pRT commented 6 years ago

From @iabyn

On Sat, Dec 23, 2017 at 12:05:35PM -0800, via RT wrote:

> If $1 is passed as an arg to a function, and that function internally performs a regex match, then the argument seen from inside the func is corrupted.
>
> I'm guessing this is because the localization of $1 does not also localize aliases to $1 such as in @_. This is a nasty trap, and it would be great if perl could at least diagnose it if it happens (the passed-in $1 is, after all, nominally read-only and a direct assignment results in a fatal "Modification of a read-only value attempted"; so one could argue that any operation which similarly could modify that argument should be flagged as well).

$1 et al act like tied variables: whenever their value is retrieved, they are set to a value from the current match. They are not scoped or localised, but the match object is.

This should explain all the behaviour you see.
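As a minimal sketch of the behaviour described above (the helper name `read_twice` is hypothetical and not part of the report), the following reads an aliased `$1` before and after an inner match. It assumes only core Perl semantics: `$_[0]` aliases `$1`, and each read of `$1` consults whichever match is current in the running scope.

```perl
use strict;
use warnings;

# Hypothetical helper: reads its first argument before and after
# running an unrelated match of its own.
sub read_twice {
    my $before = $_[0];    # copy taken while the caller's match is still current
    "bar" =~ /(.*)/;       # the inner match becomes the current match in this scope
    my $after = $_[0];     # $_[0] aliases $1, so this read sees the inner match
    return ($before, $after);
}

"foo" =~ /(.*)/;
my ($before, $after) = read_twice($1);
my $caller_view = $1;      # the caller's match is current again after the call

print "before=$before after=$after caller=$caller_view\n";
# prints: before=foo after=bar caller=foo
```

The variable `$1` itself is never localised here; only the match state it reads from changes when the sub is entered and left.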

> If not fixable or catchable, then I'd like to suggest adding an explicit mention of this trap to the docs, e.g. «perlsub».

I can't see any sane way to fix this without introducing weird special-cased behaviour, e.g. turning every bare $N in a function call's args into a "$N".

I suppose in places where $1 et al could get aliased (such as function calls and maybe foreach) a warning could be emitted, but that might be noisy. I don't know whether there are valid use cases, but grepping CPAN shows 1400+ distributions matching foo($N,...), although some of the foo's are things like substr and index.

-- "Strange women lying in ponds distributing swords is no basis for a system of government. Supreme executive power derives from a mandate from the masses, not from some farcical aquatic ceremony."   -- Dennis, "Monty Python and the Holy Grail"

p5pRT commented 6 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 6 years ago

From @jimav

On 12/28/17 3:40 AM, Dave Mitchell via RT wrote:

> On Sat, Dec 23, 2017 at 12:05:35PM -0800, via RT wrote:
>
> > If $1 is passed as an arg to a function, and that function internally performs a regex match, then the argument seen from inside the func is corrupted
>
> I can't see any sane way to fix this without introducing weird special-cased behaviour, e.g. turning every bare $N in a function call's args into a "$N"

It sounds like there's likely nothing to do about it, and making args into "$1" etc would copy data. I realize now that since the match object doesn't contain matched data, there is no way to make $1 a real alias because there's nothing to alias it to.

But I can think of a non-trivial solution...

Replace references to $N, %+ and related vars when they appear in sub/foreach args with a dynamically-created object which decorates the normal thingie with a check that the current match result is the same one which was current when the arg was created; if not, it would throw an error "reference to no-longer-current match result". I suppose there might exist code which actually wants a $N passed as an arg to reference a to-be-created-in-the-future match result.

-Jim

p5pRT commented 6 years ago

From @demerphq

On 28 Dec 2017 12:41, "Dave Mitchell" <davem@iabyn.com> wrote:

> On Sat, Dec 23, 2017 at 12:05:35PM -0800, via RT wrote:
>
> > If $1 is passed as an arg to a function, and that function internally performs a regex match, then the argument seen from inside the func is corrupted.
> >
> > I'm guessing this is because the localization of $1 does not also localize aliases to $1 such as in @_. This is a nasty trap, and it would be great if perl could at least diagnose it if it happens (the passed-in $1 is, after all, nominally read-only and a direct assignment results in a fatal "Modification of a read-only value attempted"; so one could argue that any operation which similarly could modify that argument should be flagged as well).
>
> $1 et al act like tied variables: whenever their value is retrieved, they are set to a value from the current match. They are not scoped or localised, but the match object is.
>
> This should explain all the behaviour you see.

Well, that combined with the fact that perl is a pass by alias language.

The OP seems to expect pass by value semantics, which is simply a fundamental misunderstanding of how Perl's @_ works.

> > If not fixable or catchable, then I'd like to suggest adding an explicit mention of this trap to the docs, e.g. «perlsub».
>
> I can't see any sane way to fix this without introducing weird special-cased behaviour, e.g. turning every bare $N in a function call's args into a "$N".

I don't think there is anything to fix. Newcomers to perl encounter this at some point, then learn not to do this, either by copying regex vars early or by explicitly copying the vars as arguments by double quoting them. You will find this issue raised countless times on perlmonks.
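The two habits mentioned here (copy the capture early, or double-quote it at the call site) can be sketched as follows. The sub names `inner_match_then_read` and `copy_first` are hypothetical, chosen only for illustration; the sketch assumes nothing beyond core Perl.

```perl
use strict;
use warnings;

# Hypothetical sub that runs its own match and then reads @_ directly.
sub inner_match_then_read {
    "bar" =~ /(.*)/;       # replaces the current match inside this sub
    return $_[0];
}

# Hypothetical sub that copies its argument on entry instead.
sub copy_first {
    my ($arg) = @_;        # copy made before any inner match runs
    "bar" =~ /(.*)/;
    return $arg;
}

"foo" =~ /(.*)/;

my $aliased  = inner_match_then_read($1);      # alias to $1: reads "bar"
my $quoted   = inner_match_then_read("$1");    # "$1" passes a copy: "foo"
my $saved    = $1;                             # copy early in the caller
my $from_var = inner_match_then_read($saved);  # "foo"
my $copied   = copy_first($1);                 # callee-side copy: "foo"

print "$aliased $quoted $from_var $copied\n";  # prints: bar foo foo foo
```

Either side can make the copy; the caller-side forms protect against callees that read `@_` late, while the callee-side form protects against callers that pass `$1` bare.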

> I suppose in places where $1 et al could get aliased (such as function calls and maybe foreach) a warning could be emitted, but that might be noisy. I don't know whether there are valid use cases, but grepping CPAN shows 1400+ distributions matching foo($N,...), although some of the foo's are things like substr and index.

Are we to do this for every tied object? How are we to know which are volatile?

A doc patch might be in order but imo no more. Vars like $! and $1 are volatile; it is the programmer's responsibility to copy them to non-volatile storage or suffer the consequences.

Yves

p5pRT commented 6 years ago

From @jimav

On 12/29/17 1:07 AM, yves orton via RT wrote:

> the fact that perl is a pass by alias language
>
> The op seems to expect pass by value semantics which is simply a fundamental misunderstanding of how Perl's @_ works

Not exactly. Unlike almost anything else in Perl, if $1 is passed as an argument, it is not an alias to the caller's match result -- it is more like a _name_ which is effectively eval'd inside the sub each time it is referenced. If a parameter bound to $1 actually aliased the captured text, it would still refer to that text after another match result was lexically pushed inside the sub.

Consider this analogy:

    sub func($) { local $_ = "bar"; print "func called with $_[0]\n"; }
    $_ = "foo";
    func($_);

No sane programmer would expect the function to print "bar", and it doesn't. The arg aliases $_, but the alias points to the _data_, not the name "$_".

But this:

    sub func($) { "bar" =~ /(.*)/; print "func called with $_[0]\n"; }
    "foo" =~ /(.*)/;
    func($1);

does what no sane programmer would expect, i.e., it prints "bar".

Now I think Dave Mitchell's idea of converting $1 to "$1" when passed as a sub arg might really be a Good Thing. In an ideal universe Perl might allow aliases which refer to a _substring_, and then $1 could really refer directly to the captured text. But it can't, so making a copy when passed as a sub arg is good enough to shield the sub's code from having to be aware of this issue. Bear in mind that $1 can't be used as an lvalue anyway, so neither can $_[n] if it aliases $1; so pass-by-value in this case should be invisible.

p5pRT commented 6 years ago

From @cpansprout

On Fri, 29 Dec 2017 13:35:53 -0800, jim.avera@gmail.com wrote:

> In an ideal universe Perl might allow aliases which refer to a _substring_,

Perl already does that with the return value from substr(). This works:

    $_ = "HELO";
    for (substr $_, 1, 1) {
      $_ = "EL";
    }

except that the special substr scalar does get its own copy of the substring internally, which is unavoidable due to the requirement that string buffers end in a null.

> and then $1 could really refer directly to the captured text.

I wondered for a moment why $1 could not be like a substr scalar, but then I realized: you can modify the original string and $1 does not change.

However, $1 currently *does* retrieve its string value dynamically from the pre-match copy, which because of COW is usually the original string buffer. (But, again, because of null-termination, $1 does get its own copy of the string buffer when you use it.)
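The contrast drawn here between a substr scalar and $1 can be sketched in a few lines. The variable names are illustrative only; the first half restates the substr example with a lexical loop variable, and the second half shows that assigning to the matched variable afterwards leaves the capture unchanged.

```perl
use strict;
use warnings;

# A substr() lvalue alias tracks the underlying string.
my $text = "HELO";
for my $piece (substr $text, 1, 1) {
    $piece = "EL";         # assigns through the alias into $text
}
print "$text\n";           # prints: HELLO

# $1 reads from the copy saved at match time, not from the live variable.
my $source = "abc123";
$source =~ /(\d+)/ or die "no match";
$source = "changed";       # modifying the original string afterwards
my $capture = $1;
print "$capture\n";        # prints: 123
```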

> But it can't, so making a copy when passed as a sub arg is good enough to shield the sub's code from having to be aware of this issue. Bear in mind that $1 can't be used as an lvalue anyway, so neither can $_[n] if it aliases $1; so pass-by-value in this case should be invisible.

It wouldn't be invisible, as referential identity would be lost. It might break a lot of introspection code.
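One way to see what "referential identity" means here is to compare reference addresses. This is only an illustrative sketch; the sub name `arg_is_dollar_one` is hypothetical, and the code relies on references comparing equal under `==` exactly when they point at the same scalar.

```perl
use strict;
use warnings;

# Hypothetical check: is the first argument the very same scalar as $1?
sub arg_is_dollar_one {
    return \$_[0] == \$1 ? 1 : 0;
}

"foo" =~ /(.*)/;

my $alias_identity = arg_is_dollar_one($1);    # 1: $_[0] is $1 itself
my $copy_identity  = arg_is_dollar_one("$1");  # 0: "$1" passes a fresh copy

print "alias=$alias_identity copy=$copy_identity\n";  # prints: alias=1 copy=0
```

Code that inspects its arguments this way would observe a difference if bare `$1` arguments were silently copied.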

--

Father Chrysostomos

p5pRT commented 6 years ago

From @demerphq

On 29 December 2017 at 22:35, Jim Avera <jim.avera@gmail.com> wrote:

> On 12/29/17 1:07 AM, yves orton via RT wrote:
>
> > the fact that perl is a pass by alias language
> >
> > The op seems to expect pass by value semantics which is simply a fundamental misunderstanding of how Perl's @_ works
>
> Not exactly. Unlike almost anything else in Perl, if $1 is passed as an argument, it is not an alias to the caller's match result -- it is more like a _name_ which is effectively eval'd inside the sub each time it is referenced. If a parameter bound to $1 actually aliased the captured text, it would still refer to that text after another match result was lexically pushed inside the sub.
>
> Consider this analogy:
>
>     sub func($) { local $_ = "bar"; print "func called with $_[0]\n"; }
>     $_ = "foo";
>     func($_);
>
> No sane programmer would expect the function to print "bar", and it doesn't.

This is not the same thing. Local changes what SV an identifier resolves to; it does not change the SV itself, and it does not interfere with any refs or aliases to other versions.

    local $foo= "bar";
    my $bar_ref= \$foo;
    local $foo= "baz";
    print $$bar_ref;

prints out "bar" as I would expect. $_[0] in the case you showed is still an alias to whatever SV $_ was pointing at in the first place. Localization did not modify that var in any way.

Put another way, after the local call there are *two* SVs in existence. With the regex case there is only one, $1.
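A strict-safe variant of the snippet above makes the "two SVs" point visible by comparing reference addresses. This is a sketch rather than a definitive account of the internals; it uses a package variable declared with `our` (an assumption added so the code runs under `use strict`) and a bare block to bound the `local`.

```perl
use strict;
use warnings;

our $foo = "bar";
my $outer_ref = \$foo;     # reference to the scalar currently behind $foo

my ($same_scalar, $outer_value, $inner_value);
{
    local $foo = "baz";    # $foo now resolves to a different scalar
    $same_scalar = (\$foo == $outer_ref) ? 1 : 0;
    $outer_value = $$outer_ref;   # the original scalar is untouched
    $inner_value = $foo;
}

print "same=$same_scalar outer=$outer_value inner=$inner_value after=$foo\n";
# prints: same=0 outer=bar inner=baz after=bar
```

An alias or reference taken before the `local` keeps pointing at the original scalar, which is why the `$_` analogy behaves differently from `$1`.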

> The arg aliases $_, but the alias points to the _data_, not the name "$_".

It points at the _container_ SV. _data_ implies that it is a value; it is not, it is a container.

It is not uncommon for even experienced Perl programmers to conflate values and containers when discussing scalars. 1 is a value. $x is a scalar container which may contain 1.

Aliasing occurs at the *container* level.

> But this:
>
>     sub func($) { "bar" =~ /(.*)/; print "func called with $_[0]\n"; }
>     "foo" =~ /(.*)/;
>     func($1);
>
> does what no sane programmer would expect, i.e., it prints "bar".

It does what every experienced Perl programmer would expect.

$1 is a container which when used as an rvalue returns the value of the most recent successful match in scope.

func($1)

calls func and puts an alias to the container $1 into $_[0].

You could just as easily have said:

    func("$1")

and created a copy. Or you could have written func() like this:

    sub func { my $thing= shift; "bar" =~ /(.*)/; print "func called with $thing\n"; }

and created a copy inside the func instead.

This is a standard issue with aliasing. Because @_ contains aliases to the arguments, it is potentially volatile, and if this breaks your expectations then you should make a copy.

Here is an example which I consider to be exactly equivalent to the issue with $1 and @_, but which uses no regex magic. IMO it is very clear that how it behaves is by design and that none of this is a bug, no matter how surprising you might consider it to be:

    sub othersub { $_[0]->{x}++ }
    sub whatever {
      my ($value,$hashref)= @_;
      print "$value:$_[0]";
      othersub($hashref);
      print "$value:$_[0]";
      $_[0]+=2;
    }
    my %hash=(x=>1);
    whatever($hash{x},\%hash);
    print $hash{x};

which prints out:

    1:1 1:2 4

Which to me is no different from the regex case.

So to me this thread is basically the result of mistaken assumptions about how aliasing works and how regex magic variables work. We can improve the docs to explain this stuff better, but I strongly feel there is no bug here, and the best we can do is improve how we educate people about the subtleties.

I mean, a simple rule is:

Operating on @_ directly has subtle implications which may surprise the unwary or inexperienced. Copying the arguments as early as possible ensures that many of these traps are avoided, and should be general practice. In particular the programmer should remember that any argument in @_ could be volatile, and operations performed by the subroutine may result in the arguments changing value between the time of entry to the subroutine and the time of access of the variable. When in doubt copy early.

cheers, Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 6 years ago

From @jimav

On 12/30/17 4:25 AM, yves orton via RT wrote:

> The programmer should remember that any argument in @_ could be volatile, and operations performed by the subroutine may result in the arguments changing value...

Thanks, I understand what you are saying. But a programmer should only need to worry about "operations" which a) refer to sub arguments, or b) use dynamic variables which have not first been localized within the sub. I don't think application programmers should have to defend against weird tied vars or equivalent which break normal localization semantics.

The key point is that perlvar says "These variables are read-only and dynamically-scoped".

As you mentioned, $1 is not a normal variable but "is a container which...returns the value of the most recent successful match in scope". And I think that is the crux of the problem: It does not behave like "dynamically scoped" variables elsewhere in Perl.

If $1 were implicitly localized in scopes containing a regex match, but otherwise behaved normally, then passing $1 to a sub would create an alias to an SV and inside the sub $1 would, after being localized, point to a different SV. There would be no trap.

In reality, only the _data_ of match results is dynamically scoped, magically, behind the scenes. The variables used to get at that data are effectively crippled so that localization has no effect (even an explicit "local $1" does nothing).

perl cannot localize $1 as long as captured text is not actually stored anywhere as such. That's efficient, but feels like a semantic wart.

If this behavior isn't changed, then perhaps the docs could be modified along these lines:

\ could say "Capture result data is read-only and dynamically-scoped. However the variables $1, %+ et al. are magical and cannot be localized; these variables _always_ return the most recent successful match results which are in scope at the point of reference. For example, if you pass $1 as an argument to a sub, then the sub must copy $_[n] before performing its own regex match in order to see the caller's intended argument."

\ might say "[as-is:] Capture group contents are dynamically scoped and available... [added:] However "$1" and related variables, and any aliases for them, cannot be localized and always refer to the most recent successful match which is in scope at the point where the variable or alias is referenced."