Open p5pRT opened 6 years ago
This is a bug report for perl from jim.avera@gmail.com, generated with the help of perlbug 1.40 running under perl 5.26.0.
If $1 is passed as an arg to a function, and that function internally performs a regex match, then the argument seen from inside the func is corrupted.
I'm guessing this is because the localization of $1 does not also localize aliases to $1 such as in @_. This is a nasty trap, and it would be great if perl could at least diagnose it if it happens (the passed-in $1 is, after all, nominally read-only and a direct assignment results in a fatal "Modification of a read-only value attempted"; so one could argue that any operation which similarly could modify that argument should be flagged as well).
If not fixable or catchable, then I'd like to suggest adding an explicit mention of this trap to the docs, e.g. «perlsub».
    #!/usr/bin/perl
    use strict; use warnings;

    sub func($) {
        my $saved = $_[0];
        if ($_[0] =~ /(\d+)/) { }
        warn "\$_[0] MUTATED from '$saved' to '$_[0]'\n" if $_[0] ne $saved;
    }

    func "a123b";
    if ("c456d" =~ /(.*)/) { func($1) }
On Sat, Dec 23, 2017 at 12:05:35PM -0800, via RT wrote:
If $1 is passed as an arg to a function, and that function internally performs a regex match, then the argument seen from inside the func is corrupted.
I'm guessing this is because the localization of $1 does not also localize aliases to $1 such as in @_. This is a nasty trap, and it would be great if perl could at least diagnose it if it happens (the passed-in $1 is, after all, nominally read-only and a direct assignment results in a fatal "Modification of a read-only value attempted"; so one could argue that any operation which similarly could modify that argument should be flagged as well).
$1 et al act like tied variables: whenever their value is retrieved, they are set to a value from the current match. They are not scoped or localised, but the match object is.
This should explain all the behaviour you see.
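The explanation above can be sketched as follows. This is a minimal illustration, not code from the report: the sub name `probe` and the test strings are assumptions. It reads the aliased argument before and after an inner match, then reads the caller's `$1` again after the call.

```perl
use strict;
use warnings;

# Hypothetical helper: reads its aliased argument before and after
# performing a regex match of its own.
sub probe {
    my @seen = ($_[0]);                         # fetched while the caller's match is current
    "bar" =~ /(.*)/ or die "inner match failed"; # a new match becomes current in this sub
    push @seen, $_[0];                          # same container, fetched again
    return @seen;
}

"foo" =~ /(.*)/ or die "outer match failed";
my @inside = probe($1);   # $_[0] is an alias to $1 itself, not to the text "foo"
my $after  = $1;          # fetched after the sub's scope has been left

print "inside: @inside\n";
print "after:  $after\n";
```

The sub should see the value change from `foo` to `bar`, while the caller sees `foo` again once the call returns, because it is the match state that is scoped rather than the variable.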
If not fixable or catchable, then I'd like to suggest adding an explicit mention of this trap to the docs, e.g. «perlsub».
I can't see any sane way to fix this without introducing weird special-cased behaviour, e.g. turning every bare $N in a function call's args into a "$N".
I suppose in places where $1 et al could get aliased (such as function calls and maybe foreach) a warning could be emitted, but that might be noisy. I don't know whether there are valid use cases, but grepping cpan shows 1400+ distributions matching foo($N,...), although some of the foo's are things like substr and index.
-- "Strange women lying in ponds distributing swords is no basis for a system of government. Supreme executive power derives from a mandate from the masses, not from some farcical aquatic ceremony." -- Dennis, "Monty Python and the Holy Grail"
The RT System itself - Status changed from 'new' to 'open'
On 12/28/17 3:40 AM, Dave Mitchell via RT wrote:
On Sat, Dec 23, 2017 at 12:05:35PM -0800, via RT wrote:
If $1 is passed as an arg to a function, and that function internally performs a regex match, then the argument seen from inside the func is corrupted.
I can't see any sane way to fix this without introducing weird special-cased behaviour, e.g. turning every bare $N in a function call's args into a "$N".
It sounds like there's likely nothing to do about it, and making args into "$1" etc would copy data. I realize now that since the match object doesn't contain matched data, there is no way to make $1 a real alias because there's nothing to alias it to.
But I can think of a non-trivial solution...
Replace references to $N, %+ and related vars when they appear in sub/foreach args with a dynamically-created object which decorates the normal thingie with a check that the current match result is the same one which was current when the arg was created; if not, it would throw an error "reference to no-longer-current match result". I suppose there might exist code which actually wants a $N passed as an arg to reference a to-be-created-in-the-future match result.
-Jim
On 28 Dec 2017 12:41, "Dave Mitchell" <davem@iabyn.com> wrote:
On Sat, Dec 23, 2017 at 12:05:35PM -0800, via RT wrote:
If $1 is passed as an arg to a function, and that function internally performs a regex match, then the argument seen from inside the func is corrupted.
I'm guessing this is because the localization of $1 does not also localize aliases to $1 such as in @_. This is a nasty trap, and it would be great if perl could at least diagnose it if it happens (the passed-in $1 is, after all, nominally read-only and a direct assignment results in a fatal "Modification of a read-only value attempted"; so one could argue that any operation which similarly could modify that argument should be flagged as well).
$1 et al act like tied variables: whenever their value is retrieved, they are set to a value from the current match. They are not scoped or localised, but the match object is.
This should explain all the behaviour you see.
Well, that combined with the fact that perl is a pass by alias language.
The op seems to expect pass by value semantics, which is simply a fundamental misunderstanding of how Perl's @_ works.
If not fixable or catchable, then I'd like to suggest adding an explicit mention of this trap to the docs, e.g. «perlsub».
I can't see any sane way to fix this without introducing weird special-cased behaviour, e.g. turning every bare $N in a function call's args into a "$N".
I don't think there is anything to fix. Newcomers to perl encounter this at some point, then learn not to do this, either by copying regex vars early or by explicitly copying the vars as arguments by double quoting them. You will find this issue raised countless times on perlmonks.
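The two workarounds mentioned above (double-quoting at the call site, or copying inside the sub) might look like this. It is a sketch only: the helper names `echo_after_match` and `echo_copied` are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper that reads $_[0] only after doing its own match.
sub echo_after_match {
    "inner" =~ /(.*)/ or die "inner match failed";
    my $value = $_[0];      # fetched after the inner match
    return $value;
}

# Hypothetical helper that copies its argument before doing anything else.
sub echo_copied {
    my ($arg) = @_;         # copy early: $arg is an ordinary lexical
    "inner" =~ /(.*)/ or die "inner match failed";
    return $arg;
}

"outer" =~ /(.*)/ or die "outer match failed";

my $aliased = echo_after_match($1);    # bare $1: the sub sees its own match
my $quoted  = echo_after_match("$1");  # "$1": a copy is made at the call site
my $copied  = echo_copied($1);         # the copy is made inside the sub

print "aliased=$aliased quoted=$quoted copied=$copied\n";
```

Either form of copying takes a snapshot of the captured text before any later match can change what `$1` returns. Copying inside the sub has the advantage that callers need not know about the issue.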
I suppose in places where $1 et al could get aliased (such as function calls and maybe foreach) a warning could be emitted, but that might be noisy. I don't know whether there are valid use cases, but grepping cpan shows 1400+ distributions matching foo($N,...), although some of the foo's are things like substr and index.
Are we to do this for every tied object? How are we to know which are volatile?
A doc patch might be in order, but IMO no more. Vars like $! and $1 are volatile; it is the programmer's responsibility to copy them to non-volatile storage or suffer the consequences.
Yves
On 12/29/17 1:07 AM, yves orton via RT wrote:
the fact that perl is a pass by alias language. The op seems to expect pass by value semantics which is simply a fundamental misunderstanding of how Perl's @_ works.
Not exactly. Unlike almost anything else in Perl, if $1 is passed as an argument, it is not an alias to the caller's match result -- it is more like a _name_ which is effectively eval'd inside the sub each time it is referenced. If a parameter bound to $1 actually aliased the captured text, it would still refer to that text after another match result was lexically pushed inside the sub.
Consider this analogy:
    sub func($) {
        local $_ = "bar";
        print "func called with $_[0]\n";
    }
    $_ = "foo";
    func($_);
No sane programmer would expect the function to print "bar", and it doesn't. The arg aliases $_, but the alias points to the _data_, not the name "$_".
But this:
    sub func($) {
        "bar" =~ /(.*)/;
        print "func called with $_[0]\n";
    }
    "foo" =~ /(.*)/;
    func($1);
does what no sane programmer would expect, i.e., it prints "bar".
Now I think Dave Mitchell's idea of converting $1 to "$1" when passed as a sub arg might really be a Good Thing. In an ideal universe Perl might allow aliases which refer to a _substring_, and then $1 could really refer directly to the captured text. But it can't, so making a copy when passed as a sub arg is good enough to shield the sub's code from having to be aware of this issue. Bear in mind that $1 can't be used as an lvalue anyway, so neither can $_[n] if it aliases $1; so pass-by-value in this case should be invisible.
On Fri, 29 Dec 2017 13:35:53 -0800, jim.avera@gmail.com wrote:
In an ideal universe Perl might allow aliases which refer to a _substring_,
Perl already does that with the return value from substr(). This works:
    $_ = "HELO";
    for (substr $_, 1, 1) {
        $_ = "EL";
    }
except that the special substr scalar does get its own copy of the substring internally, which is unavoidable due to the requirement that string buffers end in a null.
and then $1 could really refer directly to the captured text.
I wondered for a moment why $1 could not be like a substr scalar, but then I realized: you can modify the original string and $1 does not change.
However, $1 currently *does* retrieve its string value dynamically from the pre-match copy, which because of COW is usually the original string buffer. (But, again, because of null-termination, $1 does get its own copy of the string buffer when you use it.)
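The contrast drawn above between a substr scalar and `$1` can be sketched as below. The variable names are illustrative assumptions. The first half writes through a substr alias into the original string; the second half changes the matched string after the match and then reads `$1`.

```perl
use strict;
use warnings;

# A substr() scalar in a foreach list stays connected to the original string.
my $greeting = "HELO";
for my $piece (substr($greeting, 1, 1)) {
    $piece = "EL";              # writes through into $greeting
}

# $1 is not connected to the original string in that way.
my $subject = "HELO";
$subject =~ /(EL)/ or die "match failed";
$subject = "changed";           # modify the matched string afterwards
my $capture = $1;               # still taken from the pre-match copy

print "greeting=$greeting capture=$capture\n";
```

The loop should leave `$greeting` as `HELLO`, while `$capture` should still be `EL` even though `$subject` has been overwritten.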
But it can't, so making a copy when passed as a sub arg is good enough to shield the sub's code from having to be aware of this issue. Bear in mind that $1 can't be used as an lvalue anyway, so neither can $_[n] if it aliases $1; so pass-by-value in this case should be invisible.
It wouldn’t be invisible, as referential identity would be lost. It might break a lot of introspection code.
--
Father Chrysostomos
On 29 December 2017 at 22:35, Jim Avera <jim.avera@gmail.com> wrote:
On 12/29/17 1:07 AM, yves orton via RT wrote:
the fact that perl is a pass by alias language. The op seems to expect pass by value semantics which is simply a fundamental misunderstanding of how Perl's @_ works.
Not exactly. Unlike almost anything else in Perl, if $1 is passed as an argument, it is not an alias to the caller's match result -- it is more like a _name_ which is effectively eval'd inside the sub each time it is referenced. If a parameter bound to $1 actually aliased the captured text, it would still refer to that text after another match result was lexically pushed inside the sub.
Consider this analogy:
    sub func($) {
        local $_ = "bar";
        print "func called with $_[0]\n";
    }
    $_ = "foo";
    func($_);
No sane programmer would expect the function to print "bar", and it doesn't.
This is not the same thing. local changes which SV an identifier resolves to; it does not change the SV itself, and it does not interfere with any refs or aliases to other versions.
    local $foo = "bar";
    my $bar_ref = \$foo;
    local $foo = "baz";
    print $$bar_ref;
prints out "bar" as I would expect. $_[0] in the case you showed is still an alias to whatever SV $_ was pointing at in the first place. Localization did not modify that var in any way.
Put another way, after the local call there are *two* SV's in existence. With the regex case there is only one, $1.
The arg aliases $_, but the alias points to the _data_, not the name "$_".
It points at the _container_ SV. _data_ implies that it is a value; it is not, it is a container.
It is not uncommon for even experienced Perl programmers to conflate values and containers when discussing scalars. 1 is a value. $x is a scalar container which may contain 1.
Aliasing occurs at the *container* level.
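The container-level distinction can be checked with `Scalar::Util::refaddr`. The following is a sketch under assumptions: the helper names `same_after_local` and `same_as_capture` and the package variable `$global` are made up for illustration. It compares the address of the container behind `$_[0]` with the container the name currently resolves to.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

our $global = "foo";

# Hypothetical helper: after local(), does $_[0] still share a container
# with whatever the name $global now resolves to?
sub same_after_local {
    local $global = "bar";
    return refaddr(\$_[0]) == refaddr(\$global) ? 1 : 0;
}

# Hypothetical helper: does $_[0] share a container with $1?
sub same_as_capture {
    return refaddr(\$_[0]) == refaddr(\$1) ? 1 : 0;
}

"foo" =~ /(.*)/ or die "match failed";

my $local_case = same_after_local($global);  # local installed a second container
my $alias_case = same_as_capture($1);        # bare $1: one shared container
my $copy_case  = same_as_capture("$1");      # "$1": a separate temporary copy

print "local=$local_case alias=$alias_case copy=$copy_case\n";
```

The `local` case should report two distinct containers, the bare `$1` case a single shared one, and the quoted case a separate copy. The last result also illustrates the earlier point that copying loses referential identity.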
But this:
    sub func($) {
        "bar" =~ /(.*)/;
        print "func called with $_[0]\n";
    }
    "foo" =~ /(.*)/;
    func($1);
does what no sane programmer would expect, i.e., it prints "bar".
It does what every experienced Perl programmer would expect.
$1 is a container which when used as an rvalue returns the value of the most recent successful match in scope.
func($1)
calls func and puts an alias to the container $1 into $_[0].
You could just as easily have said:
func("$1")
and created a copy. Or you could have written func() like this:
    sub func {
        my $thing = shift;
        "bar" =~ /(.*)/;
        print "func called with $thing\n";
    }
and created a copy inside the func instead.
This is a standard issue with aliasing. Because @_ contains aliases to the arguments, it is potentially volatile, and if this breaks your expectations then you should make a copy.
Here is an example which I consider to be exactly equivalent to the issue with $1 and @_, but which uses no regex magic. IMO it is very clear that how it behaves is by design and that none of this is a bug, no matter how surprising you might consider it to be:
    sub othersub { $_[0]->{x}++ }
    sub whatever {
        my ($value, $hashref) = @_;
        print "$value:$_[0]";
        othersub($hashref);
        print "$value:$_[0]";
        $_[0] += 2;
    }
    my %hash = (x => 1);
    whatever($hash{x}, \%hash);
    print $hash{x};
which prints out:
1:1 1:2 4
Which to me is no different from the regex case.
So to me this thread is basically the result of mistaken assumptions about how aliasing works and how regex magic variables work. We can improve the docs to explain this stuff better, but I strongly feel there is no bug here, and the best we can do is improve how we educate people about the subtleties.
I mean, a simple rule is:
Operating on @_ directly has subtle implications which may surprise the unwary or inexperienced. Copying the arguments as early as possible ensures that many of these traps are avoided, and should be general practice. In particular the programmer should remember that any argument in @_ could be volatile, and operations performed by the subroutine may result in the arguments changing value between the time of entry to the subroutine and the time of access of the variable. When in doubt copy early.
cheers, Yves
-- perl -Mre=debug -e "/just|another|perl|hacker/"
On 12/30/17 4:25 AM, yves orton via RT wrote:
The programmer should remember that any argument in @_ could be volatile, and operations performed by the subroutine may result in the arguments changing value...
Thanks, I understand what you are saying. But a programmer should only need to worry about "operations" which a) refer to sub arguments, or b) use dynamic variables which have not first been localized within the sub. I don't think application programmers should have to defend against weird tied vars or equivalent which break normal localization semantics.
The key point is that perlvar says "These variables are read-only and dynamically-scoped".
As you mentioned, $1 is not a normal variable but "is a container which...returns the value of the most recent successful match in scope". And I think that is the crux of the problem: It does not behave like "dynamically scoped" variables elsewhere in Perl.
If $1 were implicitly localized in scopes containing a regex match, but otherwise behaved normally, then passing $1 to a sub would create an alias to an SV and inside the sub $1 would, after being localized, point to a different SV. There would be no trap.
In reality, only the _data_ of match results is dynamically scoped, magically, behind the scenes. The variables used to get at that data are effectively crippled so that localization has no effect (even an explicit "local $1" does nothing).
Perl cannot localize $1 as long as captured text is not actually stored anywhere as such. That's efficient, but feels like a semantic wart.
If this behavior isn't changed, then perhaps the docs could be modified along these lines:
Migrated from rt.perl.org#132647 (status was 'open')
Searchable as RT132647$