Open Sophira opened 2 years ago
I have been informed that using the -w
switch is a bad idea. I've retested the above with use warnings;
instead and can confirm that the issue is still as described in all cases.
Before I comment on the absence of a warning, I first want to understand why the pattern match is suceeding at all -- with or without warning.
When I run your program as a one-line, the output is match
:
$ perl -Mstrict -w -e 'my $str="test";if($str=~/@$/){print "match\n";}'
match
... which is not what I would intuitively expect. I would not have expected the pattern match to succeed; hence, I would have expected no output.
Can anyone explain?
Honestly, I agree with you. In fact, when I first presented this test case in the IRC channel, the comment read:
@ is not a metacharacter, this is not in double-quoted context, and @$ is not a special perl variable anyway. Why does this match, and why is this not caught by 'use strict'?
The people in the channel convinced me that I was wrong and that this was a context in which Perl would try to interpolate variables, but that the lack of a warning was the bug. So I went ahead and opened this issue.
Honestly though, when I encountered this behaviour with no warning, whether it was supposed to happen or not, it was baffling to me. The context is that I was trying to detect hunk header lines in unified diff output, and these begin and end with @@
. So of course I had @@$
at the end of my regex, which is what led to this issue.
I too think that Perl probably shouldn't be trying to interpolate lists here, but I figured the people in the channel knew best!
Oh, and as for an actual explanation of why it matches, what I assume is happening is that Perl is interpolating the (empty, since it's non-existent) list @$
in the regex, which means the regex resolves to an empty regex: if($str=~//)
. This matches (because it technically doesn't need to have anything at all to be a match) and so it prints match
.
@$
is an array, not a list, and it exists because the superglobal $$
exists. So this is correct interpolation of a currently-empty array.
Thank you for explaining! I'll take you at your word for the one, since I don't know how Perl internals really work and I'm still fuzzy on the difference between an array and a list in Perl. I still feel it's unintuitive but this isn't the right place for me to argue that so I'll drop it.
The lack of a warning is still an issue, though, it sounds like?
The lack of a warning is still an issue, though, it sounds like?
My hunch is that the cost in Perl core developer time to implement this warning and to maintain it going forward would exceed the benefit we derive from it.
Warnings are implemented deep in the Perl guts. The number of people who can do that correctly is small, so we/they have to prioritize.
That's true for all warnings, but for this one in particular there are additional considerations. We have two existing Possible unintended interpolation
warnings, as documented in pod/perldiag.pod
:
Possible unintended interpolation of $\ in regex
(W ambiguous) You said something like "m/$\/" in a regex.
...
Possible unintended interpolation of %s in string
(W ambiguous) You said something like '@foo' in a
double-quoted string but there was no array @foo in scope at
the time. ...
You suggested that this case should be covered by the second warning above, but it appears that we're in regex
rather than (or more than) in string
. Hence, this would likely be creation of a new warning in the ambiguous
category rather than an extension of the existing in string
category. The effort would almost certainly be non-trivial, and IMO we would want to have evidence that many people were making this mistake before we would undertake that effort.
That's fair. I didn't realise that there was a separate regex warning - I actually copied and pasted the warning from what used to be emitted by the test case when using v5.25.3 and earlier. I can see that that was a case of an incorrect warning, now, because yes, we're clearly in a regex here. (That sounds like it may be a separate bug which may bear investigation depending on how it works and if there are any consequences beyond the incorrect warnings, though, if it hasn't already been fixed.)
I will point out that there is also no warning given for when the variable is interpolated into strings either (as in my $str = "@$";
), but it sounds like the amount of effort required is a lot more than I understood it to be.
I do, however, want to point out that you are less likely to get reports for this particular use case because it's less likely that there will be undesired behaviour (even if it's incorrect behaviour). For example, if like me you wanted to detect hunk headers in unified diffs and had a regex looking something like this (regex simplified to make things easier):
if ($line=~/^@@.*@@$/) {
This regex does in fact match the two @ characters at the start of the line successfully, so it seems that @@
itself is not interpolated. At the end, however, the @$
is interpolated into nothing, so the regex becomes:
if ($line=~/^@@.*@/) {
This will match a hunk header line just fine - and in fact it also detects hunk header lines where the function context is given, which is the case in many git logs and was a case that my original regex would not have detected if it had worked as written, such as in this line:
@@ -390,18 +390,10 @@ ThreadInterface :: ~ThreadInterface()
I only realised this was a problem because I noticed it detecting these lines and was at first pleased, and then I realised that actually, even though this was a good thing, the regex as written shouldn't have detected those. This is what led to this issue.
Basically, what I'm trying to say is that there may be cases where this exact problem is being masked by the fact that it works anyway for what people want it to do. The failure in this case was a poorly written regex for what was intended but which worked (to some extent) better than written (by successfully matching all hunk header lines). As such, it may not be recognised as a failure and thus reports for this use case might be limited.
I do, however, want to point out that you are less likely to get reports for this particular use case because it's less likely that there will be undesired behaviour (even if it's incorrect behaviour). For example, if like me you wanted to detect hunk headers in unified diffs and had a regex looking something like this (regex simplified to make things easier):
if ($line=~/^@@.*@@$/) {
This regex does in fact match the two @ characters at the start of the line successfully, so it seems that
@@
itself is not interpolated. At the end, however, the@$
is interpolated into nothing, so the regex becomes:
if ($line=~/^@@.*@/) {
This will match a hunk header line just fine - and in fact it also detects hunk header lines where the function context is given, which is the case in many git logs and was a case that my original regex would not have detected if it had worked as written, such as in this line:
@@ -390,18 +390,10 @@ ThreadInterface :: ~ThreadInterface()
This case actually appears to be weirder. It's interpolating @.
(since similarly, $.
exists) and so your regex ends up being ^@*@
, which matches your initial two @
characters and doesn't care what follows.
I definitely agree that the warning should be firing for these cases.
But as @jkeenan points out, the two existing warnings don't cover this case. One is a string warning for arrays which aren't in scope (superglobals are always in scope), and the other is a regex warning specifically for $\
. So to cover these cases the latter warning would have to be expanded in scope, or another one added.
This case actually appears to be weirder. It's interpolating
@.
(since similarly,$.
exists) and so your regex ends up being^@*@
, which matches your initial two@
characters and doesn't care what follows.
Wow. So in simplifying the regex (the initial regex I was actually using was much longer - $line=~/^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@$/
- but I didn't consider that the details would be important; how wrong I was!), I guess I unwittingly created a new instance that I never even considered! I guess this shows the importance of always being exact in bug reports, particularly when the issue you're reporting is like this one.
I did (lightly) test the regex before I put it in the bug, of course, but I think this goes to show how unexpected this whole thing can be.
However @-
and @+
don't interpolate.
perl -wle 'qq[01\@\@45] =~ m/@?/ and print "[$&,@-,@+]" '
[,0,0]
perl -wle 'qq[01\@\@45] =~ m/@*/ and print "[$&,@-,@+]" '
[,0,0]
perl -wle 'qq[01\@\@45] =~ m/@+/ and print "[$&,@-,@+]" '
[@@,2,4]
perl -wle 'qq[01\@\@45] =~ m/@-/ and print "[$&,@-,@+]" '
no match
perl -wle 'qq[01\@\@45] =~ m/@4/ and print "[$&,@-,@+]"'
[,0,0]
Need to use curly brackets to interpolate @{+}
, @{-}
:
perl -wle 'qq[01\@\@45] =~ m/@{ + }/ and print "[$&,@-,@+]"'
[,0,0]
@{n}
and @{n,m}
interpolate as variable names not as quantifier.
perl -wle 'qq[01\@\@45] =~ m/@{1,2}/ and print "[$&,@-,@+]"'
[,0,0]
Module: n/a
Description When unintentionally interpolating something that looks like a special Perl variable (but in fact doesn't exist as such, such as
@$
), no warning of possible unintended interpolation occurs.With help from IRC (and especially xq - thank you!), I was able to bisect this down to commit 10030f4 (between Perl versions v5.25.3 and v5.25.4), which seems to be mitigating for Perl variables that don't exist when they're not needed, but which Perl pretends exist when fetched. This is not the case for non-existent variables like
@$
and can lead to user confusion, such as in the reproduction script below.Steps to Reproduce
Expected behavior
Perl should issue a warning similar to the following:
Possible unintended interpolation of @$ in string at ../perltest-2 line 6.
Instead, Perl doesn't issue any warning.
Perl configuration
While the bug occurs on both my regular Perl install (the v5.34.0 Gentoo ebuild) and blead, I'll only include the output from blead here. If my regular configuration is wanted, please let me know.