Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.93k stars 551 forks source link

'Possible unintended interpolation' warning does not occur when it should #19286

Open Sophira opened 2 years ago

Sophira commented 2 years ago

Module: n/a

Description When unintentionally interpolating something that looks like a special Perl variable (but in fact doesn't exist as such, such as @$), no warning of possible unintended interpolation occurs.

With help from IRC (and especially xq - thank you!), I was able to bisect this down to commit 10030f4 (between Perl versions v5.25.3 and v5.25.4), which seems to be mitigating for Perl variables that don't exist when they're not needed, but which Perl pretends exist when fetched. This is not the case for non-existent variables like @$ and can lead to user confusion, such as in the reproduction script below.

Steps to Reproduce

#!/usr/bin/perl -w
use strict;

my $str = "test";

if ($str=~/@$/) {   # user is attempting to look for a literal @ symbol at
                    # the end of $str, but Perl instead tries to interpolate
                    # the non-existent variable @$ with no warning
  print "match\n";
}

Expected behavior

Perl should issue a warning similar to the following:

Possible unintended interpolation of @$ in string at ../perltest-2 line 6.

Instead, Perl doesn't issue any warning.

Perl configuration

While the bug occurs on both my regular Perl install (the v5.34.0 Gentoo ebuild) and blead, I'll only include the output from blead here. If my regular configuration is wanted, please let me know.

Summary of my perl5 (revision 5 version 35 subversion 7) configuration:
  Commit id: a3aaec921c9b50eb220092214145c88a5ccc7980
  Platform:
    osname=linux
    osvers=5.4.143-gentoo
    archname=x86_64-linux
    uname='linux home 5.4.143-gentoo #3 smp preempt wed nov 3 09:33:33 gmt 2021 x86_64 intel(r) core(tm) i7-5820k cpu @ 3.30ghz genuineintel gnulinux '
    config_args='-de -Dusedevel -Dcc=gcc-8.4.0'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=undef
    usemultiplicity=undef
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
  Compiler:
    cc='gcc-8.4.0'
    ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    optimize='-O2'
    cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
    ccversion=''
    gccversion='8.4.0'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='gcc-8.4.0'
    ldflags =' -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib /lib64 /usr/lib64 /lib /usr/local/lib64
    libs=-lpthread -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    libc=libc-2.33.so
    so=so
    useshrplib=false
    libperl=libperl.a
    gnulibc_version='2.33'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E'
    cccdlflags='-fPIC'
    lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong'

Characteristics of this binary (from libperl): 
  Compile-time options:
    HAS_TIMES
    PERLIO_LAYERS
    PERL_COPY_ON_WRITE
    PERL_DONT_CREATE_GVSV
    PERL_MALLOC_WRAP
    PERL_OP_PARENT
    PERL_PRESERVE_IVUV
    PERL_USE_DEVEL
    USE_64_BIT_ALL
    USE_64_BIT_INT
    USE_LARGE_FILES
    USE_LOCALE
    USE_LOCALE_COLLATE
    USE_LOCALE_CTYPE
    USE_LOCALE_NUMERIC
    USE_LOCALE_TIME
    USE_PERLIO
    USE_PERL_ATOF
  Built under linux
  Compiled at Dec 17 2021 10:51:16
  %ENV:
    PERL5LIB="/home/sophie/git/perl5/lib"
  @INC:
    /home/sophie/git/perl5/lib
    /usr/local/lib/perl5/site_perl/5.35.7/x86_64-linux
    /usr/local/lib/perl5/site_perl/5.35.7
    /usr/local/lib/perl5/5.35.7/x86_64-linux
    /usr/local/lib/perl5/5.35.7
Sophira commented 2 years ago

I have been informed that using the -w switch is a bad idea. I've retested the above with use warnings; instead and can confirm that the issue is still as described in all cases.

jkeenan commented 2 years ago

Before I comment on the absence of a warning, I first want to understand why the pattern match is suceeding at all -- with or without warning.

When I run your program as a one-line, the output is match:

$ perl -Mstrict -w -e 'my $str="test";if($str=~/@$/){print "match\n";}'
match

... which is not what I would intuitively expect. I would not have expected the pattern match to succeed; hence, I would have expected no output.

Can anyone explain?

Sophira commented 2 years ago

Honestly, I agree with you. In fact, when I first presented this test case in the IRC channel, the comment read:

@ is not a metacharacter, this is not in double-quoted context, and @$ is not a special perl variable anyway. Why does this match, and why is this not caught by 'use strict'?

The people in the channel convinced me that I was wrong and that this was a context in which Perl would try to interpolate variables, but that the lack of a warning was the bug. So I went ahead and opened this issue.

Honestly though, when I encountered this behaviour with no warning, whether it was supposed to happen or not, it was baffling to me. The context is that I was trying to detect hunk header lines in unified diff output, and these begin and end with @@. So of course I had @@$ at the end of my regex, which is what led to this issue.

I too think that Perl probably shouldn't be trying to interpolate lists here, but I figured the people in the channel knew best!

Sophira commented 2 years ago

Oh, and as for an actual explanation of why it matches, what I assume is happening is that Perl is interpolating the (empty, since it's non-existent) list @$ in the regex, which means the regex resolves to an empty regex: if($str=~//). This matches (because it technically doesn't need to have anything at all to be a match) and so it prints match.

Grinnz commented 2 years ago

@$ is an array, not a list, and it exists because the superglobal $$ exists. So this is correct interpolation of a currently-empty array.

Sophira commented 2 years ago

Thank you for explaining! I'll take you at your word for the one, since I don't know how Perl internals really work and I'm still fuzzy on the difference between an array and a list in Perl. I still feel it's unintuitive but this isn't the right place for me to argue that so I'll drop it.

The lack of a warning is still an issue, though, it sounds like?

jkeenan commented 2 years ago

The lack of a warning is still an issue, though, it sounds like?

My hunch is that the cost in Perl core developer time to implement this warning and to maintain it going forward would exceed the benefit we derive from it.

Warnings are implemented deep in the Perl guts. The number of people who can do that correctly is small, so we/they have to prioritize.

That's true for all warnings, but for this one in particular there are additional considerations. We have two existing Possible unintended interpolation warnings, as documented in pod/perldiag.pod:

Possible unintended interpolation of $\ in regex
    (W ambiguous) You said something like "m/$\/" in a regex.
...

Possible unintended interpolation of %s in string
    (W ambiguous) You said something like '@foo' in a
    double-quoted string but there was no array @foo in scope at
    the time. ...

You suggested that this case should be covered by the second warning above, but it appears that we're in regex rather than (or more than) in string. Hence, this would likely be creation of a new warning in the ambiguous category rather than an extension of the existing in string category. The effort would almost certainly be non-trivial, and IMO we would want to have evidence that many people were making this mistake before we would undertake that effort.

Sophira commented 2 years ago

That's fair. I didn't realise that there was a separate regex warning - I actually copied and pasted the warning from what used to be emitted by the test case when using v5.25.3 and earlier. I can see that that was a case of an incorrect warning, now, because yes, we're clearly in a regex here. (That sounds like it may be a separate bug which may bear investigation depending on how it works and if there are any consequences beyond the incorrect warnings, though, if it hasn't already been fixed.)

I will point out that there is also no warning given for when the variable is interpolated into strings either (as in my $str = "@$";), but it sounds like the amount of effort required is a lot more than I understood it to be.

I do, however, want to point out that you are less likely to get reports for this particular use case because it's less likely that there will be undesired behaviour (even if it's incorrect behaviour). For example, if like me you wanted to detect hunk headers in unified diffs and had a regex looking something like this (regex simplified to make things easier):

if ($line=~/^@@.*@@$/) {

This regex does in fact match the two @ characters at the start of the line successfully, so it seems that @@ itself is not interpolated. At the end, however, the @$ is interpolated into nothing, so the regex becomes:

if ($line=~/^@@.*@/) {

This will match a hunk header line just fine - and in fact it also detects hunk header lines where the function context is given, which is the case in many git logs and was a case that my original regex would not have detected if it had worked as written, such as in this line:

@@ -390,18 +390,10 @@ ThreadInterface :: ~ThreadInterface()

I only realised this was a problem because I noticed it detecting these lines and was at first pleased, and then I realised that actually, even though this was a good thing, the regex as written shouldn't have detected those. This is what led to this issue.

Basically, what I'm trying to say is that there may be cases where this exact problem is being masked by the fact that it works anyway for what people want it to do. The failure in this case was a poorly written regex for what was intended but which worked (to some extent) better than written (by successfully matching all hunk header lines). As such, it may not be recognised as a failure and thus reports for this use case might be limited.

Grinnz commented 2 years ago

I do, however, want to point out that you are less likely to get reports for this particular use case because it's less likely that there will be undesired behaviour (even if it's incorrect behaviour). For example, if like me you wanted to detect hunk headers in unified diffs and had a regex looking something like this (regex simplified to make things easier):

if ($line=~/^@@.*@@$/) {

This regex does in fact match the two @ characters at the start of the line successfully, so it seems that @@ itself is not interpolated. At the end, however, the @$ is interpolated into nothing, so the regex becomes:

if ($line=~/^@@.*@/) {

This will match a hunk header line just fine - and in fact it also detects hunk header lines where the function context is given, which is the case in many git logs and was a case that my original regex would not have detected if it had worked as written, such as in this line:

@@ -390,18 +390,10 @@ ThreadInterface :: ~ThreadInterface()

This case actually appears to be weirder. It's interpolating @. (since similarly, $. exists) and so your regex ends up being ^@*@, which matches your initial two @ characters and doesn't care what follows.

I definitely agree that the warning should be firing for these cases.

Grinnz commented 2 years ago

But as @jkeenan points out, the two existing warnings don't cover this case. One is a string warning for arrays which aren't in scope (superglobals are always in scope), and the other is a regex warning specifically for $\. So to cover these cases the latter warning would have to be expanded in scope, or another one added.

Sophira commented 2 years ago

This case actually appears to be weirder. It's interpolating @. (since similarly, $. exists) and so your regex ends up being ^@*@, which matches your initial two @ characters and doesn't care what follows.

Wow. So in simplifying the regex (the initial regex I was actually using was much longer - $line=~/^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@$/ - but I didn't consider that the details would be important; how wrong I was!), I guess I unwittingly created a new instance that I never even considered! I guess this shows the importance of always being exact in bug reports, particularly when the issue you're reporting is like this one.

I did (lightly) test the regex before I put it in the bug, of course, but I think this goes to show how unexpected this whole thing can be.

rsFalse commented 2 years ago

However @- and @+ don't interpolate.

perl -wle 'qq[01\@\@45] =~ m/@?/ and print "[$&,@-,@+]" '
[,0,0]
perl -wle 'qq[01\@\@45] =~ m/@*/ and print "[$&,@-,@+]" '
[,0,0]
perl -wle 'qq[01\@\@45] =~ m/@+/ and print "[$&,@-,@+]" '
[@@,2,4]
perl -wle 'qq[01\@\@45] =~ m/@-/ and print "[$&,@-,@+]" '
no match
perl -wle 'qq[01\@\@45] =~ m/@4/ and print "[$&,@-,@+]"'
[,0,0]

Need to use curly brackets to interpolate @{+}, @{-}:

perl -wle 'qq[01\@\@45] =~ m/@{ + }/ and print "[$&,@-,@+]"'
[,0,0]

@{n} and @{n,m} interpolate as variable names not as quantifier.

perl -wle 'qq[01\@\@45] =~ m/@{1,2}/ and print "[$&,@-,@+]"'
[,0,0]