Perl / perl5

đŸȘ The Perl programming language
https://dev.perl.org/perl5/
Other
1.97k stars 555 forks source link

Warning if regexp used as LHS of =~ instead of RHS #15622

Open p5pRT opened 8 years ago

p5pRT commented 8 years ago

Migrated from rt.perl.org#129333 (status was 'open')

Searchable as RT129333$

p5pRT commented 8 years ago

From @epa

Created by @epa

This program contains a mistake​:

  % perl -wE '$re = qr/a/; $s = "abc"; if ($re =~ $s) { say "hi" }'

The match operator is the wrong way round; it should be $s =~ $re. This is the kind of mistake that can be hard to see even if you stare at the code for a long time!

If warnings are enabled\, and the LHS of the =~ operator is a regular expression object (ref $foo eq 'Regexp')\, and the RHS is not a literal regexp or a regular expression object\, then a warning should be produced.

  use warnings;   my $re = qr/f/;   my $re2 = qr/g/;   my $s = 'foo';

  $s =~ $re; # OK   $re =~ $s; # warns   $re =~ /x/; # does not warn since RHS is a literal regexp   $re =~ $re2; # does not warn since RHS is a regexp object

The second and third cases above *could* warn\, but for the purpose of this bug report I have left them out since I want to narrow down the warning as much as possible and minimize false positives.

Perl Info ``` Flags: category=core severity=wishlist Site configuration information for perl 5.22.2: Configured by Red Hat, Inc. at Wed Aug 3 14:06:20 UTC 2016. Summary of my perl5 (revision 5 version 22 subversion 2) configuration: Platform: osname=linux, osvers=4.6.3-300.fc24.x86_64, archname=x86_64-linux-thread-multi uname='linux buildvm-08.phx2.fedoraproject.org 4.6.3-300.fc24.x86_64 #1 smp fri jun 24 20:52:41 utc 2016 x86_64 x86_64 x86_64 gnulinux ' config_args='-des -Doptimize=none -Dccflags=-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -Dldflags=-Wl,-z,relro -Dccdlflags=-Wl,--enable-new-dtags -Wl,-z,relro -Dlddlflags=-shared -Wl,-z,relro -Dshrpdir=/usr/lib64 -DDEBUGGING=-g -Dversion=5.22.2 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl5 -Dsitearch=/usr/local/lib64/perl5 -Dprivlib=/usr/share/perl5 -Dvendorlib=/usr/share/perl5/vendor_perl -Darchlib=/usr/lib64/perl5 -Dvendorarch=/usr/lib64/perl5/vendor_perl -Darchname=x86_64-linux-thread-multi -Dlibpth=/usr/local/lib64 /lib64 /usr/lib64 -Duseshrplib -Dusethreads -Duseithreads -Dusedtrace=/usr/bin/dtrace -Duselargefiles -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl=n -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dd_gethostent_r_proto -Ud_endhostent_r_proto -Ud_sethostent_r_proto -Ud_endprotoent_r_proto -Ud_setprotoent_r_proto -Ud_endservent_r_proto -Ud_setservent_r_proto -Dscriptdir=/usr/bin -Dusesitecustomize' hint=recommended, useposix=true, d_sigaction=define useithreads=define, usemultiplicity=define use64bitint=define, use64bitall=define, uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fwrapv -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64', optimize=' -g', cppflags='-D_REENTRANT -D_GNU_SOURCE -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fwrapv -fno-strict-aliasing -I/usr/local/include' ccversion='', gccversion='5.3.1 20160406 (Red Hat 5.3.1-6)', gccosandvers='' intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678, doublekind=3 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16, longdblkind=3 ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=8, prototype=define Linker and Libraries: ld='gcc', ldflags ='-Wl,-z,relro -fstack-protector-strong -L/usr/local/lib' libpth=/usr/local/lib64 /lib64 /usr/lib64 /usr/local/lib /usr/lib /lib/../lib64 /usr/lib/../lib64 /lib libs=-lpthread -lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat perllibs=-lpthread -lresolv -lnsl -ldl -lm -lcrypt -lutil -lc libc=libc-2.22.so, so=so, useshrplib=true, libperl=libperl.so gnulibc_version='2.22' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,--enable-new-dtags -Wl,-z,relro ' cccdlflags='-fPIC', lddlflags='-shared -Wl,-z,relro -L/usr/local/lib -fstack-protector-strong' Locally applied patches: Fedora Patch1: Removes date check, Fedora/RHEL specific Fedora Patch3: support for libdir64 Fedora Patch4: use libresolv instead of libbind Fedora Patch5: USE_MM_LD_RUN_PATH Fedora Patch6: Skip hostname tests, due to builders not being network capable Fedora Patch7: Dont run one io test due to random builder failures Fedora Patch15: Define SONAME for libperl.so Fedora Patch16: Install libperl.so to -Dshrpdir value Fedora Patch22: Document Math::BigInt::CalcEmu requires Math::BigInt (CPAN RT#85015) Fedora Patch26: Make *DBM_File desctructors thread-safe (RT#61912) Fedora Patch27: Make PadlistNAMES() lvalue again (CPAN RT#101063) Fedora Patch28: Make magic vtable writable as a work-around for Coro (CPAN RT#101063) Fedora Patch29: Fix duplicating PerlIO::encoding when spawning threads (RT#31923) Fedora Patch30: Do not let XSLoader load relative paths (CVE-2016-6185) Fedora Patch31: Avoid loading optional modules from default . (CVE-2016-1238) Fedora Patch200: Link XS modules to libperl.so with EU::CBuilder on Linux Fedora Patch201: Link XS modules to libperl.so with EU::MM on Linux @INC for perl 5.22.2: /home/eda/lib64/perl5/ /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 Environment for perl 5.22.2: HOME=/home/eda LANG=en_GB.UTF-8 LANGUAGE (unset) LC_COLLATE=C LC_CTYPE=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 LC_NUMERIC=en_GB.UTF-8 LC_TIME=en_GB.UTF-8 LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=/home/eda/bin:/home/eda/bin:/usr/local/bin:/usr/bin:/sbin:/usr/sbin:/sbin:/usr/sbin PERL5LIB=/home/eda/lib64/perl5/ PERL_BADLANG (unset) SHELL=/bin/bash ```
p5pRT commented 8 years ago

From @abigail

On Thu\, Sep 22\, 2016 at 08​:38​:55AM -0700\, Ed Avis wrote​:

This program contains a mistake​:

% perl \-wE '$re = qr/a/; $s = "abc"; if \($re =~ $s\) \{ say "hi" \}'

Who says it's a mistake? I've certainly written code which introspects patterns by matching a pattern against it.

And I don't always write /$pattern/ on the RHS of a =~ operator. After all\, Perl is there for the programmer; the programmer isn't there for Perl. Perl knows I mean a pattern\, because I write =~\, and will duely treat a string as a pattern.

Abigail

p5pRT commented 8 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 8 years ago

From @ap

* Abigail \abigail@​abigail\.be [2016-09-22 22​:00]​:

On Thu\, Sep 22\, 2016 at 08​:38​:55AM -0700\, Ed Avis wrote​:

This program contains a mistake​:

% perl \-wE '$re = qr/a/; $s = "abc"; if \($re =~ $s\) \{ say "hi" \}'

Who says it's a mistake? I've certainly written code which introspects patterns by matching a pattern against it.

I’ve done that\, but extremely rarely. I wonder if warning the user when they use a qr object on the LHS might be worthwhile. You can always and easily stringify it if that’s what you really want.

And I don't always write /$pattern/ on the RHS of a =~ operator.

strict.pm and warnings.pm themselves do that. :^) Courtesy of yours truly. :^P (Because constant-folding.)

Not gonna fly.

Regards\, -- Aristotle Pagaltzis // \<http​://plasmasturm.org/>

p5pRT commented 8 years ago

From @ap

* Aristotle Pagaltzis \pagaltzis@&#8203;gmx\.de [2016-09-26 06​:48]​:

I wonder if warning the user when they use a qr object on the LHS might be worthwhile. You can always and easily stringify it if that’s what you really want.

No.

Any time a value crosses an authority boundary\, like when a value is passed from user code to a CPAN module\, the code on the other side of the boundary would have to defensively stringify the LHS in every last pattern match against that value\, or else go to the effort to propagate the warning back to the boundary point.

Not gonna fly.

Not as part of perl proper. However​:

  package Regexp { use overload q[""] => sub { die } }

Looks like that does what you’d think it’ll do. And returning $_[0] doesn’t seem to cause recursion\, so getting a warning out of it is just as easy.

So anyone who wants as much can already have it.

Regards\, -- Aristotle Pagaltzis // \<http​://plasmasturm.org/>

p5pRT commented 8 years ago

From @epa

Aristotle Pagaltzis wrote​:

Any time a value crosses an authority boundary\, like when a value is passed from user code to a CPAN module\, the code on the other side of the boundary would have to defensively stringify the LHS in every last pattern match against that value\,

This is true. But the proposal was to warn only if the LHS of =~ is a regexp and the RHS is not. Take the following code

  sub match_and_report {   my ($string\, $regexp\, $message) = @​_;   if ($string =~ $regexp) { say "it matches\, $message" }   }

Now\, if this code is called the wrong way round\, transposing the string and regexp arguments\, isn't it reasonable for it to warn? Even if the warning is seen by the programmer as coming from inside the bowels of match_and_report\, rather than being a nicely packaged 'carp' of the caller's mistake. Even though the mistake is by the caller\, and there is no bug in match_and_report itself\, the warning is still helpful in practice to the programmer.

This is surely analogous to the warnings on uninitialized values. Every time a value crosses into a module's code\, it 'would have to defensively check against undef'. But in fact perl often gives 'Use of uninitialized value' coming from inside library code. The programmer then can peek inside the library to work out what's wrong. High-quality libraries will often have defensive usage checks for the most common errors\, but it's unlikely to be entirely free of warnings in all cases where the caller might pass undef. Again\, it's not a perfect diagnostic tool\, since the line number reported in the warning is a long way off where the undef was passed\, but nobody suggests perl shouldn't warn on undef values at run time for this reason.

I suggest that using =~ 'the wrong way round' is if anything more likely to be a mistake than doing something like undef()+5\, and is at least as worthy of a runtime warning\, bearing in mind that warnings can be turned on and off.

-- Ed Avis \eda@&#8203;waniasset\.com

This email is intended only for the person to whom it is addressed and may contain confidential information. Any retransmission\, copying\, disclosure or other use of\, this information by persons other than the intended recipient is prohibited. If you received this email in error\, please contact the sender and delete the material. This email is for information only and is not intended as an offer or solicitation for the purchase or sale of any financial instrument. Wadhwani Asset Management LLP is a Limited Liability Partnership registered in England (OC303168) with registered office at 9th Floor Orion House\, 5 Upper St Martin’s Lane\, London\, WC2H 9EA. It is authorised and regulated by the Financial Conduct Authority.